You might not be aware, but the majority of programs you use on your computer are classified as Portable Executables (PE). According to Microsoft, this term refers to file types that aren’t tied to a specific architecture. Having an appreciation of the Portable Executable format can offer valuable benefits, especially if you are involved in any of the following:
- Malware Analysis & Reverse Engineering
When it comes to analyzing malware and reverse engineering software, a solid grasp of the PE file structure is indispensable. It equips you with the ability to extract crucial information, uncover suspicious patterns, identify indicators of compromise (IoCs), and detect potential obfuscation during static analysis.
- Resource Extraction
Many programs contain valuable resources like icons, images, and localization data. Knowledge of the PE structure empowers you to easily extract these resources. Tools such as Resource Hacker can assist in this process, making it accessible even to those without extensive technical expertise.
- Developers’ Toolkit
If you’re a developer, understanding the PE file structure can be a game-changer. It allows you to troubleshoot problematic programs, pinpoint the causes of crashes, and identify performance bottlenecks. For instance, you can read a fascinating blog post about how someone significantly improved GTA Online load times, showcasing the real-world impact of such knowledge.
Brief Description of Portable Executable Internals
This section may seem unnecessary if you are well-versed in programming, but for the vast majority of computer users, the inner workings of a program remain a mystery. Even if you’ve been programming in a language like Python, you might still lack an appreciation for file internals. To see what that means, here are the first 448 bytes of an .exe file as a hex editor would display them:
00000000 4D 5A 90 00 03 00 00 00 04 00 00 00 FF FF 00 00
00000010 B8 00 00 00 00 00 00 00 40 00 00 00 00 00 00 00
00000020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00000030 00 00 00 00 00 00 00 00 00 00 00 00 18 01 00 00
00000040 0E 1F BA 0E 00 B4 09 CD 21 B8 01 4C CD 21 54 68
00000050 69 73 20 70 72 6F 67 72 61 6D 20 63 61 6E 6E 6F
00000060 74 20 62 65 20 72 75 6E 20 69 6E 20 44 4F 53 20
00000070 6D 6F 64 65 2E 0D 0D 0A 24 00 00 00 00 00 00 00
00000080 B9 5C 2E 71 FD 3D 40 22 FD 3D 40 22 FD 3D 40 22
00000090 49 A1 B1 22 EE 3D 40 22 49 A1 B3 22 76 3D 40 22
000000A0 49 A1 B2 22 EA 3D 40 22 C6 63 43 23 F5 3D 40 22
000000B0 C6 63 45 23 DB 3D 40 22 C6 63 44 23 E9 3D 40 22
000000C0 F4 45 C3 22 FB 3D 40 22 F4 45 C7 22 FC 3D 40 22
000000D0 F4 45 D3 22 E0 3D 40 22 FD 3D 41 22 25 3C 40 22
000000E0 6F 63 43 23 FF 3D 40 22 6A 63 49 23 D6 3D 40 22
000000F0 6F 63 BF 22 FC 3D 40 22 6A 63 42 23 FC 3D 40 22
00000100 52 69 63 68 FD 3D 40 22 00 00 00 00 00 00 00 00
00000110 00 00 00 00 00 00 00 00 50 45 00 00 64 86 06 00
00000120 28 83 11 5F 00 00 00 00 00 00 00 00 F0 00 23 00
00000130 0B 02 0E 00 00 46 0C 00 00 A6 04 00 00 00 00 00
00000140 04 7E 0A 00 00 10 00 00 00 00 00 40 01 00 00 00
00000150 00 10 00 00 00 02 00 00 05 00 02 00 00 00 00 00
00000160 05 00 02 00 00 00 00 00 00 A0 11 00 00 04 00 00
00000170 00 00 00 00 02 00 20 00 00 00 40 00 00 00 00 00
00000180 00 10 00 00 00 00 00 00 00 00 10 00 00 00 00 00
00000190 00 10 00 00 00 00 00 00 00 00 00 00 10 00 00 00
000001A0 00 00 00 00 00 00 00 00 C4 C2 0F 00 2C 01 00 00
000001B0 00 30 11 00 3C 62 00 00 00 A0 10 00 AC 71 00 00
At first glance, it resembles a complex migraine-inducing pattern. However, this seemingly chaotic assemblage of bytes within this .exe file serves a very specific purpose. Each byte’s placement plays a critical role in how a file is loaded and executed on your system.
When reading the rest of the blog, I recommend referring to the below image as it will give you a good idea of where you are located in the portable executable format:
DOS Header
The DOS Header is a 64-byte segment located at the beginning of the file and is mostly unused in modern Windows executables. Nonetheless, two of its fields, 6 bytes in total, still carry valuable information.
e_magic [2-bytes/WORD]
: Positioned right at the file’s start, these two bytes are stored as 4D 5A. Read as a little-endian WORD they give the hexadecimal value 0x5A4D, and read as ASCII they spell ‘MZ’. This serves as a signature denoting an MS-DOS executable file. You might be wondering why it’s labeled ‘MZ’. It stands for ‘Mark Zbikowski’, one of the MS-DOS developers. e_magic exists more for legacy purposes than anything else.
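e_lfanew [4-bytes/DWORD]
: These four bytes, located at offset 0x3C (60 bytes into the file), are extremely valuable because they hold the file offset of the PE Header. In the dump above the value is 0x118, which is exactly where the bytes 50 45 00 00 appear. The PE Header is the modern successor to the DOS Header.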
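To make these two fields concrete, here is a minimal sketch that reads them from the first 64 bytes of the dump above using Python’s struct module. The variable names are my own; only the byte values come from the example file.

```python
import struct

# First 64 bytes of the example dump (offsets 0x00 to 0x3F): the DOS Header.
dos_header = bytes.fromhex(
    "4D 5A 90 00 03 00 00 00 04 00 00 00 FF FF 00 00"
    "B8 00 00 00 00 00 00 00 40 00 00 00 00 00 00 00"
    "00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00"
    "00 00 00 00 00 00 00 00 00 00 00 00 18 01 00 00"
)

# "<H" is a little-endian 2-byte WORD, "<I" a little-endian 4-byte DWORD.
(e_magic,) = struct.unpack_from("<H", dos_header, 0x00)
(e_lfanew,) = struct.unpack_from("<I", dos_header, 0x3C)

print(dos_header[:2])  # b'MZ'
print(hex(e_magic))    # 0x5a4d
print(hex(e_lfanew))   # 0x118
```

The same two unpack_from calls work on a real file if you read its first 64 bytes instead of using the hard-coded hex string.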
DOS Stub
Right after the DOS Header sits the DOS Stub, a tiny MS-DOS program whose only job is to print the text string “This program cannot be run in DOS mode.” if the file is started under MS-DOS. On modern Windows systems this portion serves no practical function, since the loader follows e_lfanew straight to the PE Header.
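If you want to convince yourself that the message really is sitting in the dump, the sketch below decodes the bytes from offset 0x4E up to the ‘$’ character, which is the string terminator used by the MS-DOS print routine the stub calls. The variable names are my own.

```python
# Bytes 0x4E to 0x78 of the example dump: the stub's message,
# terminated by '$' (0x24).
stub_text = bytes.fromhex(
    "54 68 69 73 20 70 72 6F 67 72 61 6D 20 63 61 6E 6E 6F"
    "74 20 62 65 20 72 75 6E 20 69 6E 20 44 4F 53 20"
    "6D 6F 64 65 2E 0D 0D 0A 24"
)

# Keep everything before the '$' terminator and decode it as ASCII.
message = stub_text.split(b"$", 1)[0].decode("ascii")

print(repr(message))  # 'This program cannot be run in DOS mode.\r\r\n'
```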
PE Header
This is where the file format gets interesting. It contains the following:
Signature [4-bytes/DWORD]
: The signature is denoted by “PE” followed by two NULL bytes, with “PE” signifying “Portable Executable”.
Machine [2-bytes/WORD]
: This specifies the target architecture for the executable, for example x86, x64, or ARM.
NumberOfSections [2-bytes/WORD]
: This value informs the loader about the number of sections to be loaded. Each section serves a distinct purpose:
Section | Description |
.text | Contains executable code. |
.data | Stores data, including global and static variables. |
.rsrc | Reserved for program resources such as icons and images. |
.bss | Stands for Block Started by Symbol. Typically used for statically allocated variables that are not yet initialized. |
.idata | Reserved for the Import section, providing information about the functions the program imports and the DLLs they come from. |
.edata | The Export section, containing functions that can be utilized by other programs or DLLs. |
.rdata | Used for read-only data, such as constants. |
TimeDateStamp [4-bytes/DWORD]
: This value signifies the program’s creation or compilation time in the form of an epoch timestamp (measuring seconds since 01/01/1970 00:00:00 UTC). An intriguing aspect to note is the limited space available in this DWORD field: tools that read it as a signed 32-bit integer will overflow in 2038, while the unsigned value itself wraps around in 2106.
PointerToSymbolTable [4-bytes/DWORD]
: This value represents the offset to the Symbol Table, which contains information about functions, variables, and various other components. In most cases, this isn’t included in production builds due to space and security considerations.
NumberOfSymbols [4-bytes/DWORD]
: This field stores the count of entries in the Symbol Table.
SizeOfOptionalHeader [2-bytes/WORD]
: The name itself gives a good hint about its purpose: it is the size, in bytes, of the Optional Header that follows. We’ll delve into the Optional Header in more detail shortly.
Characteristics [2-bytes/WORD]
: This 2-byte field is fascinating, as each of its 16 bits indicates a true (1) or false (0) value, reflecting specific attributes. Below is a table detailing these characteristics, listed from the least significant bit (0x0001) to the most significant bit (0x8000):
Name | Description |
Relocation Stripped | If set to 1, the file contains no base relocations and must therefore be loaded at its preferred base address. |
Executable | When set to 1, the file is a valid executable image. If this flag is missing, it indicates a linker error. |
Line Numbers Stripped | A value of 1 signifies that line-number debugging information has been removed from the file. This flag is deprecated and will usually be 0. |
Symbols Stripped | If set to 1, symbols have been removed from the file. This flag is deprecated and will usually be 0. |
Aggressively Trim Working Set | If set to 1, the working set (physical memory usage) of the process is trimmed aggressively. This flag is obsolete for Windows 2000 and later and is typically 0. |
Large Address Aware | A value of 1 signifies that the file can handle addresses larger than 2 GB, a common trait of 64-bit applications. |
Reserved | This bit is reserved for future use. |
Little Endian (bytes reserved) | A value of 1 indicates that the least significant byte comes first (Little Endian). This flag is deprecated and will usually be 0. |
Is 32-bit Machine | If set to 1, the file is intended for a machine with a 32-bit word architecture. |
Debug Information Stripped | If set to 1, debugging information has been removed from the file. |
Removable Run from Swap | If set to 1 and the file is located on removable media (e.g., a USB stick), the file is fully loaded and copied to the swap file before it runs. |
Net Run from Swap | Similar to the previous flag, but for a file located on a network share: if set to 1, the file is fully loaded and copied to the swap file. |
System | If set to 1, the file is a system file and not a user program. |
DLL | A value of 1 denotes that the file is a Dynamic-link Library (DLL). |
Uniprocessor System Only | If set to 1, the file should only be run on a uniprocessor system. |
Big Endian (bytes reserved) | Similar to the Little Endian bit, a value of 1 indicates that the most significant byte comes first (Big Endian). This flag is deprecated and will usually be 0. |
In the example dump above, the Characteristics field (the two bytes 23 00 at offset 0x12E) holds 0x0023. That means the relocation stripped, executable, and large address aware flags are set.
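Putting the PE Header fields together, here is a minimal sketch that unpacks the 24 bytes found at offset 0x118 of the dump above and turns the Characteristics bits into readable names. The variable names and the FLAG_NAMES list are my own labels for this post, not official identifiers.

```python
import struct
from datetime import datetime, timezone

# The 24 bytes at offset 0x118 of the example dump:
# the 4-byte signature followed by the 20 bytes of header fields.
pe_header = bytes.fromhex(
    "50 45 00 00 64 86 06 00"
    "28 83 11 5F 00 00 00 00 00 00 00 00 F0 00 23 00"
)

signature = pe_header[:4]
# H = 2-byte WORD, I = 4-byte DWORD, "<" = little-endian without padding.
(machine, number_of_sections, time_date_stamp,
 pointer_to_symbol_table, number_of_symbols,
 size_of_optional_header, characteristics) = struct.unpack_from(
    "<HHIIIHH", pe_header, 4
)

# One label per bit, from bit 0 (0x0001) up to bit 15 (0x8000).
FLAG_NAMES = [
    "Relocation Stripped", "Executable", "Line Numbers Stripped",
    "Symbols Stripped", "Aggressively Trim Working Set",
    "Large Address Aware", "Reserved", "Little Endian (bytes reserved)",
    "Is 32-bit Machine", "Debug Information Stripped",
    "Removable Run from Swap", "Net Run from Swap", "System", "DLL",
    "Uniprocessor System Only", "Big Endian (bytes reserved)",
]
set_flags = [name for bit, name in enumerate(FLAG_NAMES)
             if characteristics & (1 << bit)]

# TimeDateStamp counts seconds since 1970-01-01 00:00:00 UTC.
compiled_at = datetime.fromtimestamp(time_date_stamp, tz=timezone.utc)

print(signature)                 # b'PE\x00\x00'
print(hex(machine))              # 0x8664, i.e. x64
print(number_of_sections)        # 6
print(compiled_at.isoformat())   # 2020-07-17T10:53:28+00:00
print(size_of_optional_header)   # 240
print(set_flags)
```

For this file, set_flags comes out as Relocation Stripped, Executable, and Large Address Aware.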
Optional Header
This header gives important information to the loader, but it can get quite complex in terms of its details. I’ll aim to strike a balance in the level of detail I provide so you don’t get overwhelmed. You may have already noticed that the Optional Header doesn’t have a set size. Fortunately, the PE Header above it contains a value called SizeOfOptionalHeader that tells you its size. If you ever want to create your own file parser, you can use this value to help you.
Name | Description | Size |
Magic Number | This indicates whether the file is a 32-bit application (PE32 / 0x10B ), a 64-bit application (PE32+ / 0x20B ), or a ROM image (0x107 ). | 2-bytes / WORD |
Major Linker Version | It tells you the major version number of the linker used to create the file. | 1-byte |
Minor Linker Version | This indicates the minor version number of the linker used for the file’s creation. | 1-byte |
Size of Code | Shows the size, in bytes, of the .text section or the total of all code sections. | 4-bytes / DWORD |
Size of Initialized Data | This specifies the size of the initialized data section or the sum of all such sections if multiple exist. | 4-bytes / DWORD |
Size of Uninitialized Data | Indicates the size of the uninitialized data section (.bss ) or the total of all such sections (when there is more than one .bss section). | 4-bytes / DWORD |
Address of Entry Point | This is the address, relative to the image base, where the program begins executing. Please note that when analyzing the file dynamically, the entry point might differ. | 4-bytes / DWORD |
Base of Code | Address, relative to the image base, of the start of the code section when loaded into memory. | 4-bytes / DWORD |
Base of Data (PE32 only) | Address, relative to the image base, of the start of the data section when loaded into memory. This field exists only in PE32. | 4-bytes / DWORD |
Image Base | Preferred base address for loading the file into memory. | 4-bytes / DWORD (PE32) 8-bytes / QWORD (PE32+) |
Section Alignment | Alignment of sections in bytes when loaded into memory. | 4-bytes / DWORD |
File Alignment | Alignment, in bytes, of the sections’ raw data in the file (512 by default). | 4-bytes / DWORD |
Major OS Version | The major version number of the required operating system. | 2-bytes / WORD |
Minor OS Version | The minor version number of the required operating system. | 2-bytes / WORD |
Major Image Version | Major number of the Image. | 2-bytes / WORD |
Minor Image Version | Minor number of the Image. | 2-bytes / WORD |
Major Subsystem Version | Major subsystem version. | 2-bytes / WORD |
Minor Subsystem Version | Minor subsystem version. | 2-bytes / WORD |
Win 32 Version | This is a reserved set of 4-bytes and will be set to 0 by default. | 4-bytes / DWORD |
Size of Image | Size of the image in bytes when loaded into memory. | 4-bytes / DWORD |
Size of Headers | Combined size of the MS-DOS Header and stub, PE Header, Optional Header, and Section Table, rounded up to a multiple of File Alignment. | 4-bytes / DWORD |
Checksum | Used to check the integrity of the file. If marked as 0 then no checksum validation is performed. | 4-bytes / DWORD |
Subsystem | Indicates what type of program it is (e.g. console or GUI). | 2-bytes / WORD |
DLL Characteristics | This is a bitfield of characteristics. | 2-bytes / WORD |
Size of Stack Reserve | Size of the reserved stack. | 4-bytes / DWORD (PE32) 8-bytes / QWORD (PE32+) |
Size of Stack Commit | Size of the committed stack. | 4-bytes / DWORD (PE32) 8-bytes / QWORD (PE32+) |
Size of Heap Reserve | Size of the reserved heap. | 4-bytes / DWORD (PE32) 8-bytes / QWORD (PE32+) |
Size of Heap Commit | Size of the committed heap. | 4-bytes / DWORD (PE32) 8-bytes / QWORD (PE32+) |
Loader Flags | Reserved set of 4-bytes, will be set to 0. | 4-bytes / DWORD |
Number of RVA and Sizes | Number of data-directory entries. | 4-bytes / DWORD |
Data Directories | A table of data directory information (contains the relative virtual address to the image base and size for each). | Structure containing 8-bytes per directory. |
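As a rough illustration of how a parser might walk the start of this table, the sketch below unpacks the first 40 bytes of the Optional Header from the example dump (offset 0x130 onwards). It assumes the PE32+ layout, which matches the 0x20B magic number in this file; a PE32 file has a Base of Data field and a 4-byte Image Base instead. The variable names are my own.

```python
import struct

# First 40 bytes of the Optional Header in the example dump (offset 0x130).
optional_header = bytes.fromhex(
    "0B 02 0E 00 00 46 0C 00 00 A6 04 00 00 00 00 00"
    "04 7E 0A 00 00 10 00 00 00 00 00 40 01 00 00 00"
    "00 10 00 00 00 02 00 00"
)

# PE32+ layout: WORD, two BYTEs, five DWORDs, one QWORD, two DWORDs.
(magic, major_linker, minor_linker, size_of_code,
 size_of_initialized_data, size_of_uninitialized_data,
 address_of_entry_point, base_of_code, image_base,
 section_alignment, file_alignment) = struct.unpack_from(
    "<HBBIIIIIQII", optional_header, 0
)

# Address of Entry Point is relative to Image Base, so the address the
# program starts at (when loaded at its preferred base) is their sum.
entry_point_va = image_base + address_of_entry_point

# The Section Table starts right after the Optional Header:
# e_lfanew + 4-byte signature + 20-byte header + SizeOfOptionalHeader.
e_lfanew = 0x118               # read from the DOS Header of this file
size_of_optional_header = 0xF0  # read from the PE Header of this file
section_table_offset = e_lfanew + 4 + 20 + size_of_optional_header

print(hex(magic))                 # 0x20b, i.e. PE32+
print(major_linker, minor_linker) # 14 0
print(hex(image_base))            # 0x140000000
print(hex(entry_point_va))        # 0x1400a7e04
print(hex(file_alignment))        # 0x200, i.e. 512
print(hex(section_table_offset))  # 0x220
```

Note that the Section Table for this file starts at 0x220, which is just past the end of the dump shown earlier.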
Section Table
A good way to think of this part of the file is to take notice of the word “table”, as it is essentially a table with one entry per section (we mentioned these sections in the PE Header part of this blog post).
Each entry within this table holds valuable information about its section. Here are the fields found in each entry:
Name | Description | Size |
Name | Name of the section e.g. .text , stored as a null-padded string. | 8-bytes |
Virtual Size | Size of the section once it is loaded into memory (runtime). | 4-bytes / DWORD |
Virtual Address | This dictates where the section should be loaded into memory, relative to the image base, when the executable has been opened. | 4-bytes / DWORD |
Size of Raw Data | This is the size of the section’s data on disk (not being run). | 4-bytes / DWORD |
Pointer to Raw Data | This is the file offset at which you can find the raw data mentioned above. | 4-bytes / DWORD |
Pointer to Relocations | If the section has relocations, it will be the address of the relocation table; if not then it will be marked with 0. | 4-bytes / DWORD |
Pointer to Line Numbers | Similar to the above, it points to the line table (this is usually used for debugging and typically won’t be seen in release versions of software). | 4-bytes / DWORD |
Number of Relocations | This contains the number of entries in the relocation table. | 2-bytes / WORD |
Number of Line Numbers | The number of entries in the line table. | 2-bytes / WORD |
Characteristics | This is a 4-byte bitfield with each bit representing a certain characteristic (apart from the alignment values). | 4-bytes / DWORD |
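The fields above add up to 40 bytes per entry, which makes each entry easy to unpack in one go. The sketch below is one way a parser might do it. Since the dump earlier in this post stops before the Section Table, the .text entry here is built from hypothetical values purely for illustration, and the Section class, parse_section, and rva_to_file_offset are names I made up.

```python
import struct
from typing import NamedTuple, Optional

# 8-byte name, six DWORDs, two WORDs, one DWORD = 40 bytes per entry.
SECTION_ENTRY = struct.Struct("<8sIIIIIIHHI")

class Section(NamedTuple):
    name: str
    virtual_size: int
    virtual_address: int
    size_of_raw_data: int
    pointer_to_raw_data: int
    pointer_to_relocations: int
    pointer_to_line_numbers: int
    number_of_relocations: int
    number_of_line_numbers: int
    characteristics: int

def parse_section(data: bytes, offset: int = 0) -> Section:
    """Unpack one 40-byte Section Table entry starting at `offset`."""
    raw_name, *rest = SECTION_ENTRY.unpack_from(data, offset)
    name = raw_name.rstrip(b"\x00").decode("ascii", errors="replace")
    return Section(name, *rest)

def rva_to_file_offset(section: Section, rva: int) -> Optional[int]:
    """Map an image-base-relative address to a file offset, if the
    address falls inside this section's data on disk."""
    start = section.virtual_address
    if start <= rva < start + section.size_of_raw_data:
        return rva - start + section.pointer_to_raw_data
    return None

# Hypothetical .text entry (made-up values, not taken from the dump).
entry = SECTION_ENTRY.pack(
    b".text", 0xC4514, 0x1000, 0xC4600, 0x400, 0, 0, 0, 0, 0x60000020
)
text = parse_section(entry)

print(text.name)                               # .text
print(hex(rva_to_file_offset(text, 0xA7E04)))  # 0xa7204
```

The second helper shows why Virtual Address and Pointer to Raw Data matter together: they let you translate an in-memory address, such as the entry point, back to the place in the file where those bytes live.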
Portable Executable Format Conclusion
This topic can indeed become quite complex and fascinating. However, my goal in this blog post is to present it in a way that makes it approachable for individuals who may not have prior experience with reverse engineering or related fields. I hope I’ve successfully achieved this aim.
If you are interested in learning more about this subject, here are some good books and websites to learn from:
- Reversing: Secrets of Reverse Engineering
While it’s an older book, it provides an excellent foundation for reverse engineering. It’s a helpful starting point.
- Windows Internals, Part 1: System architecture, processes, threads, memory management, and more (7th Edition)
This book is a must-read for anyone curious about Windows internals. It offers in-depth insights into the inner workings of the Windows operating system. I’ve read it a lot during quiet times at work, and it’s a great read if you’re interested in this kind of thing.
- Guided Hacking
This website is a fantastic resource for those looking to explore reverse engineering, malware analysis, or game hacking. It’s an excellent starting point for gaining knowledge and skills in these areas.