Portable Executable Format: Made Easy


Decode the secrets of programs via the portable executable format.

You might not be aware, but the majority of programs you use on a Windows computer are classified as Portable Executables (PE). According to Microsoft, the name refers to the fact that the format isn’t tied to a specific architecture. Understanding the Portable Executable format can offer valuable benefits, especially if you are involved in any of the following:

  • Malware Analysis & Reverse Engineering
    When it comes to analyzing malware and reverse engineering software, a solid grasp of the PE file structure is indispensable. It equips you with the ability to extract crucial information, uncover suspicious patterns, identify indicators of compromise (IoCs), and detect potential obfuscation during static analysis.
  • Resource Extraction
    Many programs contain valuable resources like icons, images, and localization data. Knowledge of the PE structure empowers you to easily extract these resources. Tools such as Resource Hacker can assist in this process, making it accessible even to those without extensive technical expertise.
  • Developers’ Toolkit
    If you’re a developer, understanding the PE file structure can be a game-changer. It allows you to troubleshoot problematic programs, pinpoint the causes of crashes, and identify performance bottlenecks. For instance, you can read a fascinating blog post about how someone significantly improved GTA Online load times, showcasing the real-world impact of such knowledge.

Brief Description of Portable Executable Internals

This section may seem unnecessary if you are well-versed in programming, but for the vast majority of computer users, the inner workings of a program remain a mystery. Even if you’ve been programming in a language like Python, you might still lack an appreciation for file internals. Here are the first 448 bytes of an .exe file as a hex editor would show them:

00000000 4D 5A 90 00 03 00 00 00 04 00 00 00 FF FF 00 00
00000010 B8 00 00 00 00 00 00 00 40 00 00 00 00 00 00 00
00000020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00000030 00 00 00 00 00 00 00 00 00 00 00 00 18 01 00 00
00000040 0E 1F BA 0E 00 B4 09 CD 21 B8 01 4C CD 21 54 68
00000050 69 73 20 70 72 6F 67 72 61 6D 20 63 61 6E 6E 6F
00000060 74 20 62 65 20 72 75 6E 20 69 6E 20 44 4F 53 20
00000070 6D 6F 64 65 2E 0D 0D 0A 24 00 00 00 00 00 00 00
00000080 B9 5C 2E 71 FD 3D 40 22 FD 3D 40 22 FD 3D 40 22
00000090 49 A1 B1 22 EE 3D 40 22 49 A1 B3 22 76 3D 40 22
000000A0 49 A1 B2 22 EA 3D 40 22 C6 63 43 23 F5 3D 40 22
000000B0 C6 63 45 23 DB 3D 40 22 C6 63 44 23 E9 3D 40 22
000000C0 F4 45 C3 22 FB 3D 40 22 F4 45 C7 22 FC 3D 40 22
000000D0 F4 45 D3 22 E0 3D 40 22 FD 3D 41 22 25 3C 40 22
000000E0 6F 63 43 23 FF 3D 40 22 6A 63 49 23 D6 3D 40 22
000000F0 6F 63 BF 22 FC 3D 40 22 6A 63 42 23 FC 3D 40 22
00000100 52 69 63 68 FD 3D 40 22 00 00 00 00 00 00 00 00
00000110 00 00 00 00 00 00 00 00 50 45 00 00 64 86 06 00
00000120 28 83 11 5F 00 00 00 00 00 00 00 00 F0 00 23 00
00000130 0B 02 0E 00 00 46 0C 00 00 A6 04 00 00 00 00 00
00000140 04 7E 0A 00 00 10 00 00 00 00 00 40 01 00 00 00
00000150 00 10 00 00 00 02 00 00 05 00 02 00 00 00 00 00
00000160 05 00 02 00 00 00 00 00 00 A0 11 00 00 04 00 00
00000170 00 00 00 00 02 00 20 00 00 00 40 00 00 00 00 00
00000180 00 10 00 00 00 00 00 00 00 00 10 00 00 00 00 00
00000190 00 10 00 00 00 00 00 00 00 00 00 00 10 00 00 00
000001A0 00 00 00 00 00 00 00 00 C4 C2 0F 00 2C 01 00 00
000001B0 00 30 11 00 3C 62 00 00 00 A0 10 00 AC 71 00 00

At first glance, it resembles a complex migraine-inducing pattern. However, this seemingly chaotic assemblage of bytes within this .exe file serves a very specific purpose. Each byte’s placement plays a critical role in how a file is loaded and executed on your system.
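If you would like to produce a dump like the one above yourself, a few lines of Python are enough. This is only a minimal sketch: the function name `hex_dump` and the 16-bytes-per-row layout are my own choices to mirror the listing above, not part of any standard tool.

```python
def hex_dump(data: bytes, width: int = 16) -> str:
    """Format bytes as rows of 'OFFSET HH HH ...', like a hex editor."""
    rows = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_bytes = " ".join(f"{b:02X}" for b in chunk)
        rows.append(f"{offset:08X} {hex_bytes}")
    return "\n".join(rows)

# The first 16 bytes of the file shown above.
sample = bytes.fromhex("4D5A90000300000004000000FFFF0000")
print(hex_dump(sample))
```

To dump a real file, you could pass in the result of `open(path, "rb").read()`, where `path` is whatever executable you want to inspect.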

While reading the rest of this post, I recommend referring to the image below, as it will give you a good idea of where you are within the portable executable format:

[Image: Portable Executable Format Structure]

DOS Header

The DOS Header is a 64-byte segment located at the beginning of a file, and most of it goes unused in modern Windows executables. Nonetheless, it contains 6 bytes of valuable information.

e_magic [2-bytes/WORD]: Positioned right at the file’s start, these two bytes hold the value 0x5A4D (stored on disk as the bytes 4D 5A, because the value is little-endian), which translates to ‘MZ’ in ASCII. This serves as a signature denoting an MS-DOS executable file. You might be wondering why it’s labeled ‘MZ’. Well, it stands for ‘Mark Zbikowski’, one of the developers of MS-DOS. e_magic exists more for legacy purposes than anything else.

e_lfanew [4-bytes/DWORD]: These four bytes, situated at offset 0x3C (decimal 60), are extremely valuable and act as a pointer to the PE Header: they hold the file offset at which the PE Header begins. The PE Header is the modern successor to the DOS Header.
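To make these two fields concrete, here is a small sketch that reads them with Python's `struct` module. The bytes are the first 64 bytes of the hex dump earlier in this post; the function name `read_dos_header` is my own and purely illustrative.

```python
import struct

# The 64-byte DOS Header, copied from the hex dump earlier in this post.
DOS_HEADER = bytes.fromhex(
    "4D5A90000300000004000000FFFF0000"
    "B8000000000000004000000000000000"
    "00000000000000000000000000000000"
    "00000000000000000000000018010000"
)

def read_dos_header(data: bytes) -> int:
    """Check e_magic and return e_lfanew (the file offset of the PE Header)."""
    if len(data) < 64:
        raise ValueError("too short to hold a DOS Header")
    # "<H" reads a little-endian WORD; e_magic sits at offset 0.
    (e_magic,) = struct.unpack_from("<H", data, 0)
    if e_magic != 0x5A4D:  # the bytes 'M', 'Z' on disk
        raise ValueError("missing MZ signature")
    # "<I" reads a little-endian DWORD; e_lfanew sits at offset 0x3C.
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    return e_lfanew

print(hex(read_dos_header(DOS_HEADER)))  # 0x118
```

If you look at offset 0x118 in the dump, you will find the bytes 50 45 00 00, which is where the PE Header starts.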

DOS Stub

The Portable Executable Format includes a somewhat redundant part right after the DOS Header: the DOS Stub, a tiny MS-DOS program that prints the text string “This program cannot be run in DOS mode.” and then exits. In modern Windows systems, this portion serves no practical function since program execution is primarily managed through the PE Header.
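As a quick illustration, the message can be pulled straight out of the stub bytes. This sketch uses the 64 bytes at offsets 0x40 to 0x7F of the hex dump earlier in this post. The start offset 0x0E and the ‘$’ terminator are assumptions that hold for this particular (very common) stub; other linkers may emit a different stub, so treat the helper `stub_message` as illustrative only.

```python
# The 64 bytes at file offsets 0x40-0x7F of the hex dump earlier in this post.
DOS_STUB = bytes.fromhex(
    "0E1FBA0E00B409CD21B8014CCD215468"
    "69732070726F6772616D2063616E6E6F"
    "742062652072756E20696E20444F5320"
    "6D6F64652E0D0D0A2400000000000000"
)

def stub_message(stub: bytes, start: int = 0x0E) -> str:
    """Return the text that begins at `start` and runs up to the '$' terminator."""
    end = stub.index(b"$", start)
    # strip() drops the trailing carriage-return/line-feed bytes.
    return stub[start:end].decode("ascii").strip()

print(stub_message(DOS_STUB))  # This program cannot be run in DOS mode.
```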

PE Header

This is where the file format gets interesting. It contains the following:

Signature [4-bytes/DWORD]: The signature is denoted by “PE” followed by two NULL bytes, with “PE” signifying “Portable Executable”.

Machine [2-bytes/WORD]: This specifies the target architecture for the executable, for example x86, x64 or ARM.
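Here is a hedged sketch of how a parser might check the signature and then name the architecture. The lookup table only lists a handful of common Machine values and is not exhaustive, and the names `MACHINE_NAMES` and `machine_name` are my own.

```python
import struct

# A few common values of the Machine field (not an exhaustive list).
MACHINE_NAMES = {
    0x014C: "x86 (i386)",
    0x8664: "x64 (AMD64)",
    0x01C0: "ARM",
    0xAA64: "ARM64",
}

def machine_name(pe_header: bytes) -> str:
    """Validate the signature, then name the Machine value that follows it."""
    if pe_header[:4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    # Machine is the little-endian WORD right after the 4-byte signature.
    (machine,) = struct.unpack_from("<H", pe_header, 4)
    return MACHINE_NAMES.get(machine, f"unknown (0x{machine:04X})")

# Bytes at offset 0x118 of the hex dump: 'P' 'E' 0 0, then 64 86.
print(machine_name(bytes.fromhex("504500006486")))  # x64 (AMD64)
```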

NumberOfSections [2-bytes/WORD]: This value informs the loader about the number of sections to be loaded. Each section serves a distinct purpose:

.text: Contains executable code.
.data: Stores data, including global and static variables.
.rsrc: Reserved for program resources such as icons and images.
.bss: Short for “Block Started by Symbol”. Typically used for statically allocated variables that are not yet initialized.
.idata: Reserved for the Import section, providing information about functions imported into the program, including the DLLs they come from.
.edata: The Export section, containing functions that can be utilized by other programs or DLLs.
.rdata: Used for read-only data, such as constants.
Table of Sections (there are many other sections as well, but these are the ones you are most likely to come across).

TimeDateStamp [4-bytes/DWORD]: This value signifies the program’s creation or compilation time in the form of an epoch timestamp (measuring seconds since 01/01/1970 00:00:00 UTC). An intriguing aspect to note is the potential risk of an integer overflow in 2038 if a tool reads this DWORD as a signed 32-bit value; read as an unsigned value, it lasts until 2106.
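Converting the raw bytes into a readable date takes only a couple of lines. The sketch below decodes the TimeDateStamp bytes from the hex dump earlier in this post (28 83 11 5F at offset 0x120); the function name `decode_timestamp` is my own.

```python
import struct
from datetime import datetime, timezone

def decode_timestamp(raw: bytes) -> datetime:
    """Interpret a 4-byte little-endian TimeDateStamp as a UTC datetime."""
    # "<I" reads the value as an unsigned DWORD.
    (seconds,) = struct.unpack("<I", raw)
    return datetime.fromtimestamp(seconds, tz=timezone.utc)

# TimeDateStamp bytes from the hex dump (file offset 0x120).
print(decode_timestamp(bytes.fromhex("2883115F")).isoformat())
# 2020-07-17T10:53:28+00:00
```

Keep in mind that the value is only as trustworthy as the tool that wrote it, since nothing stops a linker (or an attacker) from writing an arbitrary number here.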

PointerToSymbolTable [4-bytes/DWORD]: This value represents the offset to the Symbol Table, which contains information about functions, variables, and various other components. In most cases, this isn’t included in production builds due to space and security considerations.

NumberOfSymbols [4-bytes/DWORD]: This field stores the count of entries in the Symbol Table.

SizeOfOptionalHeader [2-bytes/WORD]: The name itself gives a good hint about its purpose, and we’ll delve into the Optional Header in more detail shortly.
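Putting the fields so far together: everything after the 4-byte signature, from Machine through to the final Characteristics WORD (covered next), is 20 bytes long and can be unpacked in one call. This is a sketch using Python's `struct` module and the bytes at offset 0x11C of the hex dump earlier in this post; the class and function names are my own.

```python
import struct
from typing import NamedTuple

class FileHeader(NamedTuple):
    machine: int
    number_of_sections: int
    time_date_stamp: int
    pointer_to_symbol_table: int
    number_of_symbols: int
    size_of_optional_header: int
    characteristics: int

# WORD, WORD, DWORD, DWORD, DWORD, WORD, WORD = 20 bytes, little-endian.
FILE_HEADER_FORMAT = "<HHIIIHH"

def parse_file_header(data: bytes, offset: int = 0) -> FileHeader:
    """Unpack the 20 bytes that follow the 4-byte PE signature."""
    return FileHeader(*struct.unpack_from(FILE_HEADER_FORMAT, data, offset))

# The 20 bytes at file offset 0x11C of the hex dump.
raw = bytes.fromhex("64860600" "2883115F" "00000000" "00000000" "F0002300")
header = parse_file_header(raw)
print(header.number_of_sections, hex(header.size_of_optional_header))  # 6 0xf0
```

With a whole file loaded into memory, you would pass `e_lfanew + 4` as the `offset` argument to skip past the signature.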

Characteristics [2-bytes/WORD]: This 2-byte field is fascinating, as each of its 16 bits indicates a true (1) or false (0) value, reflecting specific attributes. Below is a table detailing these characteristics:

Relocation Stripped [bit 0 / 0x0001]: If set to 1, this flag indicates that the file lacks base relocations, requiring it to be loaded at its preferred base address.
Executable [bit 1 / 0x0002]: When marked as 1, this flag designates the file as a valid executable image. A lack of this flag indicates a linker error.
Line Numbers Stripped [bit 2 / 0x0004]: A value of 1 signifies the removal of debugging information related to line numbers from the file. This flag is deprecated and will usually be marked as 0.
Symbols Stripped [bit 3 / 0x0008]: If set to 1, this flag indicates that local symbols have been removed from the file. This flag is deprecated and will usually be marked as 0.
Aggressively Trim Working Set [bit 4 / 0x0010]: Marked as 1, this flag indicates aggressive trimming of the working set (physical memory usage) of the process. However, it is obsolete for Windows 2000 and later and is typically marked as 0.
Large Address Aware [bit 5 / 0x0020]: A value of 1 signifies that the file can handle addresses larger than 2 GB, a common trait for 64-bit applications.
Reserved [bit 6 / 0x0040]: Reserved for future use.
Bytes Reversed Lo (Little Endian) [bit 7 / 0x0080]: A value of 1 indicates that the least significant byte comes first (Little Endian). This flag is deprecated and will usually be marked as 0.
Is 32-bit Machine [bit 8 / 0x0100]: If set to 1, this flag indicates that the file is intended for a machine with a 32-bit word architecture.
Debug Information Stripped [bit 9 / 0x0200]: If marked as 1, debugging information has been removed from the file.
Removable Run from Swap [bit 10 / 0x0400]: Marked as 1, this flag signifies that if the file sits on removable media (e.g., a USB stick), it should be fully loaded and copied to the swap file before it runs.
Net Run from Swap [bit 11 / 0x0800]: Similar to the previous flag, but for a file that sits on a network location: if marked as 1, it is fully loaded and copied to the swap file.
System [bit 12 / 0x1000]: If marked as 1, the file is a system file and not a user program.
DLL [bit 13 / 0x2000]: A value of 1 denotes that the file is a Dynamic-link Library (DLL).
Uniprocessor System Only [bit 14 / 0x4000]: If marked as 1, the file should only be run on a uniprocessor system.
Bytes Reversed Hi (Big Endian) [bit 15 / 0x8000]: Similar to the Little Endian bit, a value of 1 indicates that the most significant byte comes first (Big Endian). This flag is deprecated and will usually be marked as 0.
Table of Characteristics in the PE Header.
[Image: Portable Executable Format Characteristics]

In this example, the binary flags indicate that this file is an executable and is large address aware.
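Decoding the bitfield by hand gets tedious, so here is a sketch that does it with Python's `enum.IntFlag`. The member names are shortened versions of the flag names and are my own choice; the bit values follow the order of the table above, with bit 6 (0x0040) left out because it is reserved.

```python
from enum import IntFlag

class Characteristics(IntFlag):
    RELOCS_STRIPPED = 0x0001
    EXECUTABLE_IMAGE = 0x0002
    LINE_NUMS_STRIPPED = 0x0004
    LOCAL_SYMS_STRIPPED = 0x0008
    AGGRESSIVE_WS_TRIM = 0x0010
    LARGE_ADDRESS_AWARE = 0x0020
    BYTES_REVERSED_LO = 0x0080
    MACHINE_32BIT = 0x0100
    DEBUG_STRIPPED = 0x0200
    REMOVABLE_RUN_FROM_SWAP = 0x0400
    NET_RUN_FROM_SWAP = 0x0800
    SYSTEM = 0x1000
    DLL = 0x2000
    UP_SYSTEM_ONLY = 0x4000
    BYTES_REVERSED_HI = 0x8000

def flag_names(value: int) -> list[str]:
    """List the names of every flag whose bit is set in `value`."""
    return [flag.name for flag in Characteristics if value & flag]

# 0x0022 has bits 1 and 5 set: executable and large address aware.
print(flag_names(0x0022))  # ['EXECUTABLE_IMAGE', 'LARGE_ADDRESS_AWARE']
```

For comparison, the Characteristics WORD in the hex dump near the start of this post (the bytes 23 00 at offset 0x12E) is 0x0023, which sets the Relocation Stripped bit in addition to those two.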

Optional Header

This header gives important information to the loader, but it can get quite complex in terms of its details. I’ll aim to strike a balance in the level of detail I provide so you don’t get overwhelmed. You may have already noticed that the Optional Header doesn’t have a set size. Fortunately, the PE Header above it contains a value called SizeOfOptionalHeader that tells you its size. If you ever want to create your own file parser, you can use this value to help you.
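For a parser, the main use of SizeOfOptionalHeader is to work out where the Optional Header ends and the Section Table begins. The arithmetic is sketched below; the constant and function names are my own, and the example values are the ones read from the hex dump earlier in this post.

```python
PE_SIGNATURE_SIZE = 4   # 'P', 'E', 0, 0
FILE_HEADER_SIZE = 20   # Machine through Characteristics

def section_table_offset(e_lfanew: int, size_of_optional_header: int) -> int:
    """File offset of the first Section Table entry."""
    optional_header_start = e_lfanew + PE_SIGNATURE_SIZE + FILE_HEADER_SIZE
    return optional_header_start + size_of_optional_header

# Values from the hex dump: e_lfanew = 0x118, SizeOfOptionalHeader = 0xF0.
print(hex(section_table_offset(0x118, 0xF0)))  # 0x220
```

In other words, for that file the Optional Header occupies offsets 0x130 to 0x21F and the Section Table starts immediately after it.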

Magic Number [2-bytes/WORD]: This indicates whether the file is a 32-bit application (PE32 / 0x10B), a 64-bit application (PE32+ / 0x20B) or a ROM image (0x107).
Major Linker Version [1-byte]: The major version number of the linker used to create the file.
Minor Linker Version [1-byte]: The minor version number of the linker used to create the file.
Size of Code [4-bytes/DWORD]: The size, in bytes, of the .text section, or the total of all code sections if there are several.
Size of Initialized Data [4-bytes/DWORD]: The size of the initialized data section, or the sum of all such sections if multiple exist.
Size of Uninitialized Data [4-bytes/DWORD]: The size of the uninitialized data section (.bss), or the total of all such sections if there is more than one.
Address of Entry Point [4-bytes/DWORD]: The address, relative to the image base, where the program begins executing. Please note that when analyzing the file dynamically, the entry point you observe in memory might differ.
Base of Code [4-bytes/DWORD]: The address of the start of the code section, relative to the image base, when loaded into memory.
Base of Data (PE32 only) [4-bytes/DWORD]: The address of the start of the data section, relative to the image base. This field exists only in PE32 files.
Image Base [4-bytes/DWORD (PE32), 8-bytes/QWORD (PE32+)]: Preferred base address for loading the file into memory.
Section Alignment [4-bytes/DWORD]: Alignment of sections, in bytes, when loaded into memory.
File Alignment [4-bytes/DWORD]: Alignment of the sections’ raw data in the file, in bytes (defaults to 512).
Major OS Version [2-bytes/WORD]: The major version number of the required operating system.
Minor OS Version [2-bytes/WORD]: The minor version number of the required operating system.
Major Image Version [2-bytes/WORD]: The major version number of the image.
Minor Image Version [2-bytes/WORD]: The minor version number of the image.
Major Subsystem Version [2-bytes/WORD]: The major version number of the subsystem.
Minor Subsystem Version [2-bytes/WORD]: The minor version number of the subsystem.
Win32 Version [4-bytes/DWORD]: A reserved set of 4 bytes that will be set to 0.
Size of Image [4-bytes/DWORD]: Size of the image, in bytes, when loaded into memory.
Size of Headers [4-bytes/DWORD]: The combined size of the MS-DOS Header and Stub, the PE Header, the Optional Header and the Section Table, rounded up to a multiple of File Alignment.
Checksum [4-bytes/DWORD]: Used to check the integrity of the file. A value of 0 means no checksum was stored; Windows only validates the checksum for certain files, such as drivers.
Subsystem [2-bytes/WORD]: Indicates what type of program it is (e.g. console or GUI).
DLL Characteristics [2-bytes/WORD]: A bitfield of characteristics.
Size of Stack Reserve [4-bytes/DWORD (PE32), 8-bytes/QWORD (PE32+)]: Size of the reserved stack.
Size of Stack Commit [4-bytes/DWORD (PE32), 8-bytes/QWORD (PE32+)]: Size of the committed stack.
Size of Heap Reserve [4-bytes/DWORD (PE32), 8-bytes/QWORD (PE32+)]: Size of the reserved heap.
Size of Heap Commit [4-bytes/DWORD (PE32), 8-bytes/QWORD (PE32+)]: Size of the committed heap.
Loader Flags [4-bytes/DWORD]: A reserved set of 4 bytes that will be set to 0.
Number of RVA and Sizes [4-bytes/DWORD]: Number of data-directory entries.
Data Directories [8-bytes per directory]: A table of data directory information (each entry contains an address relative to the image base, known as a relative virtual address or RVA, and a size).
Table of contents for the Optional Header
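To show how the PE32 and PE32+ layouts differ in practice, here is a sketch that reads a few of the fields above. The input is the first 144 bytes of the Optional Header from the hex dump earlier in this post (offsets 0x130 to 0x1BF), which is enough to reach the first four data directories. The function and variable names are my own, and only the first four directory names are listed.

```python
import struct

# First 144 bytes of the Optional Header from the hex dump (offset 0x130).
OPTIONAL_HEADER_START = bytes.fromhex(
    "0B020E0000460C0000A6040000000000"
    "047E0A00001000000000004001000000"
    "00100000000200000500020000000000"
    "050002000000000000A0110000040000"
    "00000000020020000000400000000000"
    "00100000000000000000100000000000"
    "00100000000000000000000010000000"
    "0000000000000000C4C20F002C010000"
    "003011003C62000000A01000AC710000"
)

DIRECTORY_NAMES = ["Export", "Import", "Resource", "Exception"]  # first 4 only

def parse_optional_header(data: bytes) -> dict:
    """Read a few Optional Header fields, handling both PE32 and PE32+."""
    (magic,) = struct.unpack_from("<H", data, 0)
    (entry_point,) = struct.unpack_from("<I", data, 16)
    if magic == 0x10B:    # PE32: Base of Data at 24, 4-byte Image Base at 28
        (image_base,) = struct.unpack_from("<I", data, 28)
        count_offset = 92
    elif magic == 0x20B:  # PE32+: no Base of Data, 8-byte Image Base at 24
        (image_base,) = struct.unpack_from("<Q", data, 24)
        count_offset = 108
    else:
        raise ValueError(f"unsupported Magic 0x{magic:04X}")
    (count,) = struct.unpack_from("<I", data, count_offset)
    directories = {}
    for index, name in enumerate(DIRECTORY_NAMES[:count]):
        # Each entry is two DWORDs: the RVA, then the size in bytes.
        rva, size = struct.unpack_from("<II", data, count_offset + 4 + index * 8)
        directories[name] = (rva, size)
    return {"magic": magic, "entry_point": entry_point,
            "image_base": image_base, "directories": directories}

info = parse_optional_header(OPTIONAL_HEADER_START)
print(hex(info["image_base"]))  # 0x140000000
```

The branch on Magic is the important design point: the 8-byte fields of PE32+ shift everything after Image Base, so a parser cannot use one fixed layout for both variants.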

Section Table

A good way to think about this part of the file is to take notice of the word “table”: it is essentially a table with one entry for each section (we mentioned these sections in the PE Header part of this blog post).

Each entry within this table is 40 bytes long and holds valuable information on its section. Here are the fields of a single entry:

Name [8-bytes]: Name of the section, e.g. .text, stored as a string padded with NULL bytes.
Virtual Size [4-bytes/DWORD]: Size of the section once it is loaded into memory (runtime).
Virtual Address [4-bytes/DWORD]: This dictates where the section should be loaded into memory, relative to the image base, when the executable has been opened.
Size of Raw Data [4-bytes/DWORD]: The size of the section’s data on disk (not being run).
Pointer to Raw Data [4-bytes/DWORD]: The file offset at which you can find the raw data mentioned above.
Pointer to Relocations [4-bytes/DWORD]: If the section has relocations, this is the file offset of its relocation table; if not, it will be marked with 0.
Pointer to Line Numbers [4-bytes/DWORD]: Similar to the above, it points to the line-number table (this is used for debugging and typically won’t be seen in release versions of software).
Number of Relocations [2-bytes/WORD]: The number of entries in the relocation table.
Number of Line Numbers [2-bytes/WORD]: The number of entries in the line-number table.
Characteristics [4-bytes/DWORD]: A 4-byte bitfield with each bit representing a certain characteristic (not including alignments).
Contents of the Section Table.
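The fields above are what let a tool translate an address in memory back to a position in the file. Below is a sketch of both steps: unpacking the entries and converting a relative virtual address (RVA) into a file offset. The sample entry is hypothetical and invented purely for illustration (the hex dump in this post ends before the Section Table), and all of the names are my own.

```python
import struct
from typing import NamedTuple

SECTION_FORMAT = "<8sIIIIIIHHI"  # fields in the order of the table above
SECTION_SIZE = struct.calcsize(SECTION_FORMAT)  # 40 bytes per entry

class Section(NamedTuple):
    name: str
    virtual_size: int
    virtual_address: int
    size_of_raw_data: int
    pointer_to_raw_data: int
    pointer_to_relocations: int
    pointer_to_line_numbers: int
    number_of_relocations: int
    number_of_line_numbers: int
    characteristics: int

def parse_sections(data: bytes, offset: int, count: int) -> list[Section]:
    """Unpack `count` consecutive Section Table entries starting at `offset`."""
    sections = []
    for i in range(count):
        fields = struct.unpack_from(SECTION_FORMAT, data, offset + i * SECTION_SIZE)
        # The name is padded with NULL bytes up to 8 bytes.
        name = fields[0].rstrip(b"\x00").decode("utf-8", errors="replace")
        sections.append(Section(name, *fields[1:]))
    return sections

def rva_to_file_offset(rva: int, sections: list[Section]) -> int:
    """Translate an RVA into a file offset using the section that contains it."""
    for s in sections:
        if s.virtual_address <= rva < s.virtual_address + s.size_of_raw_data:
            return rva - s.virtual_address + s.pointer_to_raw_data
    raise ValueError(f"RVA 0x{rva:X} is not backed by any section's raw data")

# Hypothetical entry, invented for illustration (not taken from the hex dump).
entry = struct.pack(SECTION_FORMAT, b".text", 0xC4500, 0x1000, 0xC4600, 0x400,
                    0, 0, 0, 0, 0x60000020)
text = parse_sections(entry, 0, 1)[0]
print(text.name, hex(rva_to_file_offset(0x1010, [text])))  # .text 0x410
```

The same helper is what you would use to follow the RVAs stored in the Optional Header, such as the entry point or a data directory, to the matching bytes on disk.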

Portable Executable Format Conclusion

This topic can indeed become quite complex and fascinating. However, my goal in this blog post is to present it in a way that makes it approachable for individuals who may not have prior experience with reverse engineering or related fields. I hope I’ve successfully achieved this aim.

If you are interested in learning more about this subject, here are some good books and websites to learn from:
