Introduction to the Android Executable (DEX) Format
The Android operating system relies on the Dalvik Executable (DEX) format to run applications. Unlike traditional Java bytecode (.class files), DEX files are optimized for efficiency and minimal memory footprint on resource-constrained mobile devices. They contain bytecode for the Dalvik virtual machine (DVM) or, more recently, the Android Runtime (ART). Understanding the DEX file format is fundamental for anyone involved in Android security analysis, reverse engineering, malware investigation, or even optimizing application performance.
This hands-on guide will take you through the intricate structure of a DEX file. We’ll dissect its various sections, from the crucial header to the core bytecode instructions, providing practical examples and commands to explore these files yourself.
Setting Up Your Lab Environment
Before we dive into the dissection, let’s set up our workspace. You’ll need:
- Android SDK Build-Tools: Essential for d8 (or the older dx) to convert Java bytecode to DEX, and dexdump to inspect DEX files.
- Java Development Kit (JDK): For compiling Java source code.
- A Text Editor/IDE: (e.g., VS Code, IntelliJ IDEA) for writing simple Java code.
- A Hex Editor: (e.g., HxD, 010 Editor, Bless) for examining raw byte data.
Creating a Simple DEX File
Let’s start by creating a minimal Java application:
// SimpleApp.java
public class SimpleApp {
    public static void main(String[] args) {
        String message = "Hello, DEX World!";
        System.out.println(message);
    }
}
Compile it to Java bytecode:
javac SimpleApp.java
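Now, convert the .class file into a DEX file using d8 (part of the Android SDK Build-Tools). d8 writes a file named classes.dex into the directory passed to --output, so rename the result afterwards:
d8 --output . SimpleApp.class
mv classes.dex output.dex
You should now have an output.dex file ready for inspection.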
The DEX File Header: Your First Stop
Every DEX file begins with a header, acting as a table of contents, providing essential metadata and offsets to all other sections within the file. It’s the entry point for the DVM/ART to understand the file’s layout.
Using dexdump with the -f (file header summary) option, we can quickly view this information:
dexdump -f output.dex
The output will resemble:
Header:
magic : 64 65 78 0a 30 33 35 00 (dex.035.)
checksum : d0857321
signature : 63e00781... (truncated)
file_size : 00000a64
header_size : 00000070
endian_tag : 12345678
link_size : 00000000
link_off : 00000000 (00000000)
map_off : 00000188 (00000188)
string_ids_size : 00000011
string_ids_off : 00000070 (00000070)
type_ids_size : 00000008
type_ids_off : 000000b4 (000000b4)
proto_ids_size : 00000005
proto_ids_off : 000000d4 (000000d4)
field_ids_size : 00000001
field_ids_off : 00000104 (00000104)
method_ids_size : 00000005
method_ids_off : 0000010c (0000010c)
class_defs_size : 00000001
class_defs_off : 00000134 (00000134)
data_size : 00000808
data_off : 00000188 (00000188)
Key fields to note:
- magic: Identifies the file as a DEX file and indicates the format version (e.g., dex.035).
- checksum and signature: Used for integrity verification. checksum is an Adler-32 checksum of everything after the magic and checksum fields; signature is a SHA-1 hash of everything after the signature field.
- file_size: The total size of the DEX file in bytes.
- header_size: The size of the header itself (always 0x70, or 112 bytes).
- endian_tag: Indicates the endianness of the file (usually 0x12345678 for little-endian).
- map_off: Offset to the map_list structure, which describes the layout of the entire DEX file.
- string_ids_size, string_ids_off: Number of strings and the starting offset of the string_ids section.
- type_ids, proto_ids, field_ids, method_ids, and class_defs: Similarly, each has a _size and an _off field giving the count and offset of its section.
- data_size, data_off: The size and offset of the data section, which holds the actual bytecode and other variable-length structures.
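To make the header layout concrete, here is a minimal Java sketch that reads the same fields dexdump prints. It assumes a little-endian file with the standard 0x70-byte header; the class name DexHeaderSketch and its parse method are hypothetical helpers written for this guide, not part of the Android SDK.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of the Android SDK.
final class DexHeaderSketch {
    // 32-bit header fields in file order, starting at offset 0x20
    // (after the 8-byte magic, 4-byte checksum, and 20-byte signature).
    private static final String[] UINT_FIELDS = {
        "file_size", "header_size", "endian_tag", "link_size", "link_off", "map_off",
        "string_ids_size", "string_ids_off", "type_ids_size", "type_ids_off",
        "proto_ids_size", "proto_ids_off", "field_ids_size", "field_ids_off",
        "method_ids_size", "method_ids_off", "class_defs_size", "class_defs_off",
        "data_size", "data_off"
    };

    /** Returns the header's 32-bit fields as unsigned values, keyed by field name. */
    static Map<String, Long> parse(byte[] dex) {
        if (dex.length < 0x70) {
            throw new IllegalArgumentException("Too short to hold a 0x70-byte DEX header");
        }
        // Assumption: standard little-endian layout (endian_tag 0x12345678).
        ByteBuffer buf = ByteBuffer.wrap(dex).order(ByteOrder.LITTLE_ENDIAN);
        Map<String, Long> fields = new LinkedHashMap<>();
        fields.put("checksum", buf.getInt(0x08) & 0xFFFFFFFFL);
        int offset = 0x20;
        for (String name : UINT_FIELDS) {
            fields.put(name, buf.getInt(offset) & 0xFFFFFFFFL);
            offset += 4;
        }
        return fields;
    }

    public static void main(String[] args) throws IOException {
        String path = args.length > 0 ? args[0] : "output.dex";
        Map<String, Long> header = parse(Files.readAllBytes(Paths.get(path)));
        for (Map.Entry<String, Long> e : header.entrySet()) {
            System.out.printf("%-16s: %08x%n", e.getKey(), e.getValue());
        }
    }
}
```

Running it against output.dex should print values that line up with the dexdump listing, which is a quick way to confirm you are reading the offsets correctly.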
Open output.dex in a hex editor. You’ll see the magic bytes (64 65 78 0A 30 33 35 00) at the very beginning, confirming our file type and version.
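The same check can be scripted. The sketch below tests the magic bytes and recomputes the Adler-32 checksum, which per the DEX format covers everything after the checksum field (offset 12 to the end of the file). DexChecksumSketch and its method names are hypothetical, chosen only for this example.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.zip.Adler32;

// Hypothetical helper for illustration; not part of the Android SDK.
final class DexChecksumSketch {
    /** True if the file starts with "dex\n", three ASCII digits, and a NUL byte. */
    static boolean hasDexMagic(byte[] dex) {
        if (dex.length < 8) {
            return false;
        }
        if (dex[0] != 'd' || dex[1] != 'e' || dex[2] != 'x' || dex[3] != '\n') {
            return false;
        }
        for (int i = 4; i < 7; i++) {
            if (dex[i] < '0' || dex[i] > '9') {
                return false;
            }
        }
        return dex[7] == 0;
    }

    /** Reads the little-endian checksum stored at offset 8. */
    static long storedChecksum(byte[] dex) {
        return ByteBuffer.wrap(dex).order(ByteOrder.LITTLE_ENDIAN).getInt(8) & 0xFFFFFFFFL;
    }

    /** Recomputes Adler-32 over everything after the magic and checksum fields. */
    static long computedChecksum(byte[] dex) {
        Adler32 adler = new Adler32();
        adler.update(dex, 12, dex.length - 12);
        return adler.getValue();
    }

    public static void main(String[] args) throws IOException {
        String path = args.length > 0 ? args[0] : "output.dex";
        byte[] dex = Files.readAllBytes(Paths.get(path));
        System.out.println("magic ok:    " + hasDexMagic(dex));
        System.out.println("checksum ok: " + (storedChecksum(dex) == computedChecksum(dex)));
    }
}
```

A mismatch after you patch bytes in the hex editor is expected: any edit past offset 12 changes the computed value, so the stored checksum has to be updated as well.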
Navigating the ID Sections: Strings, Types, Fields, and Methods
These sections contain arrays of IDs that act as indices into other tables, effectively forming a giant lookup mechanism for various elements in your application.
string_ids and the String Data Pool
The string_ids section is an array of uint offsets. Each offset points to a string_data_item in the data section. A string_data_item starts with a ULEB128-encoded length (counted in UTF-16 code units, not bytes), followed by the string data in MUTF-8 (modified UTF-8), terminated by a null byte.
dexdump -d output.dex prints the disassembled output, including string references. Look for lines like:
#0 : 'Hello, DEX World!'
This indicates an entry in the string table.
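The lookup chain (header, then string_ids, then string_data_item) can be sketched in a few lines of Java. This is an illustration under two assumptions: the file is little-endian, and the strings can be decoded as ordinary UTF-8, which matches MUTF-8 for plain ASCII text like our message but not for embedded NULs or supplementary characters. DexStringsSketch and its methods are hypothetical names.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper for illustration; not part of the Android SDK.
final class DexStringsSketch {
    /** Reads one ULEB128 value at the buffer's position and advances past it. */
    static int readUleb128(ByteBuffer buf) {
        int result = 0;
        int shift = 0;
        int b;
        do {
            b = buf.get() & 0xFF;
            result |= (b & 0x7F) << shift; // low 7 bits carry data
            shift += 7;
        } while ((b & 0x80) != 0 && shift < 35); // high bit means "more bytes follow"
        return result;
    }

    /** Number of entries in string_ids (header field at offset 0x38). */
    static int stringCount(byte[] dex) {
        return ByteBuffer.wrap(dex).order(ByteOrder.LITTLE_ENDIAN).getInt(0x38);
    }

    /** Resolves string_ids[index] to its text. */
    static String readString(byte[] dex, int index) {
        ByteBuffer buf = ByteBuffer.wrap(dex).order(ByteOrder.LITTLE_ENDIAN);
        int stringIdsOff = buf.getInt(0x3C);                // header: string_ids_off
        int dataOff = buf.getInt(stringIdsOff + 4 * index); // string_id_item: string_data_off
        buf.position(dataOff);
        readUleb128(buf);                                   // utf16_size; skipped here
        int start = buf.position();
        int end = start;
        while (dex[end] != 0) {                             // data runs up to the null terminator
            end++;
        }
        // Assumption: plain UTF-8 decoding is close enough to MUTF-8 for this demo.
        return new String(dex, start, end - start, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        String path = args.length > 0 ? args[0] : "output.dex";
        byte[] dex = Files.readAllBytes(Paths.get(path));
        for (int i = 0; i < stringCount(dex); i++) {
            System.out.println("#" + i + " : '" + readString(dex, i) + "'");
        }
    }
}
```

Scanning for the null terminator instead of trusting the length prefix is a deliberate simplification: the prefix counts UTF-16 code units, so it does not directly give the number of bytes to read.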
type_ids: Describing Classes and Primitives
The type_ids section is an array of uint indices. Each index points into the string_ids array, and the referenced string is a type descriptor. For example, Ljava/lang/String; describes the String class, I denotes the primitive int, and [Ljava/lang/Object; describes an array of Objects.
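Descriptors become easier to read once translated back into Java-style names. The following sketch handles primitives, class types, and arrays; TypeDescriptorSketch and toJavaName are hypothetical names used only for this illustration.

```java
// Hypothetical helper for illustration; not part of the Android SDK.
final class TypeDescriptorSketch {
    /** Converts a type descriptor such as "[Ljava/lang/Object;" to "java.lang.Object[]". */
    static String toJavaName(String descriptor) {
        // Each leading '[' adds one array dimension.
        int dims = 0;
        while (dims < descriptor.length() && descriptor.charAt(dims) == '[') {
            dims++;
        }
        String element = descriptor.substring(dims);
        String base;
        switch (element) {
            case "V": base = "void"; break;
            case "Z": base = "boolean"; break;
            case "B": base = "byte"; break;
            case "S": base = "short"; break;
            case "C": base = "char"; break;
            case "I": base = "int"; break;
            case "J": base = "long"; break;
            case "F": base = "float"; break;
            case "D": base = "double"; break;
            default:
                // Class types look like L<package/path/Name>;
                if (element.length() > 2 && element.startsWith("L") && element.endsWith(";")) {
                    base = element.substring(1, element.length() - 1).replace('/', '.');
                } else {
                    throw new IllegalArgumentException("Not a type descriptor: " + descriptor);
                }
        }
        StringBuilder name = new StringBuilder(base);
        for (int i = 0; i < dims; i++) {
            name.append("[]");
        }
        return name.toString();
    }

    public static void main(String[] args) {
        String[] samples = {"Ljava/lang/String;", "I", "[Ljava/lang/Object;"};
        for (String d : samples) {
            System.out.println(d + " -> " + toJavaName(d));
        }
    }
}
```

In a real parser the descriptor string would come from following a type_ids entry into string_ids; here the descriptors are passed in directly to keep the example short.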
proto_ids: Method Prototypes
This section defines method prototypes. Each proto_id_item contains:
- shorty_idx: An index into string_ids representing the