DEX File Dissection Lab: A Hands-On Guide to Android Executable Format

Introduction to the Android Executable (DEX) Format

The Android operating system relies on the Dalvik Executable (DEX) format to run applications. Unlike traditional Java bytecode (.class files), DEX files are optimized for efficiency and minimal memory footprint on resource-constrained mobile devices. They contain bytecode for the Dalvik virtual machine (DVM) or, more recently, the Android Runtime (ART). Understanding the DEX file format is fundamental for anyone involved in Android security analysis, reverse engineering, malware investigation, or even optimizing application performance.

This hands-on guide will take you through the intricate structure of a DEX file. We’ll dissect its various sections, from the crucial header to the core bytecode instructions, providing practical examples and commands to explore these files yourself.

Setting Up Your Lab Environment

Before we dive into the dissection, let’s set up our workspace. You’ll need:

Android SDK Build-Tools: Essential for d8 (or older dx) to convert Java bytecode to DEX, and dexdump to inspect DEX files.
Java Development Kit (JDK): For compiling Java source code.
A Text Editor/IDE: (e.g., VS Code, IntelliJ IDEA) for writing simple Java code.
A Hex Editor: (e.g., HxD, 010 Editor, Bless) for examining raw byte data.

Creating a Simple DEX File

Let’s start by creating a minimal Java application:

// SimpleApp.java
public class SimpleApp {
    public static void main(String[] args) {
        String message = "Hello, DEX World!";
        System.out.println(message);
    }
}

Compile it to Java bytecode:

javac SimpleApp.java

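Now, convert the .class file into a DEX file using d8 (part of Android SDK build-tools). The --output option expects an existing directory (or a .zip/.jar archive) rather than a .dex file name, and d8 writes classes.dex into it, so rename the result afterwards:

d8 --output . SimpleApp.class
mv classes.dex output.dex

You should now have an output.dex file ready for inspection.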

The DEX File Header: Your First Stop

Every DEX file begins with a header that acts as a table of contents: it provides essential metadata and the offsets of all other sections within the file. It’s the entry point the DVM/ART uses to understand the file’s layout.

Using dexdump with the -f (file header) option, we can quickly view this information:

dexdump -f output.dex

The output will resemble the following (the exact layout and values depend on your dexdump and d8 versions):

Header:
  magic               : 64 65 78 0a 30 33 35 00  (dex.035.)
  checksum            : d0857321
  signature           : 63e00781... (truncated)
  file_size           : 00000a64
  header_size         : 00000070
  endian_tag          : 12345678
  link_size           : 00000000
  link_off            : 00000000 (00000000)
  map_off             : 00000188 (00000188)
  string_ids_size     : 00000011
  string_ids_off      : 00000070 (00000070)
  type_ids_size       : 00000008
  type_ids_off        : 000000b4 (000000b4)
  proto_ids_size      : 00000005
  proto_ids_off       : 000000d4 (000000d4)
  field_ids_size      : 00000001
  field_ids_off       : 00000104 (00000104)
  method_ids_size     : 00000005
  method_ids_off      : 0000010c (0000010c)
  class_defs_size     : 00000001
  class_defs_off      : 00000134 (00000134)
  data_size           : 00000808
  data_off            : 00000188 (00000188)

Key fields to note:

magic: Identifies the file as a DEX file and indicates the version (e.g., dex.035).
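checksum and signature: Used for integrity verification. checksum is an Adler-32 checksum of everything after the magic and checksum fields; signature is a SHA-1 hash of everything after the magic, checksum, and signature fields.
file_size: The total size of the DEX file in bytes.
header_size: The size of the header itself (0x70, or 112 bytes, for the dex.035 format shown here).
endian_tag: Indicates the endianness of the file (0x12345678 for the standard little-endian layout; 0x78563412 would mark a byte-swapped file).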
map_off: Offset to the `map_list` structure, which describes the layout of the entire DEX file.
string_ids_size, string_ids_off: Number of strings and their starting offset.
Similarly, fields for type_ids, proto_ids, field_ids, method_ids, and class_defs provide counts and offsets to their respective sections.
data_size, data_off: The size and offset of the data section, which holds the actual bytecode and other variable-length structures.
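
The fields listed above sit at fixed byte offsets inside the header, so they can also be read with a few lines of Java. The following is a minimal sketch, not an official parser: the class name DexHeaderSketch is hypothetical, it reads only a handful of fields, and it assumes a standard little-endian file.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper (not an SDK class): reads a few header fields.
final class DexHeaderSketch {
    final String version;      // three ASCII digits from the magic, e.g. "035"
    final long checksum;       // offset 0x08
    final long fileSize;       // offset 0x20
    final long headerSize;     // offset 0x24
    final long endianTag;      // offset 0x28
    final long mapOff;         // offset 0x34
    final long stringIdsSize;  // offset 0x38
    final long stringIdsOff;   // offset 0x3C

    DexHeaderSketch(byte[] dex) {
        if (dex.length < 0x70) {
            throw new IllegalArgumentException("too short for a DEX header");
        }
        // Magic is "dex\n", three version digits, then a zero byte.
        if (dex[0] != 'd' || dex[1] != 'e' || dex[2] != 'x' || dex[3] != '\n' || dex[7] != 0) {
            throw new IllegalArgumentException("missing DEX magic");
        }
        version = new String(dex, 4, 3, StandardCharsets.US_ASCII);
        // Assumption: little-endian layout (endian_tag 0x12345678).
        ByteBuffer buf = ByteBuffer.wrap(dex).order(ByteOrder.LITTLE_ENDIAN);
        checksum = u32(buf, 0x08);
        fileSize = u32(buf, 0x20);
        headerSize = u32(buf, 0x24);
        endianTag = u32(buf, 0x28);
        mapOff = u32(buf, 0x34);
        stringIdsSize = u32(buf, 0x38);
        stringIdsOff = u32(buf, 0x3C);
    }

    // Reads an unsigned 32-bit value into a long so it never appears negative.
    private static long u32(ByteBuffer buf, int offset) {
        return buf.getInt(offset) & 0xFFFFFFFFL;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.out.println("usage: DexHeaderSketch <file.dex>");
            return;
        }
        DexHeaderSketch h = new DexHeaderSketch(Files.readAllBytes(Paths.get(args[0])));
        System.out.printf("version=%s file_size=0x%08x header_size=0x%08x endian_tag=0x%08x%n",
                h.version, h.fileSize, h.headerSize, h.endianTag);
    }
}
```

Pointing it at output.dex should print values that agree with what dexdump reports for the same file. The remaining count and offset pairs follow the same pattern at consecutive 4-byte slots up to offset 0x6C.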

Open output.dex in a hex editor. You’ll see the magic bytes (64 65 78 0A 30 33 35 00) at the very beginning, confirming our file type and version.
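
The same raw bytes can be used to re-check the checksum and signature fields from the header. The sketch below assumes the usual DEX rules: the checksum is Adler-32 over everything from offset 12 onward, and the signature is SHA-1 over everything from offset 32 onward. The class and method names are hypothetical and exist only for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.zip.Adler32;

// Hypothetical helper: recomputes the two integrity fields of a DEX file.
final class DexIntegritySketch {

    // checksum lives at offset 8 and covers bytes 12..end.
    static boolean checksumMatches(byte[] dex) {
        requireHeaderPrefix(dex);
        long stored = ByteBuffer.wrap(dex).order(ByteOrder.LITTLE_ENDIAN).getInt(8) & 0xFFFFFFFFL;
        Adler32 adler = new Adler32();
        adler.update(dex, 12, dex.length - 12);
        return adler.getValue() == stored;
    }

    // signature lives at offsets 12..31 and covers bytes 32..end.
    static boolean signatureMatches(byte[] dex) {
        requireHeaderPrefix(dex);
        try {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            sha1.update(dex, 32, dex.length - 32);
            byte[] stored = Arrays.copyOfRange(dex, 12, 32);
            return Arrays.equals(sha1.digest(), stored);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 is not available", e);
        }
    }

    private static void requireHeaderPrefix(byte[] dex) {
        if (dex.length < 32) {
            throw new IllegalArgumentException("too short to hold checksum and signature");
        }
    }
}
```

Because the checksum covers the signature bytes, a tool that patches a DEX file would typically need to recompute the signature first and the checksum second.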

Navigating the ID Sections: Strings, Types, Fields, and Methods

These sections contain arrays of IDs that act as indices into other tables, effectively forming a giant lookup mechanism for various elements in your application.

`string_ids` and the String Data Pool

The string_ids section is an array of uint offsets. Each offset points to a string_data_item in the data section. A string_data_item starts with a ULEB128-encoded length (counted in UTF-16 code units, not bytes), followed by the string data in MUTF-8 (modified UTF-8), terminated by a null byte.
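
To make the string_data_item layout concrete, here is a rough Java sketch that decodes a ULEB128 value and then reads one string item from a byte array. The class and method names are hypothetical. It leans on DataInputStream.readUTF, which decodes Java’s modified UTF-8 but expects a 2-byte length prefix, so the sketch builds that prefix itself and only handles strings up to 65535 encoded bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

// Hypothetical helper: decodes ULEB128 values and string_data_item entries.
final class StringDataSketch {

    // Decodes one unsigned LEB128 value at pos[0] and advances pos[0] past it.
    static int readUleb128(byte[] data, int[] pos) {
        int result = 0;
        int shift = 0;
        int b;
        do {
            b = data[pos[0]++] & 0xFF;
            result |= (b & 0x7F) << shift;  // low 7 bits carry the payload
            shift += 7;
        } while ((b & 0x80) != 0 && shift < 35);  // high bit set means "more bytes"
        return result;
    }

    // Reads the string_data_item that starts at the given offset.
    static String readStringData(byte[] data, int offset) {
        int[] pos = {offset};
        int utf16Length = readUleb128(data, pos);
        int end = pos[0];
        while (data[end] != 0) {
            end++;  // encoded strings never contain a raw zero byte
        }
        int byteLength = end - pos[0];
        if (byteLength > 0xFFFF) {
            throw new IllegalArgumentException("string too long for this sketch");
        }
        // Frame the bytes the way readUTF expects: 2-byte big-endian length first.
        byte[] framed = new byte[byteLength + 2];
        framed[0] = (byte) (byteLength >>> 8);
        framed[1] = (byte) byteLength;
        System.arraycopy(data, pos[0], framed, 2, byteLength);
        try {
            String s = new DataInputStream(new ByteArrayInputStream(framed)).readUTF();
            if (s.length() != utf16Length) {
                throw new IllegalArgumentException("declared length does not match data");
            }
            return s;
        } catch (IOException e) {
            throw new IllegalArgumentException("malformed string data", e);
        }
    }

    public static void main(String[] args) {
        // 0x05 is the ULEB128 length, then "Hello", then the null terminator.
        byte[] item = {0x05, 'H', 'e', 'l', 'l', 'o', 0x00};
        System.out.println(readStringData(item, 0));  // prints Hello
    }
}
```

In a real file, the offset passed to readStringData would be one of the uint values stored in the string_ids array.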

dexdump -d output.dex will show the disassembled output, including string references. Look for lines like:

  #0              : 'Hello, DEX World!'

This indicates an entry in the string table.

`type_ids`: Describing Classes and Primitives

The type_ids section is an array of uint indices. Each index points into the string_ids array, and the string it selects must be a type descriptor. For example, Ljava/lang/String; describes the String class, I describes the primitive int type, and [Ljava/lang/Object; describes an array of Objects.
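
Type descriptors are compact but not always easy to read at a glance. As an illustration, the sketch below turns a descriptor into the familiar Java source spelling. The class and method names are hypothetical, and it handles only primitives, class types, and arrays of those.

```java
// Hypothetical helper: converts a DEX type descriptor to a readable Java name.
final class TypeDescriptorSketch {

    static String toJavaName(String descriptor) {
        // Each leading '[' adds one array dimension.
        int dims = 0;
        while (dims < descriptor.length() && descriptor.charAt(dims) == '[') {
            dims++;
        }
        String element = descriptor.substring(dims);
        String name;
        switch (element) {
            case "V": name = "void"; break;
            case "Z": name = "boolean"; break;
            case "B": name = "byte"; break;
            case "S": name = "short"; break;
            case "C": name = "char"; break;
            case "I": name = "int"; break;
            case "J": name = "long"; break;
            case "F": name = "float"; break;
            case "D": name = "double"; break;
            default:
                // Class types look like Lpackage/Name; with '/' as separator.
                if (element.length() > 2 && element.startsWith("L") && element.endsWith(";")) {
                    name = element.substring(1, element.length() - 1).replace('/', '.');
                } else {
                    throw new IllegalArgumentException("not a type descriptor: " + descriptor);
                }
        }
        StringBuilder sb = new StringBuilder(name);
        for (int i = 0; i < dims; i++) {
            sb.append("[]");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toJavaName("Ljava/lang/String;"));   // java.lang.String
        System.out.println(toJavaName("I"));                    // int
        System.out.println(toJavaName("[Ljava/lang/Object;"));  // java.lang.Object[]
    }
}
```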

`proto_ids`: Method Prototypes

This section defines method prototypes. Each proto_id_item contains:

shorty_idx: An index into string_ids representing the
