
Beyond AAPT: Programmatic Asset Recovery and ID Mapping from resources.arsc


Introduction: Unpacking Android’s Binary Resource Table

Android applications bundle a vast array of resources, from layout definitions and string literals to images and raw data. While the Android Asset Packaging Tool (AAPT) is the standard utility for compiling and inspecting these resources during development, its capabilities for deep, programmatic reverse engineering are limited. When engaging in advanced security analysis, malware research, or complex application reconstruction, a deeper understanding and direct parsing of the resources.arsc file become essential. This binary resource table is the heart of Android’s resource management system, mapping numerical resource IDs to actual values and paths. This article delves into the intricate structure of resources.arsc, demonstrating how to programmatically extract and map resources, far beyond what AAPT offers.

The Core Structure of resources.arsc

The resources.arsc file is fundamentally a sequence of nested binary chunks. Each chunk begins with a ResChunk_header giving its type, header size, and total chunk size. The headerSize tells you where the chunk’s body starts, and size lets you skip the whole chunk, including any children you don’t recognize. The primary chunks and structures you’ll encounter are:

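  • ResTable_header: The root chunk, defining the total number of packages.
  • ResStringPool_header: Heads a string pool chunk. The table’s global pool holds resource values such as string literals and file paths (res/...); each package carries two more pools for type names and entry names.
  • ResTable_package: Holds the resources of one package, identified by a package ID (0x7f for an app, 0x01 for the Android framework). App tables usually contain a single package, but a table can contain several.
  • ResTable_typeSpec: Declares one resource type (e.g., string, drawable) and its entry count. For each entry, it also stores a bitmask of the configuration dimensions in which that entry varies.
  • ResTable_type: Holds the actual entries for a specific resource type and configuration.
  • Res_value: Not a chunk but the 8-byte value record inside a type’s entries, containing the resource’s actual data or a reference to it.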
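Because every chunk starts with the same 8-byte ResChunk_header, one loop can walk any level of this tree. The following is a minimal sketch that assumes the whole file has been read into a bytes object. The names iter_chunks and Chunk are hypothetical; the type constants are the values defined in AOSP’s ResourceTypes.h.

```python
import struct
from typing import Iterator, NamedTuple

# Chunk type values from ResourceTypes.h
RES_STRING_POOL_TYPE = 0x0001
RES_TABLE_TYPE = 0x0002
RES_TABLE_PACKAGE_TYPE = 0x0200
RES_TABLE_TYPE_TYPE = 0x0201
RES_TABLE_TYPE_SPEC_TYPE = 0x0202

class Chunk(NamedTuple):
    offset: int       # absolute offset of the chunk in the buffer
    type: int         # ResChunk_header.type
    header_size: int  # ResChunk_header.headerSize
    size: int         # ResChunk_header.size (header + body + children)

def iter_chunks(data: bytes, start: int, end: int) -> Iterator[Chunk]:
    """Yield consecutive chunks in data[start:end], using each size to find the next."""
    offset = start
    while offset + 8 <= end:
        chunk_type, header_size, size = struct.unpack_from('<HHI', data, offset)
        if size < 8 or offset + size > end:
            raise ValueError(f"corrupt chunk at offset {offset:#x}")
        yield Chunk(offset, chunk_type, header_size, size)
        offset += size
```

To list the children of the root table, call `iter_chunks(data, root.offset + root.header_size, root.offset + root.size)` on the chunk yielded for offset 0. The same call on a package chunk yields its string pools, type specs, and types.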

Parsing Fundamentals: Reading Chunks

At a low level, parsing resources.arsc involves reading bytes and interpreting them according to the defined structures. Let’s outline the initial steps for reading the main table header and the global string pool.

```python
import struct  # Python's struct module for binary data handling

RES_STRING_POOL_TYPE = 0x0001
RES_TABLE_TYPE = 0x0002

def parse_resource_arsc(file_path):
    with open(file_path, 'rb') as f:
        # Read ResTable_header: ResChunk_header (type, headerSize, size) + packageCount
        chunk_type, header_size, chunk_size, package_count = struct.unpack('<HHII', f.read(12))
        print(f"ResTable Header: Type={hex(chunk_type)}, Header Size={header_size}, "
              f"Chunk Size={chunk_size}, Package Count={package_count}")
        if chunk_type != RES_TABLE_TYPE:
            raise ValueError("Invalid ResTable_header type")

        # The global string pool normally follows the table header; seek by headerSize, not a constant
        pool_start = header_size
        f.seek(pool_start)
        # Structure: type, headerSize, chunkSize, stringCount, styleCount, flags, stringsStart, stylesStart
        (string_pool_type, string_pool_header_size, string_pool_chunk_size,
         string_count, style_count, flags,
         strings_start_offset, styles_start_offset) = struct.unpack('<HHIIIIII', f.read(28))
        if string_pool_type != RES_STRING_POOL_TYPE:
            raise ValueError("Expected the global string pool after ResTable_header")
        # stringsStart is relative to the start of the string pool chunk, not the file
        print(f"String Pool Header: Type={hex(string_pool_type)}, String Count={string_count}, "
              f"Strings Start={pool_start + strings_start_offset}")
        # ... proceed to parse string pool data and packages
```
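The snippet stops before decoding the strings themselves. Below is a sketch of a parse_string_pool(f, pool_start) helper. Its signature is an assumption, chosen to match the parse_string_pool call in the package loop, and pool_start is taken to be the absolute file offset of a string pool chunk.

The helper handles both encodings selected by the UTF8_FLAG bit (0x100) in flags:

  • UTF-8 pools prefix each string with two lengths: first in UTF-16 units, then in bytes.
  • UTF-16 pools prefix each string with a single length.

In either encoding, a length whose high bit is set spills into a second unit. Style spans are ignored.

```python
import struct

UTF8_FLAG = 1 << 8  # ResStringPool_header.flags: strings are UTF-8 encoded

def _utf8_len(buf, pos):
    """UTF-8 pool length: 1 byte, or 2 bytes when the high bit is set."""
    n = buf[pos]
    if n & 0x80:
        return ((n & 0x7F) << 8) | buf[pos + 1], pos + 2
    return n, pos + 1

def _utf16_len(buf, pos):
    """UTF-16 pool length: one uint16, or two when the high bit is set."""
    (n,) = struct.unpack_from('<H', buf, pos)
    if n & 0x8000:
        (low,) = struct.unpack_from('<H', buf, pos + 2)
        return ((n & 0x7FFF) << 16) | low, pos + 4
    return n, pos + 2

def parse_string_pool(f, pool_start):
    """Return the strings of the pool chunk at absolute offset pool_start (styles ignored)."""
    f.seek(pool_start)
    (chunk_type, header_size, chunk_size, string_count, _style_count,
     flags, strings_start, _styles_start) = struct.unpack('<HHIIIIII', f.read(28))
    if chunk_type != 0x0001:  # RES_STRING_POOL_TYPE
        raise ValueError(f"not a string pool at offset {pool_start:#x}")
    f.seek(pool_start)
    chunk = f.read(chunk_size)
    # The offset array follows the header; each offset is relative to stringsStart
    offsets = struct.unpack_from(f'<{string_count}I', chunk, header_size)
    strings = []
    for off in offsets:
        pos = strings_start + off
        if flags & UTF8_FLAG:
            _, pos = _utf8_len(chunk, pos)       # length in UTF-16 units (unused here)
            nbytes, pos = _utf8_len(chunk, pos)  # length in bytes
            strings.append(chunk[pos:pos + nbytes].decode('utf-8', errors='replace'))
        else:
            nchars, pos = _utf16_len(chunk, pos)
            strings.append(chunk[pos:pos + 2 * nchars].decode('utf-16-le', errors='replace'))
    return strings
```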

Diving into Packages and Type Specifications

After the global string pool, the file contains one or more ResTable_package chunks. Each package represents a set of resources. Within a package, resources are organized by type (e.g., string, drawable, layout) and configuration (e.g., language, screen density).

A ResTable_package chunk contains:

  • Its own ResChunk_header.
  • A unique package ID.
  • The package name (128 UTF-16 code units, null-padded).
  • Offsets to its own string pools, measured from the start of the package chunk: typeStrings (resource type names such as string or drawable) and keyStrings (resource entry names such as app_name).

Following a ResTable_package header, you’ll find a sequence of ResTable_typeSpec and ResTable_type chunks.

  • ResTable_typeSpec (Type Specification): This chunk introduces a resource type (e.g., string, drawable) by ID and gives the number of entries for that type (entryCount). It is followed by an array of entryCount 32-bit flags, one per entry. The bits of each flag record the configuration dimensions (locale, density, orientation, and so on) across which that entry has alternative values, and the SPEC_PUBLIC bit (0x40000000) marks public resources. This tells you, before reading any values, which entries vary by configuration.

  • ResTable_type (Type Information): After a ResTable_typeSpec come one or more ResTable_type chunks, one for each configuration (e.g., en-US, hdpi) that defines values of that type. Each embeds a ResTable_config describing its configuration, followed by an array of 32-bit offsets, one per entry, measured from the chunk’s entriesStart field. Each offset points to a ResTable_entry, which for simple resources is followed by a Res_value. An offset of 0xFFFFFFFF (NO_ENTRY) means the entry has no value in this configuration.

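```python
import struct

RES_TABLE_TYPE_TYPE = 0x0201       # ResTable_type
RES_TABLE_TYPE_SPEC_TYPE = 0x0202  # ResTable_typeSpec

# Inside a ResTable_package parsing loop (conceptual): parse_string_pool,
# parse_type_spec and parse_type are assumed helpers.
def parse_package(f, package_start):
    f.seek(package_start)
    # ResChunk_header + id, then name[128] (UTF-16), then four uint32 string pool fields
    _, header_size, package_size, package_id = struct.unpack('<HHII', f.read(12))
    name = f.read(256).decode('utf-16-le', errors='replace').split('\x00', 1)[0]
    type_strings_offset, _, key_strings_offset, _ = struct.unpack('<IIII', f.read(16))
    # Both pool offsets are relative to the start of the package chunk
    type_strings = parse_string_pool(f, package_start + type_strings_offset)
    key_strings = parse_string_pool(f, package_start + key_strings_offset)

    offset = package_start + header_size
    package_end = package_start + package_size
    while offset < package_end:
        f.seek(offset)
        chunk_type, _, chunk_size = struct.unpack('<HHI', f.read(8))
        if chunk_type == RES_TABLE_TYPE_SPEC_TYPE:
            parse_type_spec(f, offset, type_strings, key_strings)
        elif chunk_type == RES_TABLE_TYPE_TYPE:
            parse_type(f, offset, key_strings)
        # String pools and unknown chunks fall through; always advance by the full chunk size
        offset += chunk_size
    return package_id, name
```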
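The loop above dispatches to parse_type_spec and parse_type without defining them. The sketch below fills in comparable logic under hypothetical names (read_type_spec, read_type_entries). It works on a bytes buffer plus a chunk offset rather than a file object. It assumes the dense entry layout only: it rejects ResTable_type chunks with a non-zero flags byte (the sparse and 16-bit-offset variants) and the compact entry encoding that newer aapt2 builds can emit.

```python
import struct

NO_ENTRY = 0xFFFFFFFF  # ResTable_type::NO_ENTRY
FLAG_COMPLEX = 0x0001  # ResTable_entry: a bag (style, array, ...) instead of a Res_value
FLAG_COMPACT = 0x0008  # ResTable_entry: compact encoding, not handled in this sketch

def read_type_spec(data, offset):
    """Return (type_id, per-entry flags) for the ResTable_typeSpec chunk at offset.

    Each flag is a bitmask of configuration dimensions the entry varies in
    (e.g. 0x0004 locale, 0x0100 density); 0x40000000 is SPEC_PUBLIC.
    """
    _, header_size, _, type_id, _, _, entry_count = struct.unpack_from('<HHIBBHI', data, offset)
    flags = struct.unpack_from(f'<{entry_count}I', data, offset + header_size)
    return type_id, list(flags)

def read_type_entries(data, offset):
    """Return (type_id, raw ResTable_config bytes, entries) for the ResTable_type at offset.

    entries maps entry index -> (key_index, data_type, data) for simple values and
    (key_index, None, None) for bags; entries with no value here are omitted.
    """
    (_, header_size, _, type_id, type_flags, _, entry_count,
     entries_start) = struct.unpack_from('<HHIBBHII', data, offset)
    if type_flags:
        raise NotImplementedError("sparse or 16-bit offset tables are not handled")
    config = data[offset + 20:offset + header_size]  # ResTable_config fills the rest of the header
    offsets = struct.unpack_from(f'<{entry_count}I', data, offset + header_size)
    entries = {}
    for index, entry_offset in enumerate(offsets):
        if entry_offset == NO_ENTRY:
            continue  # no value for this entry in this configuration
        pos = offset + entries_start + entry_offset
        entry_size, entry_flags, key_index = struct.unpack_from('<HHI', data, pos)
        if entry_flags & FLAG_COMPACT:
            raise NotImplementedError("compact entries are not handled")
        if entry_flags & FLAG_COMPLEX:
            entries[index] = (key_index, None, None)
        else:
            # Res_value follows the ResTable_entry header: size, res0, dataType, data
            _, _, data_type, value = struct.unpack_from('<HBBI', data, pos + entry_size)
            entries[index] = (key_index, data_type, value)
    return type_id, config, entries
```

Each entry index, combined with the package and type IDs, forms a resource ID, and key_index selects the entry’s name from the package’s keyStrings pool.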

Resource Entry Deep Dive: The Res_value Structure

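The Res_value structure is where the rubber meets the road. Each offset in a ResTable_type leads to a ResTable_entry header (size, flags, and a key index into keyStrings). For a simple entry, a Res_value follows that header and holds the resource’s actual data. Bag resources such as styles and arrays set FLAG_COMPLEX and are followed by a list of name/value maps instead. A Res_value includes: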

  • size (uint16): Size of the structure.
  • res0 (uint8): Always 0.
  • dataType (uint8): Indicates the type of data stored (e.g., string, integer, reference, dimension).
  • data (uint32): The actual value, or an index into a string pool, or a resource ID reference.

For example, if dataType is TYPE_STRING (0x03), data is an index into the table’s global string pool; the package’s type and key pools hold only type and entry names. If dataType is TYPE_REFERENCE (0x01), data is another resource ID (e.g., a style referencing a color resource).

```python
import struct

TYPE_REFERENCE = 0x01  # Res_value::TYPE_REFERENCE (data is a resource ID)
TYPE_STRING = 0x03     # Res_value::TYPE_STRING (data indexes the global string pool)

# Conceptual code snippet for parsing a Res_value structure.
# global_strings is assumed to be the decoded global string pool, as a list.
def parse_res_value(f, global_strings):
    value_size, res0, data_type, data = struct.unpack('<HBBI', f.read(8))
    if data_type == TYPE_STRING:
        return global_strings[data]
    elif data_type == TYPE_REFERENCE:
        return f"@0x{data:08x}"
    # ... handle other data_types (int, bool, color, etc.) ...
    else:
        return data  # raw data for other types
```
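The "handle other data_types" placeholder covers most of what real tables contain. The sketch below decodes the common types into readable text. It assumes the numeric type constants from AOSP’s Res_value definition, and format_res_value is a hypothetical name. Dimensions and fractions pack a signed 24-bit mantissa, a 2-bit radix, and a 4-bit unit into data; the decoding mirrors Android’s TypedValue.complexToFloat.

```python
import struct

TYPE_NULL, TYPE_REFERENCE, TYPE_ATTRIBUTE, TYPE_STRING = 0x00, 0x01, 0x02, 0x03
TYPE_FLOAT, TYPE_DIMENSION, TYPE_FRACTION = 0x04, 0x05, 0x06
TYPE_INT_DEC, TYPE_INT_HEX, TYPE_INT_BOOLEAN = 0x10, 0x11, 0x12
TYPE_FIRST_COLOR_INT, TYPE_LAST_COLOR_INT = 0x1C, 0x1F

DIMENSION_UNITS = ['px', 'dp', 'sp', 'pt', 'in', 'mm']  # unit values 0..5
FRACTION_UNITS = ['%', '%p']                             # unit values 0..1
RADIX_SCALE = [1.0, 2.0 ** -7, 2.0 ** -15, 2.0 ** -23]   # radix 23p0, 16p7, 8p15, 0p23

def _signed32(value):
    return value - (1 << 32) if value & 0x80000000 else value

def _complex_to_float(data):
    """Signed 24-bit mantissa (bits 8..31) scaled by the radix in bits 4..5."""
    return (_signed32(data) >> 8) * RADIX_SCALE[(data >> 4) & 0x3]

def format_res_value(data_type, data, global_strings=()):
    if data_type == TYPE_NULL:
        return None
    if data_type == TYPE_REFERENCE:
        return f"@0x{data:08x}"
    if data_type == TYPE_ATTRIBUTE:
        return f"?0x{data:08x}"
    if data_type == TYPE_STRING:
        return global_strings[data]
    if data_type == TYPE_FLOAT:
        return struct.unpack('<f', struct.pack('<I', data))[0]
    if data_type == TYPE_DIMENSION:
        unit = data & 0xF
        suffix = DIMENSION_UNITS[unit] if unit < len(DIMENSION_UNITS) else '?'
        return f"{_complex_to_float(data):g}{suffix}"
    if data_type == TYPE_FRACTION:
        unit = data & 0xF
        suffix = FRACTION_UNITS[unit] if unit < len(FRACTION_UNITS) else '?'
        return f"{_complex_to_float(data) * 100:g}{suffix}"
    if data_type == TYPE_INT_DEC:
        return _signed32(data)
    if data_type == TYPE_INT_HEX:
        return f"0x{data:08x}"
    if data_type == TYPE_INT_BOOLEAN:
        return data != 0
    if TYPE_FIRST_COLOR_INT <= data_type <= TYPE_LAST_COLOR_INT:
        # data holds a full 32-bit ARGB value; dataType records the source notation
        return f"#{data:08x}"
    return data  # unknown type: return the raw 32-bit value
```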

Programmatic Asset Recovery and ID Mapping

The ultimate goal of this deep parsing is to create a comprehensive map of resource IDs to their actual names and values. This mapping allows tools to reconstruct resource files, identify specific assets, or even inject custom values for dynamic analysis. For instance, to map a drawable ID like 0x7f08001a:

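  1. The first byte (0x7f) is the package ID; 0x7f is the app’s own package and 0x01 is the Android framework.
  2. The second byte (0x08) is the resource type ID. Type IDs are assigned when the APK is built, so 0x08 means drawable only if the package’s typeStrings pool says so: the name is at index typeId - 1.
  3. The last two bytes (0x001a) are the entry index within that type.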
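The three-part split is plain bit arithmetic. The helper names below are hypothetical. The type_name lookup assumes the common case where type IDs start at 1 and index into the package’s typeStrings pool; shared-library packages can shift this with the package header’s typeIdOffset field, which is ignored here.

```python
def split_resource_id(res_id):
    """Split a 0xPPTTEEEE resource ID into (package_id, type_id, entry_index)."""
    return (res_id >> 24) & 0xFF, (res_id >> 16) & 0xFF, res_id & 0xFFFF

def make_resource_id(package_id, type_id, entry_index):
    """Inverse of split_resource_id."""
    return (package_id << 24) | (type_id << 16) | entry_index

def type_name(type_strings, type_id):
    """Type IDs are 1-based: type ID n is named by typeStrings[n - 1]."""
    return type_strings[type_id - 1]
```

For the example ID, `split_resource_id(0x7f08001a)` gives `(0x7f, 0x08, 0x1a)`.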

By iterating through the `ResTable_package`, `ResTable_typeSpec`, and `ResTable_type` chunks, you can build a lookup table. Each `ResTable_entry` stores a key index into the package’s `keyStrings` pool, which provides the human-readable name for that entry ID. Once you have the entry’s name (e.g., icon_launcher) and know it’s a `drawable` type, read its `Res_value`. For file-based resources this is a string such as `res/drawable-hdpi/icon_launcher.png`, the file’s exact path inside the APK, with one value per configuration.
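Put together, the lookup table can be a dictionary keyed by resource ID. This sketch assumes the per-type data has already been reduced to (type_id, entries) pairs, one per ResTable_type chunk, where entries maps entry index to key index. Both build_id_map and that input shape are hypothetical.

```python
def build_id_map(package_id, type_strings, key_strings, type_chunks):
    """Map resource IDs to 'type/name' strings.

    type_chunks: iterable of (type_id, {entry_index: key_index}) pairs,
    one per ResTable_type chunk (i.e. one per configuration).
    """
    id_map = {}
    for type_id, entries in type_chunks:
        name_of_type = type_strings[type_id - 1]  # type IDs are 1-based
        for entry_index, key_index in entries.items():
            res_id = (package_id << 24) | (type_id << 16) | entry_index
            # An entry appears once per configuration; its name is the same in each
            id_map.setdefault(res_id, f"{name_of_type}/{key_strings[key_index]}")
    return id_map
```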

The Challenge of Asset Correlation

`resources.arsc` stores small values (strings, integers, colors, dimensions) inline, but not the data of file-based resources such as images, audio, or compiled XML layouts. Those live as separate files in the APK, and the entry’s `Res_value` is a string holding the file’s path. To recover the actual image, you must:

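  1. Parse `resources.arsc` to get the resource type (e.g., `drawable`), its name (e.g., `my_image`), its configuration (e.g., `hdpi`), and the path string stored in its value.
  2. Locate that path within the APK (e.g., `res/drawable-hdpi/my_image.png`). Prefer the stored path over one derived from the resource name, because shrinkers and obfuscators can rename files under `res/`.
  3. Extract that file.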

This process requires a full APK parsing solution, where `resources.arsc` acts as the blueprint for understanding and organizing the `res` directory’s contents. Raw files in the `assets` directory are different: they have no resource IDs and no entries in `resources.arsc`. Apps open them by path through `AssetManager`, so you enumerate them directly from the APK’s ZIP listing.
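An APK is a ZIP archive, so once you have a path string, both cases come down to reading ZIP members. This minimal sketch uses Python’s standard zipfile module; the function names are hypothetical.

```python
import zipfile

def read_resource_file(apk, res_path):
    """Return the bytes of a file-based resource, given the path stored in resources.arsc.

    apk may be a filesystem path or a binary file-like object.
    """
    with zipfile.ZipFile(apk) as archive:
        return archive.read(res_path)

def list_assets(apk):
    """Files under assets/ have no resource IDs, so list them from the ZIP directory."""
    with zipfile.ZipFile(apk) as archive:
        return [name for name in archive.namelist()
                if name.startswith('assets/') and not name.endswith('/')]
```

Note that XML resources under `res/` (layouts, drawables defined in XML) are stored as compiled binary XML, so the extracted bytes need a separate decoder before they are readable.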

Conclusion

Programmatic parsing of `resources.arsc` is a powerful technique for Android reverse engineers and security analysts. It gives scriptable, byte-level access to an application’s resources, enabling custom tools for asset extraction, ID mapping, and even resource manipulation. By understanding the binary chunk structure and the interplay between `ResTable_header`, `ResStringPool_header`, `ResTable_package`, `ResTable_typeSpec`, `ResTable_type`, and `Res_value`, you can build analyses that AAPT’s fixed text output does not support directly, paving the way for more sophisticated analysis and reconstruction efforts.
