Introduction: Unpacking Android’s Binary Resource Table
Android applications bundle a vast array of resources, from layout definitions and string literals to images and raw data. While the Android Asset Packaging Tool (AAPT) is the standard utility for compiling and inspecting these resources during development, its capabilities for deep, programmatic reverse engineering are limited. When engaging in advanced security analysis, malware research, or complex application reconstruction, a deeper understanding and direct parsing of the resources.arsc file become essential. This binary resource table is the heart of Android’s resource management system, mapping numerical resource IDs to actual values and paths. This article delves into the intricate structure of resources.arsc, demonstrating how to programmatically extract and map resources, far beyond what AAPT offers.
The Core Structure of resources.arsc
The resources.arsc file is fundamentally a sequence of nested binary chunks. Each chunk begins with a ResChunk_header, providing its type, header size, and total chunk size. Understanding these headers is crucial for navigating the file. The primary chunks you’ll encounter are:
- `ResTable_header`: The root chunk; it records the number of packages in the table.
- `ResStringPool_header`: Heads a string pool. The global pool that follows the table header holds string values (including file paths); each package carries separate pools for type names and entry names.
- `ResTable_package`: Represents a single resource package, identified by a package ID (typically `0x7f` for an app's own resources). A table can contain more than one package.
- `ResTable_typeSpec`: Declares a resource type (e.g., string, drawable) and records, for each entry, which configuration dimensions it varies across.
- `ResTable_type`: Holds the entries for one resource type under one specific configuration.
- `Res_value`: The final structure containing the resource’s actual data or a reference to it.
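Because every chunk starts with the same 8-byte header, a generic reader can walk the file without knowing each chunk's body. The following is a minimal sketch, assuming the whole file has already been read into a `bytes` object; `ChunkHeader`, `read_chunk_header`, and `iter_child_chunks` are illustrative names, not part of any Android tool.

```python
import struct
from collections import namedtuple

# Field names mirror ResChunk_header; the namedtuple itself is an illustrative helper.
ChunkHeader = namedtuple("ChunkHeader", "type header_size size")

def read_chunk_header(buf, offset=0):
    """Decode the 8-byte ResChunk_header found at `offset` in `buf`."""
    chunk_type, header_size, size = struct.unpack_from("<HHI", buf, offset)
    # Reject headers that cannot be valid so a walk never loops or overruns.
    if header_size < 8 or size < header_size or offset + size > len(buf):
        raise ValueError(f"corrupt chunk header at offset {offset}")
    return ChunkHeader(chunk_type, header_size, size)

def iter_child_chunks(buf, parent_offset=0):
    """Yield (offset, header) for each chunk nested directly inside a parent chunk."""
    parent = read_chunk_header(buf, parent_offset)
    pos = parent_offset + parent.header_size   # children start after the parent's header
    end = parent_offset + parent.size
    while pos < end:
        child = read_chunk_header(buf, pos)
        yield pos, child
        pos += child.size                      # size covers header and body
```

This walk applies to chunks whose body is itself a sequence of chunks, such as the table root and a package. The body of a `ResTable_type` chunk holds entry records rather than child chunks, so it needs its own decoding.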
Parsing Fundamentals: Reading Chunks
At a low level, parsing resources.arsc involves reading bytes and interpreting them according to the defined structures. Let’s outline the initial steps for reading the main table header and the global string pool.
```python
import struct  # Python's struct module for binary data handling

def parse_resource_arsc(file_path):
    with open(file_path, 'rb') as f:
        # Read ResTable_header: ResChunk_header (type, headerSize, size) + packageCount
        chunk_type, header_size, chunk_size, package_count = struct.unpack('<HHII', f.read(12))
        if chunk_type != 0x0002:  # RES_TABLE_TYPE
            raise ValueError("Invalid ResTable_header type")
        print(f"ResTable Header: Type={hex(chunk_type)}, Header Size={header_size}, "
              f"Chunk Size={chunk_size}, Package Count={package_count}")

        # The global string pool follows the table header; seek by headerSize
        # rather than assuming the header is exactly 12 bytes.
        f.seek(header_size)
        # Structure: type, headerSize, chunkSize, stringCount, styleCount, flags, stringsStart, stylesStart
        string_pool_header_data = struct.unpack('<HHIIIIII', f.read(28))
        string_pool_type = string_pool_header_data[0]
        string_pool_header_size = string_pool_header_data[1]
        string_pool_chunk_size = string_pool_header_data[2]
        string_count = string_pool_header_data[3]
        strings_start_offset = string_pool_header_data[6]
        if string_pool_type != 0x0001:  # RES_STRING_POOL_TYPE
            raise ValueError("Expected the global string pool after ResTable_header")
        print(f"String Pool Header: Type={hex(string_pool_type)}, String Count={string_count}, "
              f"Strings Start={strings_start_offset}")
        # ... proceed to parse string pool data and packages
```
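Once the pool header is known, the strings themselves can be decoded. The sketch below assumes the standard pool layout: an array of 32-bit string offsets directly after the header, string data beginning at `stringsStart` (measured from the start of the pool chunk), and a length prefix in front of each string. Style spans are ignored, undecodable bytes are replaced rather than raising, and `decode_string_pool` and its helpers are illustrative names.

```python
import struct

UTF8_FLAG = 0x100  # ResStringPool_header flag: strings are stored as UTF-8

def _read_len8(buf, pos):
    """UTF-8 pools: 1-byte length, or 2 bytes when the high bit is set."""
    n = buf[pos]
    pos += 1
    if n & 0x80:
        n = ((n & 0x7F) << 8) | buf[pos]
        pos += 1
    return n, pos

def _read_len16(buf, pos):
    """UTF-16 pools: 16-bit length, or 32 bits when the high bit is set."""
    (n,) = struct.unpack_from("<H", buf, pos)
    pos += 2
    if n & 0x8000:
        (low,) = struct.unpack_from("<H", buf, pos)
        pos += 2
        n = ((n & 0x7FFF) << 16) | low
    return n, pos

def decode_string_pool(buf, pool_start):
    """Return every string in the pool whose chunk begins at `pool_start`."""
    (_type, header_size, _size, string_count, _style_count,
     flags, strings_start, _styles_start) = struct.unpack_from("<HHIIIIII", buf, pool_start)
    offsets = struct.unpack_from(f"<{string_count}I", buf, pool_start + header_size)
    strings = []
    for off in offsets:
        pos = pool_start + strings_start + off
        if flags & UTF8_FLAG:
            _char_len, pos = _read_len8(buf, pos)   # length in UTF-16 units (unused here)
            byte_len, pos = _read_len8(buf, pos)    # length in encoded bytes
            strings.append(buf[pos:pos + byte_len].decode("utf-8", errors="replace"))
        else:
            char_len, pos = _read_len16(buf, pos)   # length in UTF-16 code units
            strings.append(buf[pos:pos + char_len * 2].decode("utf-16-le", errors="replace"))
    return strings
```

The same function can be pointed at the global pool or at a package's type and key pools, since all three share this layout.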
Diving into Packages and Type Specifications
After the global string pool, the file contains one or more ResTable_package chunks. Each package represents a set of resources. Within a package, resources are organized by type (e.g., string, drawable, layout) and configuration (e.g., language, screen density).
A `ResTable_package` chunk contains:
- Its own `ResChunk_header`.
- A unique package ID.
- The package name (a fixed-size UTF-16 string).
- Offsets to its own string pools: `typeStrings` (for resource type names) and `keyStrings` (for resource entry names).
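The fields listed above can be read with a few `struct` calls. This is a sketch under stated assumptions: the name field is taken to be 128 UTF-16 code units (256 bytes), followed by `typeStrings`, `lastPublicType`, `keyStrings`, and `lastPublicKey`, and both pool offsets are measured from the start of the package chunk. `parse_package_header` and the dictionary keys are illustrative names.

```python
import struct

RES_TABLE_PACKAGE_TYPE = 0x0200
NAME_BYTES = 256  # assumed: 128 UTF-16 code units, NUL-padded

def parse_package_header(buf, pkg_start):
    """Decode the fixed part of a ResTable_package chunk starting at `pkg_start`."""
    chunk_type, header_size, size, pkg_id = struct.unpack_from("<HHII", buf, pkg_start)
    if chunk_type != RES_TABLE_PACKAGE_TYPE:
        raise ValueError("not a ResTable_package chunk")
    name_start = pkg_start + 12
    raw_name = buf[name_start:name_start + NAME_BYTES]
    # Keep only the text before the first NUL terminator.
    name = raw_name.decode("utf-16-le", errors="replace").split("\x00", 1)[0]
    type_strings, _last_public_type, key_strings, _last_public_key = struct.unpack_from(
        "<IIII", buf, name_start + NAME_BYTES)
    return {
        "id": pkg_id,
        "name": name,
        "header_size": header_size,
        "size": size,
        "type_strings_offset": type_strings,  # relative to pkg_start
        "key_strings_offset": key_strings,    # relative to pkg_start
    }
```

Reading `header_size` from the chunk, rather than hard-coding it, keeps the sketch tolerant of headers that carry extra trailing fields.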
Following a `ResTable_package` header and the package's two string pools, you’ll find a sequence of `ResTable_typeSpec` and `ResTable_type` chunks.
- `ResTable_typeSpec` (Type Specification): This chunk declares a particular resource type (e.g., `string`, `drawable`) and specifies the number of resource entries for that type. It also contains an array of 32-bit integers, one per entry, whose bits flag the configuration dimensions (locale, density, and so on) across which that entry varies. This is critical for knowing which resources are defined differently across configurations.
- `ResTable_type` (Type Information): A `ResTable_typeSpec` is followed by one or more `ResTable_type` chunks. Each `ResTable_type` corresponds to a specific configuration (e.g., `en-US`, `hdpi`) for the resource type declared by the preceding `ResTable_typeSpec`. It contains an array of 32-bit offsets, one per entry ID, each pointing to a `ResTable_entry` record that is followed by the entry's `Res_value`. An offset of `0xFFFFFFFF` indicates that the resource is missing for that configuration.

```python
# Inside a ResTable_package parsing loop (conceptual).
# read_chunk_header, parse_string_pool, parse_type_spec and parse_type are
# placeholders for helpers you supply.
def parse_package(f, package_start, package_header_size, package_size,
                  type_strings_offset, key_strings_offset):
    # ... package header already read by the caller ...
    # Both pool offsets are relative to the start of the package chunk
    type_strings = parse_string_pool(f, package_start + type_strings_offset)
    key_strings = parse_string_pool(f, package_start + key_strings_offset)

    pos = package_start + package_header_size
    package_end = package_start + package_size
    while pos < package_end:
        f.seek(pos)
        chunk_header = read_chunk_header(f)  # reads the 8-byte ResChunk_header
        if chunk_header.chunkSize < 8:
            raise ValueError("corrupt chunk size")
        if chunk_header.type == 0x0202:  # RES_TABLE_TYPE_SPEC_TYPE
            parse_type_spec(f, chunk_header, type_strings, key_strings)
        elif chunk_header.type == 0x0201:  # RES_TABLE_TYPE_TYPE
            parse_type(f, chunk_header, key_strings)
        # Any other chunk (including the string pools read above) is skipped.
        # Advance by the full chunk size, measured from the chunk's first byte.
        pos += chunk_header.chunkSize
```
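The offset array inside a type chunk can be decoded on its own. In this sketch each offset is taken to be relative to the chunk's `entriesStart` field and to land on a `ResTable_entry` record that precedes the value. It assumes the plain layout with one 32-bit offset per entry ID; chunks whose `flags` byte is non-zero (alternative encodings) are rejected rather than guessed at. `read_type_entry_offsets` is an illustrative name.

```python
import struct

RES_TABLE_TYPE_TYPE = 0x0201
NO_ENTRY = 0xFFFFFFFF  # marks an entry that is absent in this configuration

def read_type_entry_offsets(buf, chunk_start):
    """Return (type_id, {entry_id: absolute position of its entry record})."""
    (chunk_type, header_size, _size, type_id, flags, _reserved,
     entry_count, entries_start) = struct.unpack_from("<HHIBBHII", buf, chunk_start)
    if chunk_type != RES_TABLE_TYPE_TYPE:
        raise ValueError("not a ResTable_type chunk")
    if flags != 0:
        # Assumption of this sketch: only the dense 32-bit offset layout is handled.
        raise NotImplementedError("non-default entry encoding")
    # The offset array sits right after the header (which includes the config block).
    offsets = struct.unpack_from(f"<{entry_count}I", buf, chunk_start + header_size)
    return type_id, {
        entry_id: chunk_start + entries_start + off
        for entry_id, off in enumerate(offsets)
        if off != NO_ENTRY
    }
```

The dictionary keys are entry IDs, so a missing key means the resource has no value under this chunk's configuration and a lookup should fall back to another configuration.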
Resource Entry Deep Dive: The Res_value Structure
The Res_value structure is where the rubber meets the road. This small structure holds the actual data for a resource. It includes:
- `size` (uint16): Size of the structure.
- `res0` (uint8): Always 0.
- `dataType` (uint8): Indicates the type of data stored (e.g., string, integer, reference, dimension).
- `data` (uint32): The actual value, or an index into a string pool, or a resource ID reference.
For example, if `dataType` is `TYPE_STRING` (`0x03`), `data` is an index into the global string pool; a package’s type and key string pools hold names, not values. If `dataType` is `TYPE_REFERENCE` (`0x01`), `data` is another resource ID (e.g., a style referencing a color resource).
```python
# Conceptual code snippet for parsing a Res_value structure.
# string_pool is any object exposing get_string(index) for the global pool.
import struct

def parse_res_value(f, string_pool):
    value_size, res0, data_type, data = struct.unpack('<HBBI', f.read(8))
    if data_type == 0x03:  # TYPE_STRING (index into the global string pool)
        return string_pool.get_string(data)
    elif data_type == 0x01:  # TYPE_REFERENCE (resource ID)
        return f"@0x{data:08x}"
    # ... handle other data types (int, bool, color, etc.) ...
    else:
        return data  # raw data for other types
```
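The remaining scalar types can be rendered in a similar way. The sketch below covers a handful of common `dataType` codes and falls back to a raw dump for everything else; the output formats are a choice made here for readability, and `format_scalar` is an illustrative name.

```python
import struct

def format_scalar(data_type, data):
    """Render a Res_value (dataType, data) pair as text for a few common types."""
    if data_type == 0x10:  # TYPE_INT_DEC: reinterpret the 32 bits as signed
        return str(data - (1 << 32) if data & 0x80000000 else data)
    if data_type == 0x11:  # TYPE_INT_HEX
        return f"0x{data:x}"
    if data_type == 0x12:  # TYPE_INT_BOOLEAN: any non-zero value is true
        return "true" if data != 0 else "false"
    if data_type == 0x1C:  # TYPE_INT_COLOR_ARGB8
        return f"#{data:08x}"
    if data_type == 0x04:  # TYPE_FLOAT: the 32 bits are an IEEE 754 single
        return repr(struct.unpack("<f", struct.pack("<I", data))[0])
    # Dimensions, fractions and other types are left undecoded in this sketch.
    return f"(type 0x{data_type:02x}) 0x{data:08x}"
```

For instance, `format_scalar(0x1C, 0xFF336699)` yields `#ff336699`, and a boolean stored as `0xFFFFFFFF` comes back as `true`.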
Programmatic Asset Recovery and ID Mapping
The ultimate goal of this deep parsing is to create a comprehensive map of resource IDs to their actual names and values. This mapping allows tools to reconstruct resource files, identify specific assets, or even inject custom values for dynamic analysis. For instance, to map a drawable ID like 0x7f08001a:
- The first byte (`0x7f`) indicates the package ID.
- The second byte (`0x08`) indicates the resource type ID (e.g., `drawable`).
- The last two bytes (`0x001a`) represent the entry ID within that type.
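The byte layout described above translates directly into shifts and masks. A minimal sketch, with `split_resource_id` and `make_resource_id` as illustrative helper names:

```python
def split_resource_id(res_id):
    """Split a 32-bit resource ID into (package_id, type_id, entry_id)."""
    package_id = (res_id >> 24) & 0xFF   # top byte
    type_id = (res_id >> 16) & 0xFF      # second byte
    entry_id = res_id & 0xFFFF           # low two bytes
    return package_id, type_id, entry_id

def make_resource_id(package_id, type_id, entry_id):
    """Inverse of split_resource_id."""
    return (package_id << 24) | (type_id << 16) | entry_id

pkg, typ, entry = split_resource_id(0x7F08001A)
print(f"package=0x{pkg:02x} type=0x{typ:02x} entry=0x{entry:04x}")
# prints: package=0x7f type=0x08 entry=0x001a
```

Type IDs start at 1, so the type's name is found at index `type_id - 1` in the package's type string pool.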
By iterating through the `ResTable_package`, `ResTable_typeSpec`, and `ResTable_type` chunks, you can build a lookup table. The `keyStrings` pool within each package provides the human-readable names corresponding to the entry IDs. Once you have the entry’s name (e.g., `icon_launcher`) and know it’s a `drawable` type, you can locate the actual asset file (e.g., `res/drawable-hdpi/icon_launcher.png`) within the APK structure. For file-based resources, the entry’s string value holds that path, so it is safer to read the path from the table than to reconstruct it from the name.
The Challenge of Asset Correlation
While `resources.arsc` provides the mapping, it does not contain the data of file-based assets such as images, audio, or compiled XML. For those, it stores the file’s path inside the APK as a string value. For drawables, `resources.arsc` maps the ID to a file name. To recover the actual image, you must:
- Parse `resources.arsc` to get the resource type (e.g., `drawable`), its name (e.g., `my_image`), and its configuration (e.g., `hdpi`).
- Locate the corresponding file within the APK’s `res` directory (e.g., `res/drawable-hdpi/my_image.png`).
- Extract that file.
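Since an APK is a ZIP archive, the last two steps can be sketched with Python's standard `zipfile` module. The function below assumes the path has already been recovered from `resources.arsc`; `extract_resource_file` is an illustrative name, and accepting in-memory bytes as well as a file path is a convenience chosen here for analysis pipelines.

```python
import io
import zipfile

def extract_resource_file(apk, res_path):
    """Return the bytes stored at `res_path` inside an APK, or None if absent.

    `apk` may be a filesystem path or the raw bytes of an APK already in memory.
    """
    source = io.BytesIO(apk) if isinstance(apk, (bytes, bytearray)) else apk
    with zipfile.ZipFile(source) as archive:
        try:
            return archive.read(res_path)
        except KeyError:
            # ZipFile.read raises KeyError when the member does not exist.
            return None
```

A call such as `extract_resource_file("app.apk", "res/drawable-hdpi/my_image.png")` would return the PNG bytes, where `app.apk` is a placeholder file name.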
This process requires a full APK parsing solution, where `resources.arsc` acts as the blueprint for understanding and organizing the `res` directory’s contents. Files in the `assets` directory are a separate case: they are not indexed in `resources.arsc` and are addressed by path at runtime, whereas files under `res/raw` are indexed like any other resource.
Conclusion
Programmatic parsing of `resources.arsc` is a powerful technique for Android reverse engineers and security analysts. It offers unparalleled depth into an application’s resource landscape, enabling custom tools for asset extraction, ID mapping, and even resource manipulation. By understanding the binary chunk structure and the interplay between `ResTable_header`, `ResStringPool_header`, `ResTable_package`, `ResTable_typeSpec`, `ResTable_type`, and `Res_value`, you can unlock a wealth of information inaccessible through conventional means, paving the way for more sophisticated analysis and reconstruction efforts.