
Build Your Own resources.arsc Parser: Python Tutorial for Custom Android Asset Extraction


Introduction to Android’s resources.arsc and Reverse Engineering

The resources.arsc file is a critical component within any Android Application Package (APK). It serves as a binary table mapping resource IDs to their actual values, such as strings, layouts, drawables, and more, across different configurations (languages, screen densities, etc.). While tools like apktool provide excellent capabilities for decompiling and recompiling APKs, understanding the underlying resources.arsc format and building a custom parser offers unparalleled insight into an app’s internal structure, enabling advanced reverse engineering, targeted asset extraction, or even vulnerability research.

This tutorial will guide you through the process of developing a Python-based parser for the resources.arsc file. We’ll explore its intricate binary format, demystify its chunk-based structure, and provide practical Python code examples to extract valuable information.

Understanding the resources.arsc Binary Format

At its core, resources.arsc is a binary file composed of a series of self-describing ‘chunks’. Each chunk begins with a ResChunk_header, which specifies its type, header size, and the total size of the chunk (header plus body). Because every chunk records its own size, a parser can skip chunk types it does not understand, and chunks can nest: a package chunk, for example, contains string pool, typeSpec, and type chunks.

Key Chunk Types

  • ResTable_header (0x0002): The very first chunk in the file. It defines the number of packages contained within the .arsc file.
  • ResStringPool_header (0x0001): Represents a pool of strings. A typical file holds one global string pool (string values, including file paths) plus two pools per package (one for resource type names and one for entry names/keys).
  • ResTable_package (0x0200): Encapsulates resources belonging to a specific Android package (e.g., com.example.app). Each package has its own string pools for resource type names and entry names.
  • ResTable_typeSpec (0x0202): Defines metadata for a specific resource type (e.g., ‘string’, ‘layout’, ‘drawable’). It contains one 32-bit flag word per entry, a bitmask of the configuration dimensions (locale, density, orientation, etc.) in which that entry has differing values.
  • ResTable_type (0x0201): Contains the actual resource entries for a specific type and configuration. This is where resource values (or references to them) are stored.
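  • ResTable_config: Describes the device configuration (language, screen density, orientation, etc.) that a particular ResTable_type chunk applies to. It is not a chunk of its own and has no type code: it is a struct embedded at the end of the ResTable_type header, and it begins with its own size field. (The value 0x0180 belongs to the resource map chunk of binary XML files, not to resources.arsc.)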
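The configuration block can be decoded on its own once its bytes have been read. The following is a minimal sketch, assuming the AOSP layout for the first 16 bytes of ResTable_config (size, mcc, mnc, language, country, orientation, touchscreen, density). The function name describe_config and the label format it returns are illustrative choices, not part of any Android tool.

```python
import struct

# Leading fields of ResTable_config, assuming the AOSP layout:
# size, mcc, mnc, language[2], country[2], orientation, touchscreen, density
_CONFIG_PREFIX = struct.Struct('<IHH2s2sBBH')


def describe_config(config_bytes):
    """Return a short label such as 'fr-rCA-480dpi', or 'default'."""
    (size, mcc, mnc, lang, country,
     orientation, touchscreen, density) = _CONFIG_PREFIX.unpack_from(config_bytes, 0)
    parts = []
    # Two-letter codes are stored as plain ASCII. Packed three-letter codes
    # (high bit set in the first byte) are not handled by this sketch.
    if lang != b'\x00\x00' and lang[0] < 0x80:
        parts.append(lang.decode('ascii'))
    if country != b'\x00\x00' and country[0] < 0x80:
        parts.append('r' + country.decode('ascii'))
    if density:
        parts.append(f'{density}dpi')
    return '-'.join(parts) or 'default'
```

A full parser would go on to decode the remaining fields (keyboard, screen size, SDK version, and so on) and would map density numbers to their usual names.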

The ResChunk_header Structure

Every chunk starts with this 8-byte structure:

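typedef struct {
    uint16_t type;        // Chunk type identifier (e.g., 0x0002 for the table header)
    uint16_t headerSize;  // Size of this chunk's header in bytes, including these 8 bytes
    uint32_t size;        // Total chunk size in bytes: header plus body
} ResChunk_header;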
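In Python, the same 8 bytes map onto the struct format '<HHI' (little-endian uint16, uint16, uint32). Here is a minimal sketch of a header reader with a few sanity checks; the names ChunkHeader and read_chunk_header are hypothetical helpers introduced for illustration.

```python
import struct
from collections import namedtuple

ChunkHeader = namedtuple('ChunkHeader', 'type header_size size')
_HEADER = struct.Struct('<HHI')  # little-endian: uint16, uint16, uint32


def read_chunk_header(data, offset=0):
    """Unpack a ResChunk_header from a bytes-like object at the given offset."""
    header = ChunkHeader(*_HEADER.unpack_from(data, offset))
    # Basic sanity checks so that a corrupt file fails early.
    if header.header_size < _HEADER.size or header.size < header.header_size:
        raise ValueError(f'Implausible chunk header at offset {offset}: {header}')
    if offset + header.size > len(data):
        raise ValueError(f'Chunk at offset {offset} runs past the end of the data')
    return header
```

Working on an in-memory bytes object (for example, the result of reading the whole file) keeps offset arithmetic simple, since resources.arsc files are usually small enough to load at once.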

Parsing Strategy with Python’s `struct` Module

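Python’s struct module is indispensable for binary parsing. It converts between raw bytes and Python values according to a format string. We’ll read the resources.arsc file chunk by chunk, using `struct.unpack` to interpret each header according to the C structures defined by Android. All multi-byte fields are little-endian, so every format string starts with `<`, which also disables alignment padding; the number of bytes read must then match the size of the format exactly.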
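The chunk-by-chunk strategy can be sketched as a small generator that walks sibling chunks using only the size field of each header. This is an illustrative helper (iter_chunks is a hypothetical name) that assumes the whole file has been read into memory as bytes.

```python
import struct


def iter_chunks(data, start, end):
    """Yield (offset, type, header_size, size) for sibling chunks in data[start:end]."""
    offset = start
    while offset + 8 <= end:
        chunk_type, header_size, size = struct.unpack_from('<HHI', data, offset)
        if size < 8 or offset + size > end:
            raise ValueError(f'Bad chunk size {size} at offset {offset}')
        yield offset, chunk_type, header_size, size
        # The size field covers header and body, so this lands on the next sibling.
        offset += size
```

For a resource table, the top-level children (the global string pool and the packages) start right after the table header, so a call such as iter_chunks(data, table_header_size, len(data)) lists them; the same function can then be reused inside a package chunk.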

Step 1: Reading the Global Header and String Pool

First, we open the resources.arsc file in binary read mode. We’ll define some constants for chunk types.

import struct


class ChunkType:
    RES_NULL_TYPE = 0x0000
    RES_STRING_POOL_TYPE = 0x0001
    RES_TABLE_TYPE = 0x0002
    RES_XML_TYPE = 0x0003
    RES_TABLE_PACKAGE_TYPE = 0x0200
    RES_TABLE_TYPE_TYPE = 0x0201
    RES_TABLE_TYPE_SPEC_TYPE = 0x0202
    # ResTable_config is a struct embedded in ResTable_type, not a chunk,
    # so it has no type code here.


UTF8_FLAG = 0x100  # Set in ResStringPool_header.flags when strings are UTF-8


def parse_arsc(filepath):
    with open(filepath, 'rb') as f:
        # ResTable_header: ResChunk_header (8 bytes) followed by packageCount (4 bytes)
        header_type, header_size, chunk_size = struct.unpack('<HHL', f.read(8))
        print(f"File Type: {header_type:#x}, Header Size: {header_size}, Total Size: {chunk_size}")
        if header_type != ChunkType.RES_TABLE_TYPE:
            raise ValueError("Not a valid resources.arsc file (expected ResTable_header)")
        package_count = struct.unpack('<L', f.read(4))[0]
        print(f"Package Count: {package_count}")

        # The global string pool follows the table header
        f.seek(header_size)
        pool_start = f.tell()
        pool_type, pool_header_size, pool_chunk_size = struct.unpack('<HHL', f.read(8))
        if pool_type != ChunkType.RES_STRING_POOL_TYPE:
            raise ValueError("Expected Global String Pool after ResTable_header")

        # Rest of ResStringPool_header: stringCount, styleCount, flags,
        # stringsStart, stylesStart (5 x uint32 = 20 bytes)
        string_count, style_count, flags, strings_start, styles_start = struct.unpack('<LLLLL', f.read(20))
        is_utf8 = bool(flags & UTF8_FLAG)
        print(f"Global String Pool: Strings={string_count}, Styles={style_count}, Flags={flags:#x}, "
              f"StringsStart={strings_start}, StylesStart={styles_start}")

        # The table of string offsets starts right after the pool header
        f.seek(pool_start + pool_header_size)
        string_offsets = struct.unpack(f'<{string_count}L', f.read(4 * string_count))

        string_pool_strings = []
        for offset in string_offsets:
            # stringsStart is relative to the start of the string pool chunk
            f.seek(pool_start + strings_start + offset)
            if is_utf8:
                # Two lengths precede the data: UTF-16 length, then byte length.
                # Each takes one byte, or two when the high bit of the first is set.
                for _ in range(2):
                    length = f.read(1)[0]
                    if length & 0x80:
                        length = ((length & 0x7F) << 8) | f.read(1)[0]
                s = f.read(length).decode('utf-8', errors='replace')
            else:
                # One length in UTF-16 code units: one uint16, or two when the
                # high bit of the first is set
                length = struct.unpack('<H', f.read(2))[0]
                if length & 0x8000:
                    low = struct.unpack('<H', f.read(2))[0]
                    length = ((length & 0x7FFF) << 16) | low
                s = f.read(length * 2).decode('utf-16-le', errors='replace')
            string_pool_strings.append(s)

        print("\n--- Global String Pool Contents ---")
        for i, s in enumerate(string_pool_strings):
            print(f"{i}: {s}")

        # The next chunk (the first package) starts where the pool chunk ends
        return pool_start + pool_chunk_size, package_count, string_pool_strings

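In this initial step, we’ve parsed the main ResTable_header and the global string pool. The global pool holds every string value referenced by the resource entries: the text of string resources as well as the in-APK paths of file-based resources such as layouts and drawables. Its flags field indicates whether the strings are stored as UTF-8 (bit 0x100 set, the common case in current APKs) or as UTF-16, and the two encodings use different length prefixes.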
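The two length-prefix schemes are easy to get wrong, so here is a standalone sketch of decoding a single pool entry from an in-memory buffer. The function name decode_pool_string is hypothetical. Android may encode characters outside the Basic Multilingual Plane in a modified UTF-8 form, which this sketch does not handle specially (undecodable bytes are replaced).

```python
import struct

UTF8_FLAG = 0x100  # bit in ResStringPool_header.flags marking a UTF-8 pool


def decode_pool_string(data, pos, utf8):
    """Decode one string-pool entry that starts at data[pos]."""
    if utf8:
        # UTF-8 entries carry two lengths: UTF-16 code units, then bytes.
        # Each is one byte, or two bytes when the high bit of the first is set.
        for _ in range(2):
            length = data[pos]
            pos += 1
            if length & 0x80:
                length = ((length & 0x7F) << 8) | data[pos]
                pos += 1
        # After the loop, length holds the second value: the byte length.
        return data[pos:pos + length].decode('utf-8', errors='replace')
    # UTF-16 entries carry one length in code units: one uint16, or two
    # (forming a 31-bit value) when the high bit of the first is set.
    (length,) = struct.unpack_from('<H', data, pos)
    pos += 2
    if length & 0x8000:
        (low,) = struct.unpack_from('<H', data, pos)
        pos += 2
        length = ((length & 0x7FFF) << 16) | low
    return data[pos:pos + length * 2].decode('utf-16-le', errors='replace')
```

The caller decides which branch applies by testing the pool's flags field against UTF8_FLAG once per pool, not once per string.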

Step 2: Iterating Through Packages and Resource Types

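After the global string pool, the file contains one or more ResTable_package chunks. Each package has its own string pools for type names and key names, which are essential for mapping resource IDs to human-readable names. The package header stores the location of both pools as offsets (typeStrings and keyStrings) relative to the start of the package chunk, so the parser should seek to them explicitly instead of assuming they follow the header directly.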
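A resource ID packs the package ID, type ID, and entry index into one 32-bit value of the form 0xPPTTEEEE, which is what ties these pools to the IDs seen in decompiled code. A small sketch of splitting and rebuilding an ID follows; both function names are illustrative.

```python
def split_resource_id(res_id):
    """Split a 32-bit resource ID of the form 0xPPTTEEEE into its three fields."""
    package_id = (res_id >> 24) & 0xFF
    type_id = (res_id >> 16) & 0xFF   # 1-based index into the type string pool
    entry_index = res_id & 0xFFFF     # index into that type's entry offset table
    return package_id, type_id, entry_index


def make_resource_id(package_id, type_id, entry_index):
    """Combine the three fields back into a 32-bit resource ID."""
    return (package_id << 24) | (type_id << 16) | entry_index
```

Application resources normally use package ID 0x7f and Android framework resources use 0x01, so the top byte alone often tells you where to look for an entry.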

def parse_string_pool(f, base_offset):
    """Parse the string pool chunk that starts at base_offset.

    Returns (strings, chunk_size) and leaves the file positioned after the chunk.
    """
    f.seek(base_offset)
    pool_type, pool_header_size, pool_chunk_size = struct.unpack('<HHL', f.read(8))
    if pool_type != ChunkType.RES_STRING_POOL_TYPE:
        f.seek(base_offset + pool_chunk_size)  # Skip the unexpected chunk
        return [], pool_chunk_size

    # stringCount, styleCount, flags, stringsStart, stylesStart (5 x uint32 = 20 bytes)
    string_count, style_count, flags, strings_start, styles_start = struct.unpack('<LLLLL', f.read(20))
    is_utf8 = bool(flags & 0x100)

    # String offsets follow the header; the style offsets after them are not needed here
    f.seek(base_offset + pool_header_size)
    string_offsets = struct.unpack(f'<{string_count}L', f.read(4 * string_count))

    string_pool_strings = []
    for offset in string_offsets:
        f.seek(base_offset + strings_start + offset)
        if is_utf8:
            # UTF-16 length, then byte length; each is 1 or 2 bytes
            for _ in range(2):
                length = f.read(1)[0]
                if length & 0x80:
                    length = ((length & 0x7F) << 8) | f.read(1)[0]
            s = f.read(length).decode('utf-8', errors='replace')
        else:
            length = struct.unpack('<H', f.read(2))[0]
            if length & 0x8000:
                low = struct.unpack('<H', f.read(2))[0]
                length = ((length & 0x7FFF) << 16) | low
            s = f.read(length * 2).decode('utf-16-le', errors='replace')
        string_pool_strings.append(s)

    f.seek(base_offset + pool_chunk_size)  # Move past the entire string pool chunk
    return string_pool_strings, pool_chunk_size


def parse_packages(f, start_offset, package_count, global_strings):
    """Parse the package chunks that follow the global string pool.

    start_offset, package_count and global_strings are the three values
    returned by parse_arsc.
    """
    current_offset = start_offset
    for i in range(package_count):
        package_start = current_offset
        f.seek(package_start)
        package_type, package_header_size, package_chunk_size = struct.unpack('<HHL', f.read(8))
        if package_type != ChunkType.RES_TABLE_PACKAGE_TYPE:
            raise ValueError(f"Expected ResTable_package, got {package_type:#x}")
        package_end = package_start + package_chunk_size

        # After the chunk header: id, name (128 UTF-16 units = 256 bytes),
        # typeStrings, lastPublicType, keyStrings, lastPublicKey
        (package_id, package_name_bytes, type_strings_offset, _last_public_type,
         key_strings_offset, _last_public_key) = struct.unpack('<L256sLLLL', f.read(276))
        package_name = package_name_bytes.decode('utf-16-le').split('\x00')[0]
        print(f"\n--- Package {i+1}: {package_name} (ID: {package_id:#x}) ---")

        # Both pool offsets are relative to the start of the package chunk
        type_strings, _ = parse_string_pool(f, package_start + type_strings_offset)
        print(f"Type Strings ({len(type_strings)}): {type_strings[:5]}...")
        key_strings, _ = parse_string_pool(f, package_start + key_strings_offset)
        print(f"Key Strings ({len(key_strings)}): {key_strings[:5]}...")

        # Walk every child chunk of the package, from the end of its header to its end
        f.seek(package_start + package_header_size)
        while f.tell() + 8 <= package_end:
            chunk_pos = f.tell()
            chunk_type, chunk_header_size, chunk_size = struct.unpack('<HHL', f.read(8))
            if chunk_size < 8:
                raise ValueError(f"Corrupt chunk at {hex(chunk_pos)}")

            if chunk_type == ChunkType.RES_TABLE_TYPE_SPEC_TYPE:
                # id (uint8), res0 (uint8), res1 (uint16), entryCount (uint32)
                type_spec_id, _res0, _res1, entry_count = struct.unpack('<BBHL', f.read(8))
                print(f"\n  Type Spec (ID: {type_spec_id}, Count: {entry_count})")
                # entry_count 4-byte configuration masks follow; skipped for now

            elif chunk_type == ChunkType.RES_TABLE_TYPE_TYPE:
                # id (uint8), flags (uint8), reserved (uint16), entryCount, entriesStart
                type_id, type_flags, _reserved, entry_count, entries_start = struct.unpack('<BBHLL', f.read(12))
                print(f"  Type (ID: {type_id}, Count: {entry_count})")
                # ResTable_config starts with its own size; a full parser would
                # unpack config_data further (locale, density, etc.)
                config_size = struct.unpack('<L', f.read(4))[0]
                config_data = f.read(config_size - 4)

                if type_flags & 0x03:
                    # FLAG_SPARSE (0x01) and FLAG_OFFSET16 (0x02) use other offset encodings
                    print("    (sparse or 16-bit offset table; skipped)")
                else:
                    # Entry offsets start right after the header (which includes the config)
                    f.seek(chunk_pos + chunk_header_size)
                    entry_offsets = struct.unpack(f'<{entry_count}L', f.read(4 * entry_count))
                    type_name = type_strings[type_id - 1]  # Type IDs are 1-based

                    for entry_idx, offset in enumerate(entry_offsets):
                        if offset == 0xFFFFFFFF:  # Entry doesn't exist for this config
                            continue
                        # Offsets are relative to entriesStart, itself relative to the chunk start
                        f.seek(chunk_pos + entries_start + offset)
                        # ResTable_entry: size (uint16), flags (uint16), key index (uint32)
                        entry_size, entry_flags, key_string_idx = struct.unpack('<HHL', f.read(8))
                        if entry_flags & 0x0008:  # FLAG_COMPACT uses a different layout
                            continue
                        entry_name = key_strings[key_string_idx]
                        if entry_flags & 0x0001:  # FLAG_COMPLEX: styles, arrays, plurals
                            print(f"    [{type_name}/{entry_name}] = (complex entry, skipped)")
                            continue

                        # Res_value: size (uint16), res0 (uint8), dataType (uint8), data (uint32)
                        value_size, value_res0, value_data_type, value_data = struct.unpack('<HBBL', f.read(8))
                        if value_data_type == 0x03:  # String (index into the global string pool)
                            value_str = global_strings[value_data]
                        elif value_data_type == 0x01:  # Reference to another resource ID
                            value_str = f"Ref: {value_data:#010x}"
                        elif value_data_type == 0x02:  # Theme attribute reference
                            value_str = f"Attr Ref: {value_data:#010x}"
                        elif value_data_type == 0x10:  # Decimal integer (stored as two's complement)
                            value_str = str(value_data - (1 << 32) if value_data & 0x80000000 else value_data)
                        elif value_data_type == 0x12:  # Boolean: zero is false, anything else is true
                            value_str = "true" if value_data != 0 else "false"
                        elif value_data_type == 0x1C:  # Color (#AARRGGBB)
                            value_str = f"Color: {value_data:#010x}"
                        else:
                            value_str = f"Raw Data: {value_data:#x} (Type: {value_data_type:#x})"
                        print(f"    [{type_name}/{entry_name}] = {value_str}")

            elif chunk_type != ChunkType.RES_STRING_POOL_TYPE:
                # The two string pools were already parsed through their header offsets
                print(f"  Unknown chunk type: {chunk_type:#x} at {hex(chunk_pos)}")

            f.seek(chunk_pos + chunk_size)  # Move to the next child chunk

        current_offset = package_end


# Example usage:
path = 'path/to/your/resources.arsc'
packages_offset, package_count, global_strings = parse_arsc(path)
with open(path, 'rb') as f:
    parse_packages(f, packages_offset, package_count, global_strings)

Step 3: Extracting Resources and Assets

The code above demonstrates how to read resource entries and their associated values (strings, integers, etc.). For file-based resources such as drawables, layouts, or raw files, resources.arsc does not contain the file itself. Instead, the entry’s value is a string (data type 0x03) in the global string pool that holds the file’s path inside the APK, for example res/drawable/my_icon.png. Files under assets/ are not indexed by resources.arsc at all and are read directly from the archive.
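Once a path has been read from the table, pulling the file out of the APK needs only the standard zipfile module. The following is a minimal sketch; extract_resource_file is a hypothetical helper, and the path passed to it is assumed to be exactly the string found in the global string pool.

```python
import zipfile


def extract_resource_file(apk_path, internal_path):
    """Return the raw bytes of one file stored inside the APK (a ZIP archive)."""
    with zipfile.ZipFile(apk_path) as apk:
        # Raises KeyError if the archive has no member with this name.
        return apk.read(internal_path)
```

Note that XML resources such as layouts are stored in Android's compiled binary XML format, so the extracted bytes need a separate decoder before they are readable, whereas PNG or JPEG drawables can be written to disk as they are.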

When a resource value refers to another resource (e.g., to a drawable), its value_data_type is 0x01 (reference), and value_data contains the resource ID of the referenced item. Data type 0x02 (attribute) marks a theme attribute reference such as ?attr/colorPrimary, which can only be resolved against a theme. To fully extract a value, you need to follow references recursively until you reach a non-reference value. For files stored in the APK (e.g., a PNG in res/drawable/), that final value is the path string, which you can then use to read the file from the APK, which is essentially a ZIP archive.
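Reference resolution can be sketched as a loop over a lookup table with a guard against cycles. The table argument here is a hypothetical dict mapping resource ID to a (data_type, data) pair, which a parser could fill while walking the default-configuration entries. The constant 0x01 is the reference data type as defined in AOSP's ResourceTypes.h.

```python
TYPE_REFERENCE = 0x01  # Res_value.dataType for a reference to another resource


def resolve_reference(table, res_id, max_depth=20):
    """Follow reference values until a non-reference value is reached.

    `table` is a hypothetical dict: resource ID -> (data_type, data).
    Returns the final (data_type, data) pair, or None if an ID is unknown.
    """
    seen = set()
    for _ in range(max_depth):
        if res_id in seen:
            raise ValueError(f'Reference cycle at {res_id:#010x}')
        seen.add(res_id)
        if res_id not in table:
            return None  # unresolved, e.g. a framework resource not in this file
        data_type, data = table[res_id]
        if data_type != TYPE_REFERENCE:
            return data_type, data
        res_id = data  # the data field of a reference is the target resource ID
    raise ValueError('Reference chain too deep')
```

Returning None for unknown IDs keeps the function usable on a single APK, where references into the Android framework (package ID 0x01) cannot be resolved without also parsing the framework's own resource table.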

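For example, a resource that refers to @drawable/my_icon does not store that text in the compiled table: the build tools replace it with a reference value holding the numeric resource ID, such as 0x7f080001 (package ID 0x7f, a type ID that indexes the type string pool, and entry index 0x0001). To go the other way, from name to file, a more advanced parser would find my_icon among the drawable entries via the key string pool, combine the package ID, type ID, and entry index into the resource ID, and then read that entry’s path string to locate the actual my_icon.png or my_icon.xml file within the APK’s res/ directory.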
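The name-to-ID direction can be sketched as an index built while walking the type chunks. Everything here is illustrative: build_name_index is a hypothetical helper, and entries is assumed to be a list of (type_id, entry_index, key_name) tuples that your own parser collects.

```python
def build_name_index(package_id, type_strings, entries):
    """Map 'type/name' strings to resource IDs.

    `entries` is a hypothetical iterable of (type_id, entry_index, key_name)
    tuples gathered while walking ResTable_type chunks.
    """
    index = {}
    for type_id, entry_index, key_name in entries:
        type_name = type_strings[type_id - 1]  # type IDs are 1-based
        res_id = (package_id << 24) | (type_id << 16) | entry_index
        index[f'{type_name}/{key_name}'] = res_id
    return index
```

With such an index, a lookup like index['drawable/my_icon'] yields the ID whose string value in the default configuration is the file path inside the APK.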

Conclusion

Building a custom resources.arsc parser, even a simplified one, gives you a concrete understanding of how Android applications manage their resources. This tutorial has covered the chunk structure, the string pools, and the package, typeSpec, and type chunks, along with Python code to interpret them. From here, you can extend your parser to support more data types and complex entries (styles, arrays, plurals), decode ResTable_config fully, resolve references, integrate with APK parsing to extract actual files, or experiment with modifying the .arsc file for advanced reverse engineering or research purposes.
