Introduction to Android’s resources.arsc and Reverse Engineering
The resources.arsc file is a critical component within any Android Application Package (APK). It serves as a binary table mapping resource IDs to their actual values, such as strings, layouts, drawables, and more, across different configurations (languages, screen densities, etc.). While tools like apktool provide excellent capabilities for decompiling and recompiling APKs, understanding the underlying resources.arsc format and building a custom parser offers unparalleled insight into an app’s internal structure, enabling advanced reverse engineering, targeted asset extraction, or even vulnerability research.
This tutorial will guide you through the process of developing a Python-based parser for the resources.arsc file. We’ll explore its intricate binary format, demystify its chunk-based structure, and provide practical Python code examples to extract valuable information.
Understanding the resources.arsc Binary Format
At its core, resources.arsc is a binary file composed of a series of self-describing ‘chunks’. Each chunk begins with a ResChunk_header, which specifies its type, header size, and the total size of the chunk. This hierarchical structure allows for extensibility and efficient parsing.
Key Chunk Types
ResTable_header (0x0002): The very first chunk in the file. Its header records the number of packages contained within the .arsc file, and its body holds every other chunk.
ResStringPool_header (0x0001): Represents a pool of strings. There is one global string pool (for string resource values, including file paths) and, for each package, two package-specific pools (for resource type names and for entry names/keys).
ResTable_package (0x0200): Encapsulates resources belonging to a specific Android package (e.g., com.example.app). Each package has its own string pools for resource type names and entry names.
ResTable_typeSpec (0x0202): Defines metadata for a specific resource type (e.g., 'string', 'layout', 'drawable'). It contains one 32-bit flag word per entry, a bitmask of the configuration dimensions across which that entry varies.
ResTable_type (0x0201): Contains the actual resource entries for a specific type and configuration. This is where resource values (or references to them) are stored.
ResTable_config: Describes the device configuration (language, screen density, orientation, etc.) that a particular ResTable_type chunk applies to. It is not a chunk of its own but a structure embedded in the ResTable_type header; the value 0x0180 belongs to RES_XML_RESOURCE_MAP_TYPE, which appears in binary XML files rather than in resources.arsc.
The ResChunk_header Structure
Every chunk starts with this 8-byte structure:
    typedef struct {
        uint16_t type;
        uint16_t headerSize;
        uint32_t chunkSize;
    } ResChunk_header;
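As a minimal sketch of how this header drives parsing, the snippet below unpacks a ResChunk_header from a bytes buffer and walks sibling chunks by adding each chunk's total size to the current offset. The names ChunkHeader, read_chunk_header and iter_chunks are illustrative helpers, not part of any Android tooling, and the demo bytes are synthetic.

```python
import struct
from collections import namedtuple

ChunkHeader = namedtuple("ChunkHeader", "type header_size size offset")
_HEADER = struct.Struct("<HHI")  # type, headerSize, chunkSize: little-endian, 8 bytes

def read_chunk_header(data, offset):
    """Unpack the ResChunk_header found at `offset` in a bytes-like object."""
    chunk_type, header_size, size = _HEADER.unpack_from(data, offset)
    # Basic sanity checks so a corrupt size cannot cause an endless loop
    if header_size < _HEADER.size or size < header_size or offset + size > len(data):
        raise ValueError(f"corrupt chunk header at {offset:#x}")
    return ChunkHeader(chunk_type, header_size, size, offset)

def iter_chunks(data, start, end):
    """Yield the sibling chunks stored back to back in data[start:end]."""
    offset = start
    while offset < end:
        header = read_chunk_header(data, offset)
        yield header
        offset += header.size

# Two synthetic sibling chunks: an 8-byte chunk and a 12-byte chunk
demo = struct.pack("<HHI", 0x0001, 8, 8) + struct.pack("<HHI4x", 0x0202, 8, 12)
for chunk in iter_chunks(demo, 0, len(demo)):
    print(f"type {chunk.type:#06x} at offset {chunk.offset}, {chunk.size} bytes")
```

Because every chunk carries its own size, a parser can skip chunk types it does not understand without losing its place.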
Parsing Strategy with Python’s `struct` Module
Python’s struct module is indispensable for binary parsing. It allows us to pack and unpack binary data into Python data types. We’ll read the resources.arsc file byte by byte, using `struct.unpack` to interpret the raw bytes according to the defined C structures.
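A small sketch of the format strings involved may help here. The `<` prefix selects little-endian byte order with standard sizes and no alignment padding, which matches how the structures are laid out in the file. The sample bytes below are made up for illustration.

```python
import struct

# "<" means little-endian, standard sizes, no padding between fields
assert struct.calcsize("<HHI") == 8    # uint16, uint16, uint32: one ResChunk_header
assert struct.calcsize("<BBHI") == 8   # uint8, uint8, uint16, uint32

# Without a prefix, native alignment may add padding (the result is platform dependent)
print(struct.calcsize("<BI"), struct.calcsize("BI"))

# Eight synthetic bytes interpreted as type, headerSize, chunkSize
raw = bytes.fromhex("02000c0010000000")
chunk_type, header_size, chunk_size = struct.unpack("<HHI", raw)
print(hex(chunk_type), header_size, chunk_size)
```

Checking `struct.calcsize` against the number of bytes you read is a cheap way to catch a format string that does not match the structure.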
Step 1: Reading the Global Header and String Pool
First, we open the resources.arsc file in binary read mode. We’ll define some constants for chunk types.
    import struct

    class ChunkType:
        RES_NULL_TYPE = 0x0000
        RES_STRING_POOL_TYPE = 0x0001
        RES_TABLE_TYPE = 0x0002
        RES_XML_TYPE = 0x0003
        RES_TABLE_PACKAGE_TYPE = 0x0200
        RES_TABLE_TYPE_TYPE = 0x0201
        RES_TABLE_TYPE_SPEC_TYPE = 0x0202
        # ResTable_config is a structure inside ResTable_type, not a chunk type

    UTF8_FLAG = 0x100  # ResStringPool_header.flags bit: strings are UTF-8, not UTF-16

    def parse_arsc(filepath):
        with open(filepath, 'rb') as f:
            # ResTable_header: ResChunk_header (8 bytes) followed by packageCount (4 bytes)
            header_type, header_size, chunk_size = struct.unpack('<HHI', f.read(8))
            print(f"File Type: {header_type:#x}, Header Size: {header_size}, Total Size: {chunk_size}")
            if header_type != ChunkType.RES_TABLE_TYPE:
                raise ValueError("Not a valid resources.arsc file (expected ResTable_header)")
            package_count = struct.unpack('<I', f.read(4))[0]
            print(f"Package Count: {package_count}")

            # The global string pool chunk follows the table header
            pool_start = header_size
            f.seek(pool_start)
            pool_header_type, pool_header_size, pool_chunk_size = struct.unpack('<HHI', f.read(8))
            if pool_header_type != ChunkType.RES_STRING_POOL_TYPE:
                raise ValueError("Expected Global String Pool after ResTable_header")

            # Rest of ResStringPool_header: five uint32 fields (20 bytes)
            string_count, style_count, flags, strings_start, styles_start = struct.unpack('<IIIII', f.read(20))
            print(f"Global String Pool: Strings={string_count}, Styles={style_count}, Flags={flags}, StringsStart={strings_start}, StylesStart={styles_start}")

            # The array of string offsets starts right after the pool header
            f.seek(pool_start + pool_header_size)
            string_offsets = struct.unpack(f'<{string_count}I', f.read(4 * string_count))

            string_pool_strings = []
            for offset in string_offsets:
                # stringsStart is relative to the start of the pool chunk
                f.seek(pool_start + strings_start + offset)
                if flags & UTF8_FLAG:
                    # UTF-8: character count, then byte count (each stored in 1 or 2 bytes)
                    for _ in range(2):
                        length = f.read(1)[0]
                        if length & 0x80:
                            length = ((length & 0x7F) << 8) | f.read(1)[0]
                    s = f.read(length).decode('utf-8', errors='replace')
                else:
                    # UTF-16: length in code units (stored in 1 or 2 uint16 values)
                    length = struct.unpack('<H', f.read(2))[0]
                    if length & 0x8000:
                        length = ((length & 0x7FFF) << 16) | struct.unpack('<H', f.read(2))[0]
                    s = f.read(length * 2).decode('utf-16-le', errors='replace')
                string_pool_strings.append(s)

            print("\n--- Global String Pool Contents ---")
            for i, s in enumerate(string_pool_strings):
                print(f"{i}: {s}")

            # Leave the cursor at the end of the pool, where the first package begins
            f.seek(pool_start + pool_chunk_size)
            return f.tell(), package_count, string_pool_strings
In this initial step, we’ve parsed the main ResTable_header and the crucial global ResStringPool_header. The global string pool often contains values for resources that are simple strings.
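String pools come in two encodings, selected by a flag bit in the pool header, and each has its own length prefix. The sketch below decodes a single pool entry from a bytes buffer under that assumption. The function name decode_pool_string is illustrative, and the sample entries are synthetic.

```python
import struct

def decode_pool_string(data, offset, utf8):
    """Decode one string-pool entry that starts at `offset`."""
    if utf8:
        # Two prefixes: length in characters, then length in bytes.
        # Each prefix takes 1 byte, or 2 bytes when its high bit is set.
        for _ in range(2):
            length = data[offset]
            offset += 1
            if length & 0x80:
                length = ((length & 0x7F) << 8) | data[offset]
                offset += 1
        return data[offset:offset + length].decode("utf-8", errors="replace")
    # UTF-16: one prefix counted in 16-bit units, 2 or 4 bytes long
    (length,) = struct.unpack_from("<H", data, offset)
    offset += 2
    if length & 0x8000:
        (low,) = struct.unpack_from("<H", data, offset)
        length = ((length & 0x7FFF) << 16) | low
        offset += 2
    return data[offset:offset + 2 * length].decode("utf-16-le", errors="replace")

utf8_entry = bytes([3, 3]) + b"app" + b"\x00"
utf16_entry = struct.pack("<H", 3) + "app".encode("utf-16-le") + b"\x00\x00"
print(decode_pool_string(utf8_entry, 0, utf8=True))
print(decode_pool_string(utf16_entry, 0, utf8=False))
```

Working on an in-memory buffer rather than a file handle keeps the decoder easy to test with hand-built byte strings.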
Step 2: Iterating Through Packages and Resource Types
After the global string pool, the file contains one or more ResTable_package chunks. Each package has its own string pools for type names and key names, which are essential for mapping resource IDs to human-readable names.
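As a sketch of what sits at the start of each package chunk, the snippet below decodes the fixed fields that follow the 8-byte chunk header, assuming the field order from AOSP's ResourceTypes.h: id, a 128-character UTF-16 name, typeStrings, lastPublicType, keyStrings, lastPublicKey. The helper name, the sample package, and the offsets in the demo are illustrative only.

```python
import struct

# id, name[128] as UTF-16LE, typeStrings, lastPublicType, keyStrings, lastPublicKey
PACKAGE_FIELDS = struct.Struct("<I256sIIII")

def parse_package_header(data, offset):
    """Decode the fields that follow the 8-byte chunk header of a ResTable_package."""
    pkg_id, raw_name, type_strings, _, key_strings, _ = PACKAGE_FIELDS.unpack_from(data, offset + 8)
    name = raw_name.decode("utf-16-le").split("\x00", 1)[0]
    # typeStrings and keyStrings are offsets from the start of the package chunk
    return {"id": pkg_id, "name": name, "type_strings": type_strings, "key_strings": key_strings}

# Synthetic header for a hypothetical package; the offsets 288 and 400 are made up
name = "com.example.app".encode("utf-16-le").ljust(256, b"\x00")
demo = struct.pack("<HHI", 0x0200, 288, 288) + PACKAGE_FIELDS.pack(0x7F, name, 288, 0, 400, 0)
print(parse_package_header(demo, 0))
```

The two offsets tell the parser where the type-name and key-name pools begin, so it does not have to assume they directly follow the header.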
    def parse_string_pool(f, base_offset):
        """Parse the string pool chunk at base_offset; return (strings, chunk_size)."""
        f.seek(base_offset)
        pool_type, pool_header_size, pool_chunk_size = struct.unpack('<HHI', f.read(8))
        if pool_type != ChunkType.RES_STRING_POOL_TYPE:
            f.seek(base_offset + pool_chunk_size)  # Skip the unexpected chunk
            return [], pool_chunk_size
        # stringCount, styleCount, flags, stringsStart, stylesStart (20 bytes)
        string_count, style_count, flags, strings_start, styles_start = struct.unpack('<IIIII', f.read(20))
        # The string offset array starts right after the header
        f.seek(base_offset + pool_header_size)
        string_offsets = struct.unpack(f'<{string_count}I', f.read(4 * string_count))
        string_pool_strings = []
        for offset in string_offsets:
            f.seek(base_offset + strings_start + offset)
            if flags & 0x100:  # UTF-8 pool: character count, then byte count
                for _ in range(2):
                    length = f.read(1)[0]
                    if length & 0x80:
                        length = ((length & 0x7F) << 8) | f.read(1)[0]
                s = f.read(length).decode('utf-8', errors='replace')
            else:  # UTF-16 pool: length in 16-bit units
                length = struct.unpack('<H', f.read(2))[0]
                if length & 0x8000:
                    length = ((length & 0x7FFF) << 16) | struct.unpack('<H', f.read(2))[0]
                s = f.read(length * 2).decode('utf-16-le', errors='replace')
            string_pool_strings.append(s)
        f.seek(base_offset + pool_chunk_size)  # Move past the entire string pool chunk
        return string_pool_strings, pool_chunk_size

    def parse_packages(f, current_offset, package_count, global_string_pool):
        """Parse package chunks starting at current_offset (the end of the global string pool)."""
        for i in range(package_count):
            package_start = current_offset
            f.seek(package_start)
            package_type, package_header_size, package_chunk_size = struct.unpack('<HHI', f.read(8))
            if package_type != ChunkType.RES_TABLE_PACKAGE_TYPE:
                raise ValueError(f"Expected ResTable_package, got {package_type:#x}")
            # id, name[128] (UTF-16), typeStrings, lastPublicType, keyStrings, lastPublicKey
            package_id, package_name_bytes, type_strings_offset, _, key_strings_offset, _ = struct.unpack('<I256sIIII', f.read(276))
            package_name = package_name_bytes.decode('utf-16-le').split('\x00')[0]
            print(f"\n--- Package {i+1}: {package_name} (ID: {package_id}) ---")

            # Both pool offsets are relative to the start of the package chunk
            type_strings, _ = parse_string_pool(f, package_start + type_strings_offset)
            print(f"Type Strings ({len(type_strings)}): {type_strings[:5]}...")
            key_strings, _ = parse_string_pool(f, package_start + key_strings_offset)
            print(f"Key Strings ({len(key_strings)}): {key_strings[:5]}...")

            # Walk every chunk in the package body
            chunk_pos = package_start + package_header_size
            package_end = package_start + package_chunk_size
            while chunk_pos < package_end:
                f.seek(chunk_pos)
                chunk_type, chunk_header_size, chunk_size = struct.unpack('<HHI', f.read(8))
                if chunk_size < 8:
                    raise ValueError(f"Corrupt chunk at {hex(chunk_pos)}")
                if chunk_type == ChunkType.RES_STRING_POOL_TYPE:
                    pass  # Type and key pools were already parsed above
                elif chunk_type == ChunkType.RES_TABLE_TYPE_SPEC_TYPE:
                    # id (1 byte), res0 (1 byte), res1 (2 bytes), entryCount (4 bytes)
                    type_spec_id, _, _, entry_count = struct.unpack('<BBHI', f.read(8))
                    print(f"\n  Type Spec (ID: {type_spec_id}, Count: {entry_count})")
                    # One 4-byte flag word per entry follows; skipped for now
                elif chunk_type == ChunkType.RES_TABLE_TYPE_TYPE:
                    # id, flags, reserved, entryCount, entriesStart; ResTable_config follows
                    type_id, type_flags, _, entry_count, entries_start = struct.unpack('<BBHII', f.read(12))
                    print(f"  Type (ID: {type_id}, Count: {entry_count})")
                    # For a full parser, you'd also unpack the ResTable_config stored here
                    if type_flags & 0x03:
                        print("    (sparse or 16-bit offset encoding; skipped in this simplified parser)")
                    else:
                        type_name = type_strings[type_id - 1]  # Type IDs are 1-based
                        # The entry offset array starts right after the chunk header
                        f.seek(chunk_pos + chunk_header_size)
                        entry_offsets = struct.unpack(f'<{entry_count}I', f.read(4 * entry_count))
                        for offset in entry_offsets:
                            if offset == 0xFFFFFFFF:  # Entry doesn't exist for this config
                                continue
                            entry_pos = chunk_pos + entries_start + offset
                            f.seek(entry_pos)
                            # ResTable_entry: size (2 bytes), flags (2 bytes), key index (4 bytes)
                            entry_size, entry_flags, key_string_idx = struct.unpack('<HHI', f.read(8))
                            entry_name = key_strings[key_string_idx]
                            if entry_flags & 0x0001:  # Complex entry (styles, arrays, ...): a map follows
                                print(f"    [{type_name}/{entry_name}] = (complex entry, skipped)")
                                continue
                            # Res_value: size (2 bytes), res0 (1 byte), dataType (1 byte), data (4 bytes)
                            f.seek(entry_pos + entry_size)
                            value_size, _, value_data_type, value_data = struct.unpack('<HBBI', f.read(8))
                            if value_data_type == 0x03:  # String (index into the global string pool)
                                value_str = global_string_pool[value_data]
                            elif value_data_type == 0x10:  # Decimal integer
                                value_str = str(value_data)
                            elif value_data_type == 0x01:  # Reference to another resource
                                value_str = f"Ref: {value_data:#x}"
                            elif value_data_type == 0x02:  # Reference to a theme attribute
                                value_str = f"Attr Ref: {value_data:#x}"
                            elif value_data_type == 0x12:  # Boolean (0 is false, anything else is true)
                                value_str = "true" if value_data != 0 else "false"
                            elif value_data_type == 0x1C:  # Color (#aarrggbb)
                                value_str = f"Color: {value_data:#x}"
                            else:
                                value_str = f"Raw Data: {value_data:#x} (Type: {value_data_type:#x})"
                            print(f"    [{type_name}/{entry_name}] = {value_str}")
                else:
                    print(f"  Unknown chunk type: {chunk_type:#x} at {hex(chunk_pos)}")
                chunk_pos += chunk_size  # Every chunk records its own total size
            current_offset = package_end

    # Call this from parse_arsc, inside the `with` block and before its return:
    #     parse_packages(f, header_size + pool_chunk_size, package_count, string_pool_strings)

    # Example usage:
    if __name__ == '__main__':
        parse_arsc('path/to/your/resources.arsc')
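The parser above leaves the ResTable_config structure undecoded. As a hedged sketch, the first 16 bytes can be unpacked as shown below, assuming the field order from AOSP's ResourceTypes.h (size, mcc, mnc, language, country, orientation, touchscreen, density); in that layout the structure starts at byte 20 of a ResTable_type chunk. The helper names are illustrative, and only plain two-letter ASCII language and country codes are rendered as text.

```python
import struct

def _code(raw):
    """Render a 2-byte language/country field; non-ASCII (packed) codes stay as hex."""
    if raw == b"\x00\x00":
        return ""
    return raw.decode("ascii") if all(b < 0x80 for b in raw) else raw.hex()

def parse_config_prefix(data, offset):
    """Decode the first 16 bytes of a ResTable_config structure."""
    size, mcc, mnc, language, country, orientation, touchscreen, density = \
        struct.unpack_from("<IHH2s2sBBH", data, offset)
    locale = "-".join(part for part in (_code(language), _code(country)) if part)
    return {"size": size, "mcc": mcc, "mnc": mnc, "locale": locale or "any",
            "orientation": orientation, "density": density}

# Synthetic configuration: Canadian French at 480 dpi
demo = struct.pack("<IHH2s2sBBH", 16, 0, 0, b"fr", b"CA", 0, 0, 480)
print(parse_config_prefix(demo, 0))
```

The leading size field tells you how many bytes the structure occupies in a given file, so fields beyond that size should be treated as absent.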
Step 3: Extracting Resources and Assets
The code above demonstrates how to read resource entries and their associated values (strings, integers, etc.). For actual assets like drawables, layouts, or raw files, the resources.arsc file provides only a reference: the entry's value is a string (data type 0x03) in the global string pool that holds the file's path inside the APK, such as res/drawable/my_icon.png. Files under assets/ are not listed in resources.arsc at all and are read directly from the archive.
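Once a file path has been recovered from the table, the file itself can be read with Python's standard zipfile module, since an APK is a ZIP archive. The sketch below uses an in-memory archive as a stand-in for a real APK; read_apk_file and the drawable path are illustrative names.

```python
import io
import zipfile

def read_apk_file(apk, path):
    """Return the bytes stored at `path` inside an APK (a file name or file-like object)."""
    with zipfile.ZipFile(apk) as archive:
        return archive.read(path)

# In-memory stand-in for an APK containing one hypothetical drawable
fake_apk = io.BytesIO()
with zipfile.ZipFile(fake_apk, "w") as archive:
    archive.writestr("res/drawable/my_icon.png", b"\x89PNG placeholder")
fake_apk.seek(0)

print(len(read_apk_file(fake_apk, "res/drawable/my_icon.png")), "bytes")
```

With a real APK you would pass its file path instead of the BytesIO object; archive.read raises KeyError when the path is not present.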
When a resource value refers to another resource (e.g., a drawable alias), its value_data_type is 0x01 (reference); 0x02 (attribute) marks a reference to a theme attribute. In both cases value_data contains the resource ID of the referenced item, so fully resolving a value means following these references recursively. For files embedded in the APK (e.g., a PNG in res/drawable/), the entry's string value gives the path, which you can then use to locate the file within the APK (which is essentially a ZIP archive).
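A sketch of reference resolution follows, assuming the parser has collected values into a dictionary keyed by resource ID and that 0x01 is the reference data type, as defined in AOSP's ResourceTypes.h. The table contents and the helper names are hypothetical.

```python
TYPE_REFERENCE = 0x01

def split_resource_id(res_id):
    """Split 0xPPTTEEEE into (package ID, type ID, entry index)."""
    return (res_id >> 24) & 0xFF, (res_id >> 16) & 0xFF, res_id & 0xFFFF

def resolve(table, res_id, max_depth=16):
    """Follow reference values until a concrete value is reached."""
    for _ in range(max_depth):
        data_type, data = table[res_id]
        if data_type != TYPE_REFERENCE:
            return data_type, data
        res_id = data  # The data of a reference is the target resource ID
    raise ValueError("reference chain too deep (possible cycle)")

# Hypothetical table: {resource ID: (dataType, data)}
table = {
    0x7F020000: (TYPE_REFERENCE, 0x7F020001),  # an alias pointing at another drawable
    0x7F020001: (0x03, 42),                    # string index 42 in the global pool
}
print(split_resource_id(0x7F020001), resolve(table, 0x7F020000))
```

The depth limit is a design choice to guard against reference cycles in malformed or deliberately crafted files.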
For example, a value written as "@drawable/my_icon" in the source XML is not stored as text in the compiled table. It appears as a reference (data type 0x01) whose data is a resource ID of the form 0xPPTTEEEE: package ID, type ID, and entry index. A more advanced parser would split the ID, look up the type name in the type string pool and the entry name in the key string pool, and then read that entry's value to locate the actual my_icon.png or my_icon.xml file within the APK's res/ directory.
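Going the other way, from a source-style name to a resource ID, requires an index built while parsing. The sketch below assumes the parser records a tuple of (package ID, type ID, entry index, type name, entry name) for every entry it sees; that tuple layout and the function names are illustrative, not something the parser above already produces.

```python
def make_resource_id(package_id, type_id, entry_index):
    """Combine the three parts into a 0xPPTTEEEE resource ID."""
    return (package_id << 24) | (type_id << 16) | entry_index

def build_name_index(entries):
    """Map 'type/name' to a resource ID."""
    return {
        f"{type_name}/{entry_name}": make_resource_id(package_id, type_id, entry_index)
        for package_id, type_id, entry_index, type_name, entry_name in entries
    }

def lookup(index, reference):
    """Resolve a source-style reference such as '@drawable/my_icon'."""
    return index[reference.lstrip("@")]

# Hypothetical entries recorded during parsing
entries = [(0x7F, 2, 0, "drawable", "my_icon"), (0x7F, 3, 5, "string", "app_name")]
index = build_name_index(entries)
print(hex(lookup(index, "@drawable/my_icon")))
```

Note that the entry index is the entry's position in the ResTable_type offset array, not its index in the key string pool.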
Conclusion
Building a custom resources.arsc parser, even a simplified one, provides a profound understanding of how Android applications manage their assets. This expert-level tutorial has equipped you with the foundational knowledge and Python code to start interpreting this complex binary format. From here, you can extend your parser to support more data types, fully resolve references, integrate with APK parsing to extract actual files, or even experiment with modifying the .arsc file for advanced reverse engineering or research purposes. The world of Android reverse engineering is vast, and mastering file formats like resources.arsc is a crucial step towards unlocking its secrets.