Analyzing Raw eMMC Dumps: Developing Custom Scripts for Android Data Carving and Artifact Extraction

Introduction to eMMC Forensics and Data Recovery

The forensic analysis of Android devices often involves dealing with raw eMMC (embedded MultiMediaCard) memory dumps, especially in cases where physical access is required due to locked devices, corrupted filesystems, or advanced data recovery scenarios. While commercial tools offer some level of automation, they frequently fall short when dealing with highly fragmented, corrupted, or non-standard filesystem layouts. This is where the development of custom scripts becomes indispensable, allowing investigators to perform deep data carving and artifact extraction directly from the raw binary data.

eMMC chips are the primary storage solution in most Android devices, combining NAND flash memory and a flash memory controller in a single package. When a chip-off procedure is performed, the eMMC chip is physically removed from the device's PCB and its contents are read bit-for-bit into a raw binary dump. Because the integrated controller handles wear leveling and bad-block management, the dump captures the logical storage the controller exposes (the user data area and, depending on the reader, the boot partitions), irrespective of partitions or filesystem structures. This presents both immense opportunities and significant challenges for forensic analysis.

The Challenges of Raw eMMC Dump Analysis

Analyzing a raw eMMC dump is not akin to simply mounting a drive. The challenges are numerous:

Filesystem Corruption: Damaged boot sectors, partition tables, or superblock entries can render standard filesystem tools ineffective.
Unknown Partition Layouts: Many Android devices use non-standard or custom partition schemes not easily recognized by general-purpose forensic tools.
Fragmentation: Files, especially those deleted or stored on highly used devices, can be severely fragmented across the physical memory.
Encryption: Full Disk Encryption (FDE) or File-Based Encryption (FBE) makes file contents unrecoverable without the decryption keys. However, unencrypted partitions remain fully analyzable, and on FBE devices without metadata encryption, filesystem metadata such as file sizes and timestamps can still be recovered.
Proprietary Formats: Some application data might be stored in custom formats requiring specific parsing logic.

Custom scripts address these challenges by enabling byte-level analysis, searching for specific signatures, and reconstructing data based on known patterns rather than relying on an intact filesystem.

Understanding Android Filesystems and Their Signatures

Android devices typically use various filesystems, with the most common being ext4, F2FS (Flash-Friendly Filesystem), and more recently, EROFS (Enhanced Read-Only Filesystem) for system partitions. Understanding their internal structures and key signatures is crucial for data carving.

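ext4: Superblock starting 1024 bytes into the partition, with the magic number 0xEF53 at offset 0x38 within it (stored little-endian as 53 EF); inode tables; directory entries.
F2FS: Superblock starting 1024 bytes into the partition, beginning with the magic number 0xF2F52010 (stored little-endian as 10 20 F5 F2); checkpoint pack; Segment Information Table (SIT).
SQLite Databases: Used extensively by Android for SMS, call logs, contacts, and app data. Every SQLite database file begins with the 16-byte string "SQLite format 3" followed by a null byte (0x00); the two bytes at offset 16 hold the page size as a big-endian integer.
JPEG Images: Start with the SOI marker FF D8, immediately followed by another marker (commonly FF E0 for JFIF or FF E1 for EXIF, the format most phone cameras write), and end with the EOI marker FF D9.
ZIP/APK Files: Each local file header starts with PK\x03\x04 (0x504B0304).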
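The ext4 and F2FS signatures can be turned into a quick partition finder that works even when the partition table is damaged. ext4 stores 0xEF53 at offset 0x38 of a superblock that begins 1024 bytes into the partition, and F2FS stores 0xF2F52010 at the start of its superblock, also 1024 bytes in. The sketch below assumes partitions start on 512-byte boundaries; `find_fs_superblocks` is a hypothetical helper name, and its sanity checks are heuristics, not a full superblock validation.

```python
import re

EXT4_MAGIC = b'\x53\xEF'          # 0xEF53, stored little-endian
F2FS_MAGIC = b'\x10\x20\xF5\xF2'  # 0xF2F52010, stored little-endian
SB_OFFSET = 1024                  # both superblocks begin 1 KiB into the partition

def find_fs_superblocks(data, align=512):
    """Return sorted (fs_type, partition_start) guesses for ext4 and F2FS superblocks.

    `data` can be bytes, a bytearray, or an mmap of the dump.
    """
    hits = []
    for m in re.finditer(re.escape(EXT4_MAGIC), data):
        start = m.start() - SB_OFFSET - 0x38   # s_magic sits at offset 0x38 of the superblock
        if start < 0 or start % align:
            continue
        sb = start + SB_OFFSET
        log_block_size = int.from_bytes(data[sb + 0x18:sb + 0x1C], 'little')
        block_group_nr = int.from_bytes(data[sb + 0x5A:sb + 0x5C], 'little')
        # Heuristic checks: block size of 1-64 KiB, and block group 0
        # (backup superblock copies normally record a non-zero group number)
        if log_block_size <= 6 and block_group_nr == 0:
            hits.append(('ext4', start))
    for m in re.finditer(re.escape(F2FS_MAGIC), data):
        start = m.start() - SB_OFFSET
        # F2FS typically keeps a second superblock copy one 4 KiB block later,
        # so expect a duplicate hit at partition_start + 4096
        if start >= 0 and start % align == 0:
            hits.append(('f2fs', start))
    return sorted(hits, key=lambda hit: hit[1])
```

Passing an `mmap` of the dump returns candidate partition start offsets, which can then be cross-checked against GPT entries or fed to filesystem tools as an offset.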

Developing Custom Scripts for Data Carving

The core principle of data carving is to scan the raw binary data for known file headers (and ideally, footers) and extract the bytes in between. Python is an excellent language for this due to its powerful string and byte manipulation capabilities, combined with libraries for memory mapping and regular expressions.

Step 1: Identifying Partition Boundaries (Initial Scan)

Before deep carving, it’s often useful to identify potential partition boundaries. While `fdisk` might fail, tools like `binwalk` or custom Python scripts can scan for known partition table types (GPT, MBR) or filesystem superblocks.

```bash
# Signature scan: reports recognized file types, filesystem images and partition tables with their offsets
binwalk your_emmc_dump.bin

# Entropy scan: long high-entropy stretches often mark encrypted (or compressed) regions
binwalk -E your_emmc_dump.bin
```

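For more specific partition table identification (e.g., GPT), you can check for the `EFI PART` signature that opens the GPT header at LBA 1 (byte offset 512 on media with 512-byte sectors, which is typical for eMMC):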

```python
import mmap

def find_gpt_header(dump_path):
    with open(dump_path, 'rb') as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # The primary GPT header starts with 'EFI PART' at LBA 1 (offset 512 with 512-byte sectors)
            gpt_signature_offset = 512
            if mm[gpt_signature_offset:gpt_signature_offset + 8] == b'EFI PART':
                print(f"GPT header found at offset {gpt_signature_offset}")
                # You can then parse the GPT header for partition entries
                return True
            return False

# Example usage: find_gpt_header('your_emmc_dump.bin')
```
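Once the `EFI PART` signature is confirmed, the partition entry array can be parsed to map out named partitions. The sketch below assumes 512-byte sectors and the standard UEFI GPT layout: the header stores the entry-array LBA at offset 72, the entry count at 80 and the entry size at 84; each entry holds its first and last LBA at offsets 32 and 40 and a UTF-16LE name at offset 56. `parse_gpt_partitions` is a hypothetical helper name.

```python
import struct

def parse_gpt_partitions(image, sector_size=512):
    """Return (name, first_lba, last_lba) for every used entry in the primary GPT.

    `image` can be bytes or an mmap of the dump. Assumes the standard UEFI layout.
    """
    hdr = image[sector_size:sector_size + 92]
    if len(hdr) < 92 or hdr[:8] != b'EFI PART':
        raise ValueError('no primary GPT header at LBA 1')
    # Header fields (little-endian): entry-array LBA at 72, entry count at 80, entry size at 84
    entries_lba, num_entries, entry_size = struct.unpack_from('<QII', hdr, 72)
    base = entries_lba * sector_size
    partitions = []
    for i in range(num_entries):
        entry = image[base + i * entry_size:base + (i + 1) * entry_size]
        if len(entry) < 128:
            break                      # dump ends inside the entry array (or entry size is corrupt)
        if entry[:16] == b'\x00' * 16:
            continue                   # an all-zero type GUID marks an unused slot
        first_lba, last_lba = struct.unpack_from('<QQ', entry, 32)
        # Name: up to 36 UTF-16LE code units starting at offset 56
        name = entry[56:128].decode('utf-16-le', errors='replace').rstrip('\x00')
        partitions.append((name, first_lba, last_lba))
    return partitions
```

Multiplying a partition's `first_lba` by the sector size gives the byte offset where that partition (for example, an entry named `userdata` on many devices) begins in the dump.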

Step 2: Carving SQLite Databases

SQLite databases are rich sources of forensic data. We can carve them by searching for their unique header `SQLite format 3`.

```python
import mmap
import os

def carve_sqlite_dbs(dump_path, output_dir='carved_sqlite'):
    os.makedirs(output_dir, exist_ok=True)

    # The 16-byte magic string: "SQLite format 3" followed by a null terminator
    sqlite_header = b'SQLite format 3\x00'
    db_count = 0

    with open(dump_path, 'rb') as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            offset = 0
            while True:
                offset = mm.find(sqlite_header, offset)
                if offset == -1:
                    break

                # Page sizes are powers of two from 512 to 65536 bytes (4096 is common on Android).
                # The exact size is page_size * page_count, both stored in the 100-byte header
                # (offsets 16 and 28). For simplicity, we carve a fixed large chunk for now.
                potential_db_size = 5 * 1024 * 1024  # Example: carve 5 MB, adjust as needed
                end_offset = min(offset + potential_db_size, len(mm))

                output_filename = os.path.join(output_dir, f'carved_db_{db_count:04d}_{offset}.sqlite')
                with open(output_filename, 'wb') as out_f:
                    out_f.write(mm[offset:end_offset])
                print(f"Carved SQLite DB from 0x{offset:x} to 0x{end_offset:x} to {output_filename}")
                db_count += 1
                # Advance past this header; a real database occupies at least one 512-byte page
                offset += 512

# Example usage: carve_sqlite_dbs('your_emmc_dump.bin')
```
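Rather than carving a fixed 5 MB chunk, the database length can usually be read from the 100-byte header. A minimal sketch following the SQLite file-format documentation: the page size is a big-endian 2-byte value at offset 16 (the value 1 means 65536), and the page count is a big-endian 4-byte value at offset 28, which is only trusted when it is non-zero and the change counter at offset 24 equals the version-valid-for number at offset 92. `sqlite_db_size` is a hypothetical helper name.

```python
import struct

SQLITE_MAGIC = b'SQLite format 3\x00'

def sqlite_db_size(header):
    """Estimate a carved database's length in bytes from its header, or None if unknown."""
    if len(header) < 100 or header[:16] != SQLITE_MAGIC:
        return None
    page_size = struct.unpack_from('>H', header, 16)[0]
    if page_size == 1:
        page_size = 65536          # the value 1 encodes a 64 KiB page
    if page_size < 512 or page_size & (page_size - 1):
        return None                # not a power of two in 512..65536: likely a false hit
    change_counter, page_count = struct.unpack_from('>II', header, 24)
    version_valid_for = struct.unpack_from('>I', header, 92)[0]
    # The in-header page count is valid only if non-zero and the two counters match
    # (very old SQLite versions did not maintain it)
    if page_count == 0 or change_counter != version_valid_for:
        return None
    return page_size * page_count
```

In the carving loop, `potential_db_size = sqlite_db_size(mm[offset:offset + 100]) or 5 * 1024 * 1024` would use the header-derived size when available and fall back to the fixed chunk otherwise.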

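After carving, tools like `DB Browser for SQLite` can be used to open and analyze the extracted `.sqlite` files. A carved file that is truncated or fragmented may fail to open cleanly; in that case, the `sqlite3` command-line shell's `.recover` command can often salvage partial tables.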
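With many carved candidates, a quick triage step helps decide which files matter. Table names often identify the source database: on many Android versions SMS lives in `mmssms.db` (table `sms`) and call history in `calllog.db` (table `calls`). The sketch below opens each file through a read-only, immutable SQLite URI so the carved evidence is never modified; `list_tables` is a hypothetical helper name.

```python
import sqlite3

def list_tables(db_path):
    """Return the table names in a carved database, or None if SQLite cannot read it."""
    try:
        # mode=ro and immutable=1: no writes, no locking, no -wal/-shm files created
        con = sqlite3.connect(f'file:{db_path}?mode=ro&immutable=1', uri=True)
        try:
            rows = con.execute(
                "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name"
            ).fetchall()
        finally:
            con.close()
    except sqlite3.DatabaseError:
        return None                # corrupted or not a database: a candidate for .recover
    return [row[0] for row in rows]
```

Running it over the output directory and grouping files by their table sets quickly separates messaging, call-log and app databases from false hits.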

Step 3: Carving JPEG Images

Images are critical evidence. We look for their Start Of Image (SOI) and End Of Image (EOI) markers.

```python
import mmap
import os

def carve_jpegs(dump_path, output_dir='carved_images', max_size=20 * 1024 * 1024):
    os.makedirs(output_dir, exist_ok=True)

    jpeg_start_marker = b'\xFF\xD8\xFF'  # SOI (FF D8) plus the 0xFF of the following marker
    jpeg_end_marker = b'\xFF\xD9'        # EOI
    image_count = 0

    with open(dump_path, 'rb') as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            offset = 0
            while True:
                start_index = mm.find(jpeg_start_marker, offset)
                if start_index == -1:
                    break

                # Only look for the EOI within max_size bytes of the start.
                # Caveat: EXIF photos embed a thumbnail with its own SOI/EOI, so the first
                # EOI found may close the thumbnail rather than the full image.
                search_end = min(start_index + max_size, len(mm))
                end_index = mm.find(jpeg_end_marker, start_index + len(jpeg_start_marker), search_end)
                if end_index == -1:
                    # Could be a fragmented image, or EOI is missing
                    print(f"Warning: JPEG start found at 0x{start_index:x} but no EOI found nearby. Skipping.")
                    offset = start_index + len(jpeg_start_marker)
                    continue
                end_index += len(jpeg_end_marker)  # Include the EOI marker

                output_filename = os.path.join(output_dir, f'carved_image_{image_count:04d}_{start_index}.jpg')
                with open(output_filename, 'wb') as out_f:
                    out_f.write(mm[start_index:end_index])
                print(f"Carved JPEG from 0x{start_index:x} to 0x{end_index:x} to {output_filename}")
                image_count += 1
                offset = end_index

# Example usage: carve_jpegs('your_emmc_dump.bin')
```
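Searching for the first FF D9 breaks on EXIF photos, whose embedded thumbnail carries its own SOI/EOI pair. A more robust sketch follows the JPEG marker structure instead: it skips each length-prefixed segment (so the APP1 thumbnail is jumped over), then scans the entropy-coded data after each SOS marker, ignoring stuffed FF 00 bytes and RST markers. `jpeg_end` and the 20 MB cap are assumptions; the function handles well-formed contiguous streams only and returns None at a fragment boundary rather than reassembling fragments.

```python
import struct

def jpeg_end(data, start, max_len=20 * 1024 * 1024):
    """Return the offset just past the EOI closing the JPEG at `start`, or None."""
    limit = min(len(data), start + max_len)
    if data[start:start + 2] != b'\xff\xd8':
        return None
    pos = start + 2
    while pos + 2 <= limit:
        if data[pos] != 0xFF:
            return None                   # lost sync: not a well-formed marker stream
        marker = data[pos + 1]
        if marker == 0xFF:                # fill byte before a marker
            pos += 1
        elif marker == 0xD9:              # EOI: end of the main image
            return pos + 2
        elif 0xD0 <= marker <= 0xD7 or marker == 0x01:
            pos += 2                      # standalone markers carry no length field
        else:
            if pos + 4 > limit:
                return None
            seg_len = struct.unpack_from('>H', data, pos + 2)[0]
            pos += 2 + seg_len            # skips APPn segments, including EXIF thumbnails
            if marker == 0xDA:            # SOS: scan entropy-coded data for the next real marker
                while pos + 1 < limit and not (
                    data[pos] == 0xFF and data[pos + 1] != 0x00
                    and not 0xD0 <= data[pos + 1] <= 0xD7
                ):
                    pos += 1
    return None
```

Inside `carve_jpegs`, `jpeg_end(mm, start_index)` could replace the `mm.find` call for the EOI; a None result flags a candidate that is probably fragmented and worth examining by hand.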

Step 4: Carving ZIP/APK Files

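APK files are ZIP archives, so they can be carved with ZIP signatures. Every member file in a ZIP begins with a local file header whose signature is `PK\x03\x04` (0x504B0304), so the signature recurs throughout an archive rather than appearing only at its start.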

```python
import mmap
import os

def carve_zips_apks(dump_path, output_dir='carved_zips_apks'):
    os.makedirs(output_dir, exist_ok=True)

    zip_header = b'PK\x03\x04'   # Local file header signature
    eocd_header = b'PK\x05\x06'  # End of Central Directory (EOCD) signature
    archive_count = 0

    with open(dump_path, 'rb') as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            offset = 0
            while True:
                start_index = mm.find(zip_header, offset)
                if start_index == -1:
                    break

                # Determining the exact end of a ZIP file is complex without full parsing.
                # Every ZIP ends with an EOCD record: 22 fixed bytes followed by an optional
                # comment whose length is stored at offset 20 of the record.
                # Caveat: the nearest EOCD may belong to another archive if this one is fragmented.
                end_index = mm.find(eocd_header, start_index + 4)
                if end_index != -1 and end_index + 22 <= len(mm):
                    comment_len = int.from_bytes(mm[end_index + 20:end_index + 22], 'little')
                    potential_end = end_index + 22 + comment_len
                else:
                    # If no EOCD is found, carve a large fixed chunk instead
                    potential_end = start_index + (50 * 1024 * 1024)  # 50 MB heuristic for a large APK
                potential_end = min(potential_end, len(mm))

                output_filename = os.path.join(output_dir, f'carved_archive_{archive_count:04d}_{start_index}.zip')
                with open(output_filename, 'wb') as out_f:
                    out_f.write(mm[start_index:potential_end])
                print(f"Carved ZIP/APK from 0x{start_index:x} to 0x{potential_end:x} to {output_filename}")
                archive_count += 1
                offset = potential_end  # Continue searching after the current archive

# Example usage: carve_zips_apks('your_emmc_dump.bin')
```
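The End of Central Directory record (signature `PK\x05\x06`) holds more than the comment length: it also stores the central directory's size (4 bytes at offset 12) and its offset from the start of the archive (4 bytes at offset 16). Subtracting both from the record's position recovers where the archive should begin, which gives a consistency check: if that position does not hold a local file header, the record probably belongs to a different archive. This also holds for signed APKs, since the APK signing block sits before the central directory and is covered by the stored offset. A sketch assuming classic (non-ZIP64) archives with nothing prepended; `zip_bounds_from_eocd` is a hypothetical helper name.

```python
import struct

EOCD_SIG = b'PK\x05\x06'

def zip_bounds_from_eocd(data, eocd_pos):
    """Derive (archive_start, archive_end) from an End of Central Directory record.

    Assumes a classic (non-ZIP64) archive with no data prepended to it.
    """
    if eocd_pos + 22 > len(data) or data[eocd_pos:eocd_pos + 4] != EOCD_SIG:
        return None
    # EOCD layout: CD size at +12, CD offset (from archive start) at +16, comment length at +20
    cd_size, cd_offset, comment_len = struct.unpack_from('<IIH', data, eocd_pos + 12)
    start = eocd_pos - cd_size - cd_offset
    end = eocd_pos + 22 + comment_len
    if start < 0 or end > len(data) or data[start:start + 4] != b'PK\x03\x04':
        return None          # inconsistent record: false hit, ZIP64, or fragmented archive
    return start, end
```

A carver could scan for every EOCD signature first and keep only archives whose computed start lands on a `PK\x03\x04` header, which avoids pairing a local header with an unrelated EOCD further along the dump.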

Post-Carving Analysis and Refinements

Once data is carved, further analysis is required:

Database Analysis: Use `DB Browser for SQLite` to view carved SQLite files.
Image Viewers: Standard image viewers for JPEGs, PNGs, etc.
File Type Verification: Tools like the `file` command-line utility or the `hachoir` Python library can verify carved files, since a header/footer match does not guarantee file integrity.
String Extraction: Use `strings` command or Python for extracting ASCII/Unicode strings from the dump, which can reveal valuable plain text data, URLs, or embedded messages.

```bash
strings -e l your_emmc_dump.bin > unicode_strings.txt  # Extract UTF-16LE ("Unicode") strings
strings your_emmc_dump.bin > ascii_strings.txt         # Extract ASCII strings
```
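The same two passes can be done in Python, which makes it easy to keep each string's byte offset for correlating hits with partition boundaries (GNU `strings` can also print offsets with `-t x`). A minimal sketch that treats printable ASCII (0x20 to 0x7E) as the character set and ASCII-range UTF-16LE as the "Unicode" case; `extract_strings` and the six-character minimum are assumptions.

```python
import re

def extract_strings(data, min_len=6):
    """Yield (offset, encoding, text) for printable ASCII and UTF-16LE runs in `data`."""
    ascii_re = re.compile(rb'[\x20-\x7e]{%d,}' % min_len)
    # UTF-16LE text in the ASCII range: each printable byte is followed by a zero byte
    utf16_re = re.compile(rb'(?:[\x20-\x7e]\x00){%d,}' % min_len)
    for m in ascii_re.finditer(data):
        yield m.start(), 'ascii', m.group().decode('ascii')
    for m in utf16_re.finditer(data):
        yield m.start(), 'utf-16le', m.group().decode('utf-16-le')
```

Because `re` accepts any bytes-like object, the function can be run directly over an `mmap` of the dump without loading it into memory.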

For dealing with fragmentation, more advanced carving techniques might involve entropy analysis, partial header matching, or reassembling fragments based on metadata if available. Encryption remains the toughest challenge: carving inside an encrypted region yields only ciphertext, so effort is better directed at unencrypted partitions and at whatever metadata the encryption scheme leaves in the clear.
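Entropy analysis can be sketched as a per-block Shannon entropy map of the dump. Values close to 8 bits per byte indicate encrypted or compressed data (entropy alone cannot tell the two apart), while values near 0 indicate zero-filled or erased regions; sharp transitions between the two often mark partition or file boundaries. The 4096-byte block size is an assumption chosen to match common ext4/F2FS block sizes, and the function names are illustrative.

```python
import math
from collections import Counter

def block_entropy(block):
    """Shannon entropy of a byte string in bits per byte (0.0 to 8.0)."""
    if not block:
        return 0.0
    n = len(block)
    # Sum of p * log2(1/p) over the byte values that occur in the block
    return sum(c / n * math.log2(n / c) for c in Counter(block).values())

def entropy_map(data, block_size=4096):
    """Yield (offset, entropy) for each block of `data` (bytes or an mmap)."""
    for off in range(0, len(data), block_size):
        yield off, block_entropy(data[off:off + block_size])
```

Plotting or thresholding the resulting map (for example, flagging long runs above 7.9) gives a quick picture of which regions are worth carving and which are likely ciphertext.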

Conclusion

Analyzing raw eMMC dumps from Android devices requires a deep understanding of storage technologies, filesystem structures, and data formats. While commercial tools provide a baseline, developing custom scripts in Python allows forensic investigators to perform highly targeted data carving and artifact extraction, especially when confronted with damaged filesystems or proprietary data structures. By combining knowledge of file signatures with robust scripting, it’s possible to recover critical evidence that might otherwise remain hidden within the vast binary landscape of a raw eMMC dump. The continuous evolution of Android filesystems and encryption methods necessitates ongoing research and development of these specialized forensic techniques.
