Beyond Symbol Tables: Recovering Meaning from Stripped Android Binaries for Forensic Evidence

Introduction: The Forensic Challenge of Stripped Binaries

In Android mobile forensics, investigators frequently encounter native shared libraries (typically .so files) that have been stripped of their symbol tables. This practice, common in release builds to reduce size and hinder reverse engineering, turns a relatively readable binary into a serious analytical challenge. Without symbolic information such as function and global variable names, understanding the binary’s intent and recovering potential forensic evidence becomes considerably harder. This article covers techniques for recovering meaning from stripped Android binaries and gives forensic analysts a roadmap for extracting useful intelligence from them.

Understanding Android Native Binaries and Stripping

Android applications can leverage the Native Development Kit (NDK) to compile C/C++ code into native shared libraries. These libraries often handle performance-critical tasks, interact with low-level system APIs, or implement security-sensitive logic. For deployment, developers typically run tools like `strip` over the library to remove the static symbol table (`.symtab`) and debugging sections, which reduces binary size and makes reverse engineering harder. The dynamic symbol table (`.dynsym`) and the relocations required by the dynamic linker remain, so imported and exported names are still visible. Stripping provides obscurity rather than real security, but it does hinder forensic analysis by removing the names that documented the original program logic.

Initial Triage: Gathering Basic Information

Even without symbols, fundamental tools can provide initial insights. The `file` command helps identify the architecture (ARM, ARM64, x86) and ELF type. `readelf` can reveal segments, sections, dynamic entries, and imported/exported library functions, even if specific application symbols are gone.

$ file libnative-lib.so
libnative-lib.so: ELF 64-bit LSB shared object, ARM aarch64, version 1 (SYSV), dynamically linked, BuildID[sha1]=..., stripped

$ readelf -s libnative-lib.so | grep UND
... (Many UND (undefined) entries for libc functions like strlen, malloc, etc.)
... (No defined symbols specific to the application)
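
When `file` is not available on the analysis workstation, the same architecture and type information can be read directly from the ELF header. The following is a minimal Python sketch, not a full parser; the function name `describe_elf` and the small lookup tables are illustrative choices, and only the `e_machine` values relevant to Android ABIs are listed.

```python
import struct

# Subset of e_machine values that correspond to Android ABIs.
MACHINES = {3: "x86", 40: "ARM", 62: "x86-64", 183: "AArch64"}
TYPES = {1: "relocatable", 2: "executable", 3: "shared object", 4: "core"}


def describe_elf(header):
    """Decode class, byte order, type and machine from the first bytes of an ELF file."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    bits = {1: 32, 2: 64}.get(header[4])      # EI_CLASS
    endian = {1: "<", 2: ">"}.get(header[5])  # EI_DATA
    if bits is None or endian is None:
        raise ValueError("unrecognized EI_CLASS or EI_DATA")
    # e_type and e_machine are two 16-bit fields right after the 16-byte e_ident.
    e_type, e_machine = struct.unpack_from(endian + "HH", header, 16)
    return {
        "bits": bits,
        "endianness": "little" if endian == "<" else "big",
        "type": TYPES.get(e_type, hex(e_type)),
        "machine": MACHINES.get(e_machine, hex(e_machine)),
    }
```

Passing the first 64 bytes of a library, read in binary mode, is enough for this function. Working from a read-only copy of the evidence file keeps the original untouched.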

The `strings` utility can sometimes uncover hardcoded strings, API keys, URLs, or error messages that hint at the binary’s functionality.
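
As a rough illustration of what `strings` does, the sketch below pulls runs of printable ASCII out of a byte buffer and then filters them for URLs. The helper names `extract_strings` and `find_urls` and the URL pattern are assumptions made for this example. The real utility also handles other encodings, which this version does not.

```python
import re

# Hypothetical, deliberately loose URL pattern for triage purposes.
URL_RE = re.compile(r"https?://[^\s\"'<>]+")


def extract_strings(data, min_len=4):
    """Return runs of at least min_len printable ASCII bytes found in data."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


def find_urls(strings):
    """Return every URL-like substring found in the extracted strings."""
    return [url for s in strings for url in URL_RE.findall(s)]
```

Similar filters for file paths, SQL fragments, or format strings can be layered on top of `extract_strings` in the same way.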

Static Analysis: Reconstructing Logic from Assembly

Static analysis is the foundation of this work. Tools like Ghidra, IDA Pro, and Binary Ninja are indispensable for disassembling and decompiling stripped binaries.

Function Identification and Boundary Recovery

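Without symbol tables, identifying function boundaries is critical. Disassemblers use heuristics such as call targets (`bl` on ARM, `call` on x86), stack frame setup and teardown (e.g., `stp x29, x30, [sp, #-0xYY]!` on AArch64), and indirect jump analysis. Common entry points for Android native libraries include `JNI_OnLoad` (called by the Android Runtime when the library is loaded) and JNI-exported functions following the `Java_package_class_method` naming convention. Because these functions are resolved by name at run time, they are exported through the dynamic symbol table and usually survive stripping. Native methods bound with `RegisterNatives` are the exception and have to be traced from `JNI_OnLoad`.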
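
The prologue heuristic can be sketched in a few lines of Python. The example scans little-endian AArch64 code for the pre-indexed form `stp x29, x30, [sp, #-N]!` by masking out the 7-bit immediate field and comparing the remaining bits. The function name `find_prologues` is hypothetical, and the approach is only one heuristic: leaf functions and functions that first adjust `sp` with `sub` will not match, so real disassemblers combine several signals.

```python
import struct

# Fixed bits of "stp x29, x30, [sp, #imm]!" with the imm7 field (bits 21..15) masked out.
STP_FP_LR_MASK = 0xFFC07FFF
STP_FP_LR_BITS = 0xA9807BFD


def find_prologues(code, base=0):
    """Return (address, frame_size) for each matching prologue instruction in code."""
    hits = []
    for off in range(0, len(code) - 3, 4):  # AArch64 instructions are 4 bytes wide
        (word,) = struct.unpack_from("<I", code, off)
        if (word & STP_FP_LR_MASK) == STP_FP_LR_BITS:
            imm7 = (word >> 15) & 0x7F
            if imm7 & 0x40:  # sign-extend the 7-bit immediate
                imm7 -= 0x80
            # The immediate is scaled by 8 for 64-bit register pairs.
            hits.append((base + off, -imm7 * 8))
    return hits
```

The returned frame size is a useful secondary signal, since a large frame suggests local buffers that may hold keys, paths, or message data.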

Parameter and Return Type Recovery

This is where expert knowledge shines. Analyzing stack frame manipulation (for arguments passed on the stack) and register usage (for arguments passed in registers, common in ARM calling conventions) can help deduce function signatures. The Arm procedure call standards specify that on ARM32 (AAPCS) the first four integer or pointer arguments are passed in registers `r0-r3`, while on AArch64 (AAPCS64) the first eight are passed in `x0-x7`; subsequent arguments go on the stack. Return values are typically in `x0`/`r0`. By observing which argument registers a function reads before writing them and how it accesses the caller’s stack, one can infer argument count and types.
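
As a quick reference while annotating a decompiled function, the mapping from argument position to location can be written down as a small lookup. This is a simplified sketch that assumes every argument is a single integer or pointer no wider than a register; floating-point arguments, 64-bit values on ARM32, and structures passed by value follow additional rules. The name `arg_location` is hypothetical, and the ABI keys reuse Android's ABI names. Under AAPCS64 the integer argument registers are `x0` through `x7`, and under the 32-bit AAPCS they are `r0` through `r3`.

```python
# Integer/pointer argument registers per Android ABI.
INT_ARG_REGS = {
    "arm64-v8a": ["x%d" % i for i in range(8)],
    "armeabi-v7a": ["r%d" % i for i in range(4)],
}
# Size in bytes of one stack argument slot under the assumptions above.
STACK_SLOT = {"arm64-v8a": 8, "armeabi-v7a": 4}


def arg_location(abi, index):
    """Return where argument number index (0-based) lives at function entry."""
    regs = INT_ARG_REGS[abi]
    if index < len(regs):
        return regs[index]
    # Remaining arguments sit on the stack, relative to sp at function entry.
    return "[sp + %d]" % ((index - len(regs)) * STACK_SLOT[abi])
```

For example, `arg_location("arm64-v8a", 8)` names the first stack slot, which is where a ninth integer argument would be found before the prologue moves `sp`.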

Cross-Referencing (XREFs) and Call Graph Reconstruction

XREFs are vital. By identifying where a function is called from, and what data it accesses, analysts can build a call graph. Even without symbols, a pattern of calls to `strlen`, `malloc`, `memcpy`, and then to an unknown internal function, can suggest the unknown function is processing strings or memory. This pattern recognition helps assign provisional names (e.g., `sub_1234_handleString`).
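
The naming step can be partly automated once a tool has exported each function's callees. The sketch below assumes such an export already exists as a list of callee names per function; the hint table and the labels in it are invented for illustration, and any name produced this way is a hypothesis to be confirmed, not a finding.

```python
# Hypothetical hint table: if all imports in the set are called, suggest the label.
HINTS = [
    ({"strlen", "malloc", "memcpy"}, "handleString"),
    ({"socket", "connect", "send"}, "netSend"),
    ({"fopen", "fwrite"}, "fileWrite"),
]


def provisional_name(address, callees):
    """Suggest a provisional name for the function at address based on its callees."""
    called = set(callees)
    for required, label in HINTS:
        if required <= called:  # every required import is among the callees
            return "sub_%x_%s" % (address, label)
    return "sub_%x" % address
```

Keeping the original address in the name preserves traceability back to the binary when the label later turns out to be wrong.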

Leveraging JNI Patterns

JNI functions in stripped binaries often follow predictable patterns. A function that takes a `JNIEnv*` and `jobject` as its first two arguments (typically in `x0`, `x1` on AArch64) and then makes calls to JNIEnv functions (e.g., `->FindClass`, `->GetMethodID`, `->CallObjectMethod`) is almost certainly a JNI native method. The Java code invoking this native method can provide valuable context.

// Ghidra decompiler output might look like this for a JNI function
long Java_com_example_app_Native_doSomething(JNIEnv *param_1,undefined8 param_2) {
  jclass local_40; // Likely a class reference
  jmethodID local_38; // Method ID
  jstring local_30; // String object
  long local_28; // Return value
  
  local_40 = (*param_1)->FindClass(param_1,"com/example/app/SomeClass");
  local_38 = (*param_1)->GetMethodID(param_1,local_40,"getData","(Ljava/lang/String;)Ljava/lang/String;");
  // ... further JNI calls
  return local_28;
}
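
Before JNI types are applied, the decompiler usually shows these calls as indirect calls through an offset from the dereferenced `JNIEnv` pointer, for example `(**(code **)(*param_1 + 0x30))(...)`. Dividing the offset by the pointer size gives the index into the JNI function table, which the JNI specification fixes. The sketch below maps a handful of commonly seen indices back to names; the table is intentionally partial, and the helper name `jni_function_at` is hypothetical.

```python
# Partial map of JNI function table indices to names (from the JNI specification).
JNI_INDEX = {
    6: "FindClass",
    31: "GetObjectClass",
    33: "GetMethodID",
    34: "CallObjectMethod",
    167: "NewStringUTF",
    169: "GetStringUTFChars",
    170: "ReleaseStringUTFChars",
    215: "RegisterNatives",
}


def jni_function_at(offset, pointer_size=8):
    """Translate a byte offset into the JNIEnv function table into a JNI name."""
    if offset % pointer_size:
        return None  # not aligned to a table slot
    return JNI_INDEX.get(offset // pointer_size)
```

With 8-byte pointers, offset `0x30` resolves to `FindClass` and `0x6b8` to `RegisterNatives`; on 32-bit targets the same indices apply with `pointer_size=4`.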

Dynamic Analysis: Observing Runtime Behavior with Frida

Static analysis is powerful, but dynamic analysis provides runtime context, validating hypotheses and revealing execution flow. Frida, a dynamic instrumentation toolkit, is exceptionally useful for stripped binaries.

Hooking Arbitrary Addresses

Frida allows hooking any memory address within a running process. By attaching to a suspect address (identified via static analysis as a potential function entry point), an analyst can observe its arguments and return values.

// Frida script to hook a specific address and log arguments
// Native hooks do not need the Java bridge, so no Java.perform wrapper is used
var moduleName = "libnative-lib.so"; // The target shared library
// Returns null if the library has not been loaded into the process yet
var targetModule = Process.findModuleByName(moduleName);
if (targetModule !== null) {
    var baseAddress = targetModule.base;
    console.log("[*] Base address of " + moduleName + ": " + baseAddress);

    // Example: Hooking an offset found during static analysis
    // Replace 0x1234 with the actual offset of interest
    var targetAddress = baseAddress.add(0x1234);
    console.log("[*] Hooking address: " + targetAddress);

    Interceptor.attach(targetAddress, {
        onEnter: function(args) {
            console.log("[+] Function called at " + targetAddress);
            // On AArch64 the first eight integer/pointer arguments are in x0-x7;
            // log only as many as the function is believed to take
            console.log("  [arg0]: " + args[0]);
            console.log("  [arg1]: " + args[1]);
            console.log("  [arg2]: " + args[2]);
            console.log("  [arg3]: " + args[3]);
            // You might need to read values from these pointers if they are string/object pointers
            // For example, if arg0 is a JNIEnv*, this yields the address of its function table:
            // console.log("  JNIEnv*: " + args[0].readPointer());
        },
        onLeave: function(retval) {
            console.log("[-] Function returned: " + retval);
        }
    });
} else {
    console.log("[-] Module " + moduleName + " not found.");
}

By running the application and interacting with features that might trigger the hooked function, the analyst can observe the types and values of data being processed, directly inferring the function’s purpose.
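
One recurring source of error is translating an address shown in the static analysis tool into the address to hook at run time, because the tool's image base differs from the base the loader chooses. The sketch below shows the arithmetic. The default `image_base` of `0x100000` is an assumption about how the tool loaded the library and should be checked against its memory map; the helper name `runtime_address` is hypothetical. For Thumb functions on 32-bit ARM, Frida expects the least significant bit of the address to be set.

```python
def runtime_address(module_base, static_address, image_base=0x100000, thumb=False):
    """Rebase an address from the static analysis tool onto the loaded module."""
    offset = static_address - image_base
    if offset < 0:
        raise ValueError("static address lies below the assumed image base")
    address = module_base + offset
    if thumb:
        address |= 1  # mark the target as Thumb code on 32-bit ARM
    return address
```

The offset computed in the first step is what should be passed to `baseAddress.add(...)` in a hook script, and recording both values in case notes makes the hook reproducible.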

Tracing System Calls

While `strace` reports system calls and `ltrace` reports library calls, Frida offers more granular control, allowing tracing of internal application logic as well. Observing read/write operations, memory allocations, or network activity can provide vital forensic clues about data exfiltration, encryption, or local storage mechanisms.

Leveraging Android-Specific Knowledge

Context is king. Android applications operate within a specific ecosystem, and this knowledge is crucial for stripped binary analysis:

`AndroidManifest.xml`: This file often reveals components that interact with native code, as well as permissions and activities, offering clues to the native library’s role.
Associated Java/Kotlin Code: If the application’s bytecode (DEX files) is available, analyzing the Java code that calls native methods can provide the exact function names (e.g., `System.loadLibrary("native-lib")` and `native String doSomething(String arg)`) and their signatures, which can then be matched to patterns in the stripped native binary.
Common Android Libraries: Recognizing calls to `libc.so`, `liblog.so`, or other Android framework libraries helps distinguish between OS-level interactions and custom application logic.
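
Matching a Java `native` declaration to an exported function relies on the JNI name-mangling rules: package separators become `_`, a literal underscore becomes `_1`, and characters outside ASCII letters and digits are escaped as `_0xxxx`. The sketch below applies these rules to produce the short form of the expected symbol name. It is a simplified illustration: the helper name `jni_symbol` is hypothetical, overloaded methods (which append `__` and a mangled signature) are not handled, and characters outside the Basic Multilingual Plane are ignored.

```python
def _mangle(part):
    """Apply JNI escaping to one component of a native method name."""
    out = []
    for ch in part:
        if ch in "./":
            out.append("_")
        elif ch == "_":
            out.append("_1")
        elif ch == ";":
            out.append("_2")
        elif ch == "[":
            out.append("_3")
        elif ch.isascii() and ch.isalnum():
            out.append(ch)
        else:
            out.append("_0%04x" % ord(ch))  # e.g. '$' becomes _00024
    return "".join(out)


def jni_symbol(class_name, method):
    """Return the short JNI symbol name for a native method of class_name."""
    return "Java_%s_%s" % (_mangle(class_name), _mangle(method))
```

The result can be compared against the dynamic symbol table of the library. If no such export exists, the method was most likely bound through `RegisterNatives`.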

Challenges and Limitations

Recovering meaning from stripped binaries is labor-intensive and error-prone. Misinterpreted calling conventions, incorrect argument type deductions, or complex obfuscation techniques (e.g., control flow flattening, opaque predicates, virtualization) can lead to false conclusions or missed evidence. The work usually requires iterative refinement between static and dynamic analysis, correlating observations from both to build a coherent understanding.

Conclusion

Stripped Android binaries present a significant hurdle for forensic investigations, but not an insurmountable one. By combining static analysis (disassembly, function signature recovery, JNI pattern matching) with dynamic instrumentation (Frida for runtime observation) and Android-specific contextual knowledge, forensic analysts can reconstruct much of the original intent of native code. The result is machine code turned into evidence that supports attribution, malware analysis, and incident response.
