Reverse Engineering Android Malware: A Case Study on ARM64 Native Payloads

Introduction: The Growing Threat of Native Android Malware

The Android ecosystem has long been a prime target for malware developers. While Java/Kotlin-based payloads remain prevalent, a growing number of sophisticated samples move their core logic into native code (C/C++) compiled for ARM64. Native code offers attackers several advantages: better performance, closer interaction with the operating system, and, critically, stronger obfuscation and anti-analysis capabilities, because compiled machine code is far harder to read than Dalvik bytecode. Reverse engineering ARM64 native payloads therefore requires a solid understanding of the ARM64 instruction set, its calling conventions, and common development patterns. This article serves as both a guide and a case study, walking through the analysis of such a payload.

Understanding Android Native Code and JNI

Android applications can integrate C/C++ code through the Java Native Interface (JNI). This allows Java code to call native functions and vice-versa. When an Android application uses native code, it typically ships with .so (shared object) libraries located in the lib/ directory of the APK, specifically in subdirectories like arm64-v8a/ for ARM64 architectures. The primary entry point for native code loaded dynamically by Java is often the JNI_OnLoad function.

```java
public class MainActivity extends AppCompatActivity {
    static {
        System.loadLibrary("malwarelib"); // Loads libmalwarelib.so
    }

    // ... native method declarations ...
}
```

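When System.loadLibrary() is called, the Android runtime locates and loads the specified native library. If the library exports a JNI_OnLoad function, the runtime calls it immediately after loading; it must return the JNI version the library requires (e.g., JNI_VERSION_1_6). This makes JNI_OnLoad a natural place for malware authors to perform initial setup, decryption, or anti-analysis checks. Note that ELF initializers in the .init_array section (such as functions marked __attribute__((constructor))) run even earlier, while the dynamic linker is loading the library, so they deserve the same scrutiny.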

Setting Up Your Reverse Engineering Environment

Effective ARM64 native code analysis requires a robust set of tools:

Disassembler/Decompiler: IDA Pro or Ghidra (both offer excellent ARM64 support).
Android SDK Tools: For adb and other utilities.
APK Analysis Tools: apktool for unpacking APKs.
Emulator/Rooted Device: For dynamic analysis (e.g., Android Studio Emulator, Genymotion, or a physical rooted device).
Frida/Xposed (Optional): For dynamic instrumentation.

Initial Triage: Extracting the Native Payload

First, extract the APK content to locate the native libraries:

```shell
apktool d malware.apk -o malware_unpacked
cd malware_unpacked/lib/arm64-v8a/
ls
```

You’ll typically find lib<name>.so files here. Identify the suspicious ones: libraries with generic names, or those matching a System.loadLibrary() call in the Java code (System.loadLibrary("malwarelib") loads libmalwarelib.so).
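Before loading a candidate into a disassembler, it can be worth confirming that it really is a 64-bit ARM ELF object (the `file` utility reports the same thing). The following is a minimal C sketch of that check, working on the first bytes of the file. It verifies the ELF magic, the 64-bit class, little-endian encoding, and the `e_machine` field at offset 18, which is 183 (`EM_AARCH64`) for ARM64. The helper name `is_arm64_elf` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if the buffer starts with a 64-bit little-endian ELF header
 * whose e_machine field is EM_AARCH64 (183), 0 otherwise. */
int is_arm64_elf(const unsigned char *hdr, size_t len)
{
    assert(hdr != NULL || len == 0);
    /* e_ident is 16 bytes, e_type is 2 bytes, so e_machine sits at offset 18. */
    if (len < 20)
        return 0;
    /* ELF magic: 0x7f 'E' 'L' 'F' */
    if (hdr[0] != 0x7f || hdr[1] != 'E' || hdr[2] != 'L' || hdr[3] != 'F')
        return 0;
    if (hdr[4] != 2)   /* e_ident[EI_CLASS] must be ELFCLASS64 */
        return 0;
    if (hdr[5] != 1)   /* e_ident[EI_DATA] must be ELFDATA2LSB (little-endian) */
        return 0;
    /* e_machine is a little-endian 16-bit field */
    uint16_t machine = (uint16_t)(hdr[18] | (hdr[19] << 8));
    return machine == 183; /* EM_AARCH64 */
}
```

Reading the first 64 bytes of the file with fread() and passing them in is enough, since a 64-bit ELF header is exactly 64 bytes long.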

Static Analysis: Deconstructing ARM64 Assembly

Load the identified .so file into your disassembler (Ghidra or IDA Pro). The first point of interest is the JNI_OnLoad function, whose signature is fixed by the JNI specification:

```c
jint JNI_OnLoad(JavaVM *vm, void *reserved);
```

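Under the ARM64 calling convention, vm arrives in X0, reserved arrives in X1, and the returned JNI version goes back in W0. Within JNI_OnLoad, malware often performs crucial initialization steps. Let’s analyze a hypothetical scenario in which the malware decrypts a C2 (Command and Control) URL.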

ARM64 Assembly Fundamentals for Malware Analysis

Before diving into the case study, a quick refresher on key ARM64 concepts:

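Registers: X0-X30 are 64-bit general-purpose registers (W0-W30 are their lower 32-bit halves). Under the standard calling convention (AAPCS64), X0-X7 carry the first eight integer and pointer arguments and X0 holds the return value; X19-X28 are callee-saved, X29 is the frame pointer (FP), X30 is the link register (LR), and SP is a separate stack pointer.
PC-Relative Addressing: ARM64 commonly pairs ADRP (form the PC-relative address of a 4 KB page) with ADD (immediate) to compute the address of global data or strings relative to the program counter.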
Load/Store Instructions: LDR (Load Register), STR (Store Register) are used to move data between registers and memory.
Branch Instructions: B (unconditional branch), BL (Branch with Link: calls a subroutine and stores the return address in X30/LR), and RET (returns to the address held in X30).
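The branch semantics can be made concrete with a small decoder. A BL instruction stores a signed 26-bit offset counted in 4-byte words, relative to the address of the BL itself, so its target is `PC + offset * 4` and X30 receives `PC + 4`. The C sketch below (the helper name `decode_bl` is hypothetical) decodes a raw 32-bit instruction word, which can help when following calls through a memory dump your disassembler has not processed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decodes an AArch64 BL instruction located at address pc.
 * Returns 1 and fills *target and *link_reg on success, 0 if insn is not BL. */
int decode_bl(uint32_t insn, uint64_t pc, uint64_t *target, uint64_t *link_reg)
{
    assert(target != NULL && link_reg != NULL);
    /* BL has bits[31:26] = 100101 */
    if ((insn & 0xFC000000u) != 0x94000000u)
        return 0;
    int64_t imm26 = (int64_t)(insn & 0x03FFFFFFu);
    if (imm26 & 0x02000000)      /* sign bit of the 26-bit field */
        imm26 -= 0x04000000;
    *target = pc + (uint64_t)(imm26 * 4); /* offset is counted in 4-byte words */
    *link_reg = pc + 4;                   /* return address BL writes to X30 */
    return 1;
}
```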

Case Study: Decrypting a C2 URL

Consider a snippet within JNI_OnLoad or a function called by it, responsible for decrypting a C2 URL:

```asm
JNI_OnLoad:
    // ... other initializations ...
    ADRP X0, #c2_encrypted_string@PAGE        // Load page address of encrypted string
    ADD  X0, X0, #c2_encrypted_string@PAGEOFF // Add page offset; X0 = &c2_encrypted_string
    MOV  X1, #0x10                            // Key length / size of encrypted data
    BL   decrypt_data_function                // Call decryption function; result in X0
    STR  X0, [SP, #0x20+var_10]               // Store pointer to decrypted data on stack
    // ... further operations with decrypted C2 URL ...
```

Analysis Steps:

ADRP X0, #c2_encrypted_string@PAGE: This instruction calculates the base address of the 4KB page containing c2_encrypted_string and loads it into X0.
ADD X0, X0, #c2_encrypted_string@PAGEOFF: This adds the specific offset within that page to X0, making X0 now point directly to the start of the c2_encrypted_string data in memory.
MOV X1, #0x10: A constant value, likely representing the size of the encrypted data or a key length, is moved into X1. This suggests the decrypt_data_function takes two arguments: the pointer to the encrypted data (X0) and a size/key parameter (X1).
BL decrypt_data_function: This Branch with Link instruction calls the decrypt_data_function subroutine. Before jumping, BL writes the address of the following instruction (the STR) into X30, the link register, so that the callee's RET returns there.
STR X0, [SP, #0x20+var_10]: Assuming decrypt_data_function returns a pointer to the decrypted string in X0, this instruction stores that pointer onto the stack frame of the current function, making it accessible for later use.
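Your disassembler resolves the ADRP/ADD pair to the c2_encrypted_string label automatically, but when you work with raw instruction words (for example, from a dumped or unpacked region) the arithmetic is easy to reproduce. ADRP sign-extends a 21-bit page count, multiplies it by 4096, and adds it to the PC with its low 12 bits cleared; the following ADD contributes the offset within the page. This C sketch (helper name `resolve_adrp_add` is hypothetical) handles only the 64-bit ADD (immediate) form and returns 0 when the two words do not form such a pair.

```c
#include <assert.h>
#include <stdint.h>

/* Computes the address formed by "ADRP Xd, sym@PAGE" located at address pc,
 * followed by "ADD Xn, Xd, sym@PAGEOFF". Returns 0 if the pair does not match. */
uint64_t resolve_adrp_add(uint32_t adrp, uint32_t add, uint64_t pc)
{
    /* ADRP: bit 31 = 1 and bits[28:24] = 10000 */
    if ((adrp & 0x9F000000u) != 0x90000000u)
        return 0;
    /* ADD (immediate, 64-bit): bits[31:23] = 100100010 */
    if ((add & 0xFF800000u) != 0x91000000u)
        return 0;
    uint32_t rd = adrp & 0x1F;
    /* The ADD must read the register ADRP wrote (its Rn field, bits[9:5]). */
    if (((add >> 5) & 0x1F) != rd)
        return 0;
    /* Page count = SignExtend(immhi:immlo), a 21-bit field */
    uint64_t immlo = (adrp >> 29) & 0x3;
    uint64_t immhi = (adrp >> 5) & 0x7FFFF;
    int64_t pages = (int64_t)((immhi << 2) | immlo);
    if (pages & (1LL << 20))
        pages -= (1LL << 21);
    assert(pages >= -(1LL << 20) && pages < (1LL << 20));
    uint64_t page = (pc & ~0xFFFull) + (uint64_t)(pages * 4096);
    /* imm12 in bits[21:10], shifted left by 12 when the sh bit (22) is set */
    uint64_t imm12 = (add >> 10) & 0xFFF;
    if (add & (1u << 22))
        imm12 <<= 12;
    return page + imm12;
}
```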

Your disassembler will usually show the raw bytes of c2_encrypted_string. By combining those bytes with the arguments passed to decrypt_data_function, its return value, and its decompiled body, you can often deduce the decryption algorithm. The key may also be hardcoded as another immediate value or loaded from a separate data section.
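To illustrate the kind of re-implementation this enables, suppose, purely as an assumption, that decrypt_data_function turned out to be a repeating-key XOR loop, a scheme often seen in commodity malware. You could then decrypt every string offline with a few lines of C. The helper name `xor_decrypt` and the separate key buffer are illustrative; the real algorithm and key must come from the sample itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical re-implementation of decrypt_data_function, ASSUMING it is a
 * repeating-key XOR. out must hold len + 1 bytes; the result is NUL-terminated
 * so it can be printed as a C string (e.g., the C2 URL). */
void xor_decrypt(const uint8_t *in, size_t len,
                 const uint8_t *key, size_t key_len, char *out)
{
    assert(key_len > 0);
    for (size_t i = 0; i < len; i++)
        out[i] = (char)(in[i] ^ key[i % key_len]); /* key wraps around */
    out[len] = '\0';
}
```

Because XOR is its own inverse, the same function also re-encrypts, which is a quick way to check a recovered key against the bytes shown in the disassembler.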

Identifying API Calls for Network Activity and Persistence

After decryption, the malware will typically proceed to communicate with its C2 server or establish persistence. This often involves calls to standard C library functions:

Network: socket, connect, send, recv, write, read.
File System, Persistence and Process Execution: open, write, close, mkdir, chmod, fork, execve.

Because these functions live in shared system libraries such as libc.so, the calls appear as BL instructions to stubs in the Procedure Linkage Table (PLT). Each stub loads its target from a Global Offset Table (GOT) entry that the dynamic linker fills in with the address of the real implementation. For example:

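```asm
    // ... after decrypting the C2 URL into X19 and resolving it to a
    //     struct sockaddr_in stored on the stack at SP+0x30 ...
    MOV  W0, #2                 // Arg1 (domain): AF_INET
    MOV  W1, #1                 // Arg2 (type): SOCK_STREAM
    MOV  W2, #0                 // Arg3 (protocol): default
    BL   socket@PLT             // Call socket(); W0 = socket descriptor
    MOV  W20, W0                // Keep the descriptor in a callee-saved register
    ADD  X1, SP, #0x30          // Arg2: pointer to struct sockaddr_in
    MOV  W2, #0x10              // Arg3: sizeof(struct sockaddr_in) = 16
    BL   connect@PLT            // Call connect(); W0 (Arg1) still holds the descriptor
    // ...
```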

By tracing the arguments to these functions, you can piece together the malware’s intentions, such as connecting to a specific IP/port or writing malicious data to a file.
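For connect(), the interesting data sits behind the second argument (X1): for IPv4 it points to a struct sockaddr_in whose port and address are stored in network byte order. Once you have copied those 16 bytes out of memory (for instance from a debugger or a Frida hook), a small C helper like the one below turns them into a readable `ip:port` string using the standard inet_ntop() and ntohs() functions. The name `format_sockaddr_in` is hypothetical, and the sketch handles IPv4 only.

```c
#define _POSIX_C_SOURCE 200112L
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>

/* Formats an IPv4 sockaddr_in (as passed to connect()) as "a.b.c.d:port".
 * Returns 0 on success, -1 if the family is not AF_INET or buf is too small. */
int format_sockaddr_in(const struct sockaddr_in *sa, char *buf, size_t buflen)
{
    assert(sa != NULL && buf != NULL);
    char ip[INET_ADDRSTRLEN];
    if (sa->sin_family != AF_INET)
        return -1;
    if (inet_ntop(AF_INET, &sa->sin_addr, ip, sizeof ip) == NULL)
        return -1;
    /* sin_port is big-endian (network byte order), so convert before printing */
    int n = snprintf(buf, buflen, "%s:%u", ip, (unsigned)ntohs(sa->sin_port));
    return (n < 0 || (size_t)n >= buflen) ? -1 : 0;
}
```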

Dynamic Analysis: Verifying Hypotheses

While static analysis is powerful, dynamic analysis on a rooted device or emulator can confirm your findings. Tools like Frida allow you to hook JNI functions or even specific native functions to inspect arguments and return values in real-time. For example, to hook decrypt_data_function:

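```javascript
// Usage: frida -U -f com.malware.package -l hook.js
// (older Frida releases also need --no-pause so the spawned app resumes)
const LIB = "libmalwarelib.so";

function hookDecrypt() {
    const lib = Process.findModuleByName(LIB);
    // Assumes the symbol is exported; for a stripped library use
    // lib.base.add(<offset from your disassembler>) instead.
    const target = lib !== null ? lib.findExportByName("decrypt_data_function") : null;
    if (target === null) {
        console.log("decrypt_data_function not found.");
        return;
    }
    Interceptor.attach(target, {
        onEnter(args) {
            console.log("decrypt_data_function called!");
            console.log("  Encrypted data pointer: " + args[0]);
            console.log("  Size/Key parameter: " + args[1].toInt32());
            // Dump the encrypted bytes, assuming args[1] is their length
            console.log(hexdump(args[0], { length: args[1].toInt32() }));
        },
        onLeave(retval) {
            console.log("  Decrypted data pointer: " + retval);
            if (!retval.isNull()) {
                // Read the decrypted data (e.g., the C2 URL)
                console.log("  Decrypted string: " + retval.readCString());
            }
        }
    });
}

// With -f the script runs before System.loadLibrary(), so install the hook
// when the dynamic linker returns from loading the library (before JNI_OnLoad).
const dlopenExt = Process.getModuleByName("libdl.so").findExportByName("android_dlopen_ext");
Interceptor.attach(dlopenExt, {
    onEnter(args) {
        this.path = args[0].isNull() ? "" : args[0].readCString();
    },
    onLeave(retval) {
        if (this.path.indexOf(LIB) !== -1) {
            hookDecrypt();
        }
    }
});
```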

This reveals the exact C2 URL after decryption, confirming your static analysis findings without having to fully reverse a complex decryption algorithm.

Conclusion

Reverse engineering Android malware with ARM64 native payloads demands a methodical approach, combining static and dynamic analysis techniques. A solid grasp of ARM64 assembly, JNI interactions, and common malware patterns is essential. By meticulously analyzing JNI_OnLoad, tracing data flows through PC-relative addressing, and identifying critical API calls, you can uncover the core functionalities of even the most sophisticated native Android threats. As malware evolves, so must our analysis capabilities, making expertise in ARM64 an invaluable skill for any mobile security researcher.
