Introduction to ELF and Dynamic Linking in Android
The Executable and Linkable Format (ELF) is the standard binary format for executables, shared libraries, and core dumps on Linux, and by extension, Android. For reverse engineers and security analysts, a deep understanding of ELF is paramount, especially when dealing with native Android libraries (.so files). This article delves into the intricacies of dynamic linking and relocations within Android’s ELF shared objects, providing practical steps to trace how an application resolves external functions and data at runtime. Mastering these concepts is crucial for advanced debugging, vulnerability research, and understanding anti-tampering techniques.
Understanding ELF Structure Basics for Dynamic Linking
An ELF file is composed of various sections and segments. While section headers provide detailed information for linking and debugging, program headers describe how the operating system loads the file into memory at runtime. For dynamic linking, the PT_DYNAMIC program header is of particular interest, as it points to the .dynamic section, which contains essential information for the dynamic linker.
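As a rough illustration of how a loader locates the dynamic section, the sketch below walks the program headers of a 64-bit ELF image already held in memory and returns the PT_DYNAMIC entry. It uses the standard `<elf.h>` types; the function name `find_pt_dynamic` and the `image`/`size` parameters are hypothetical, and the buffer is assumed to be suitably aligned (for example, obtained from mmap).

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Return the PT_DYNAMIC program header of an ELF64 image held in memory,
 * or NULL if the image is not ELF64 or has no such header.
 * `image` and `size` are hypothetical names for a buffer the caller has
 * already read or mapped; the buffer must be 8-byte aligned. */
static const Elf64_Phdr *find_pt_dynamic(const unsigned char *image, size_t size)
{
    /* Check the magic bytes and the 64-bit class marker. */
    if (size < sizeof(Elf64_Ehdr) || memcmp(image, ELFMAG, SELFMAG) != 0)
        return NULL;
    if (image[EI_CLASS] != ELFCLASS64)
        return NULL;

    const Elf64_Ehdr *eh = (const Elf64_Ehdr *)image;
    if (eh->e_phentsize != sizeof(Elf64_Phdr))
        return NULL;

    /* Make sure the whole program header table lies inside the buffer. */
    if (eh->e_phoff > size ||
        (size - eh->e_phoff) / sizeof(Elf64_Phdr) < eh->e_phnum)
        return NULL;

    const Elf64_Phdr *ph = (const Elf64_Phdr *)(image + eh->e_phoff);
    for (size_t i = 0; i < eh->e_phnum; i++) {
        if (ph[i].p_type == PT_DYNAMIC)
            return &ph[i];
    }
    return NULL;
}
```

The returned header's p_offset and p_filesz give the file range of the .dynamic section, and p_vaddr gives its address relative to the load base.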
Key Sections for Dynamic Linking
- .dynsym: The dynamic symbol table, listing symbols that are either imported from other libraries or exported for use by them.
- .dynstr: The dynamic string table, containing the names of the symbols in .dynsym.
- .rel.plt or .rela.plt: Relocation entries for the Procedure Linkage Table (PLT).
- .rel.dyn or .rela.dyn: Relocation entries for data in the Global Offset Table (GOT) and other dynamic sections.
- .got or .got.plt: The Global Offset Table, used by position-independent code to resolve addresses of external data and functions.
- .plt: The Procedure Linkage Table, used to call external functions dynamically.
The Mechanics of Dynamic Linking and Relocations
Dynamic linking allows executables to defer the resolution of external symbols (functions and global variables) until runtime. This saves disk space and memory, as common libraries like libc are loaded once and shared between processes. In Android, Bionic's dynamic linker (/system/bin/linker, or /system/bin/linker64 for 64-bit processes) handles this process, both for libraries loaded at startup and for libraries loaded later through dlopen.
The Role of Relocations
Relocations are instructions to the dynamic linker to modify portions of the code or data segments of a shared object to point to the correct runtime addresses. This is essential for Position-Independent Code (PIC), which allows a library to be loaded at any memory address without needing to be recompiled.
When a shared library is loaded, it often refers to symbols (functions or variables) that reside in other libraries. These references are initially placeholders. The relocation entries tell the linker:
- Which memory address needs to be modified (r_offset).
- How it needs to be modified (the relocation type stored in the low bits of r_info, e.g., an absolute address or an offset relative to a base).
- Which symbol the modification relates to (the symbol table index stored in the high bits of r_info).
- An optional addend (r_addend), present only in RELA-style entries.
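The fields listed above can be pulled out of a relocation entry with the macros from `<elf.h>`. The following is a minimal sketch for 64-bit RELA entries; the `reloc_fields` struct and the `decode_rela` function are hypothetical helpers introduced for illustration only.

```c
#include <elf.h>
#include <stdint.h>

/* Hypothetical container for the decoded parts of one relocation entry. */
struct reloc_fields {
    uint64_t offset;  /* where to patch, relative to the load base */
    uint32_t sym;     /* index into .dynsym (0 means "no symbol") */
    uint32_t type;    /* relocation type, e.g. R_AARCH64_JUMP_SLOT */
    int64_t  addend;  /* constant added to the computed value */
};

/* Split an Elf64_Rela entry into its logical fields. On ELF64 the symbol
 * index is the upper 32 bits of r_info and the type is the lower 32 bits. */
static struct reloc_fields decode_rela(const Elf64_Rela *r)
{
    struct reloc_fields f;
    f.offset = r->r_offset;
    f.sym    = (uint32_t)ELF64_R_SYM(r->r_info);
    f.type   = (uint32_t)ELF64_R_TYPE(r->r_info);
    f.addend = r->r_addend;
    return f;
}
```

For example, an r_info value of 0x000c00000402 decodes to symbol index 12 and type 1026 (0x402), which is R_AARCH64_JUMP_SLOT.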
Common relocation types in ARM64 (AArch64) Android binaries include:
- R_AARCH64_JUMP_SLOT: Used for PLT entries, typically for function calls.
- R_AARCH64_GLOB_DAT: Used for GOT entries, typically for global variable accesses or function pointers.
- R_AARCH64_RELATIVE: Used for internal pointers that need no symbol lookup; the linker writes the library's load base plus the addend.
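To make the three types concrete, here is a simplified sketch of the value a dynamic linker computes for each one, following the AArch64 ELF ABI formulas (symbol address plus addend for GLOB_DAT and JUMP_SLOT, load base plus addend for RELATIVE). This is not Bionic's implementation: `apply_rela` and its parameters are hypothetical, and symbol lookup is assumed to have happened already.

```c
#include <elf.h>
#include <stdint.h>

/* Write the relocated value into *where.
 *   type      - relocation type taken from r_info
 *   load_base - address at which the library was mapped
 *   sym_addr  - already-resolved runtime address of the symbol (unused
 *               for R_AARCH64_RELATIVE)
 *   addend    - r_addend from the RELA entry
 * Returns 0 on success, -1 for a type this sketch does not handle. */
static int apply_rela(uint32_t type, uint64_t *where, uint64_t load_base,
                      uint64_t sym_addr, int64_t addend)
{
    switch (type) {
    case R_AARCH64_RELATIVE:
        /* No symbol involved: rebase an internal pointer. */
        *where = load_base + (uint64_t)addend;
        return 0;
    case R_AARCH64_GLOB_DAT:
    case R_AARCH64_JUMP_SLOT:
        /* Store the symbol's runtime address in the GOT slot. */
        *where = sym_addr + (uint64_t)addend;
        return 0;
    default:
        return -1;
    }
}
```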
Procedure Linkage Table (PLT) and Global Offset Table (GOT)
The PLT and GOT work together to enable dynamic function calls. In the classic lazy-binding scheme, the first call to an external function proceeds as follows:
- The call goes to a stub in the PLT.
- The PLT stub loads the function's GOT entry and jumps to the address stored there.
- Initially, the GOT entry points back into the PLT, at the code that invokes the dynamic linker.
- The dynamic linker resolves the actual address of the external function and writes it into the GOT.
- Subsequent calls to the same function jump from the PLT stub straight to the resolved address held in the GOT, bypassing the linker.
Bionic's linker does not bind lazily: it processes every R_AARCH64_JUMP_SLOT relocation when the library is loaded, so on Android the GOT entry already holds the final address before the first call. The PLT stub and the GOT indirection are otherwise the same.
For global data, the GOT entry is resolved by the linker when the library is loaded.
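The indirection through a patchable slot can be modeled in plain C with a function pointer. The sketch below imitates lazy binding: the slot starts out pointing at a stand-in resolver, which patches the slot on first use. All names (`got_slot`, `resolve_and_call`, `call_via_plt`, `resolver_runs`) are hypothetical, and a real linker patches memory in .got.plt rather than a C variable.

```c
#include <stddef.h>
#include <string.h>

typedef size_t (*strlen_fn)(const char *);

static size_t resolve_and_call(const char *s);

/* Models the GOT entry for strlen. With lazy binding it initially routes
 * to the resolver; with eager binding it would already hold strlen. */
static strlen_fn got_slot = resolve_and_call;

/* Counts how often the resolver path is taken. */
static int resolver_runs = 0;

/* Stand-in for the dynamic linker's resolver: look up the symbol, patch
 * the slot, then forward the original call to the real function. */
static size_t resolve_and_call(const char *s)
{
    resolver_runs++;
    got_slot = strlen;
    return got_slot(s);
}

/* Models the PLT stub: an indirect jump through the GOT entry. */
static size_t call_via_plt(const char *s)
{
    return got_slot(s);
}
```

Calling `call_via_plt` twice takes the resolver path only on the first call; afterwards the slot holds the address of `strlen` and the call goes there directly.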
Practical Analysis: Tracing with readelf and objdump
Let’s analyze a simple Android native library to see these concepts in action. Consider a hypothetical libnative.so compiled from the following C code:
#include <string.h> // For strlen
#include <stdio.h>  // For puts, snprintf

void greet(const char* name) {
    char buffer[256];
    size_t len = strlen(name);
    snprintf(buffer, sizeof(buffer), "Hello, %s! Your name is %zu characters long.", name, len);
    puts(buffer);
}
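To compile this for Android AArch64, use the Clang wrapper from the NDK toolchain (GCC was removed from the NDK in r18). The 21 in the wrapper name is the target API level, chosen here as an example:
aarch64-linux-android21-clang -shared -fPIC -o libnative.so native.c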
Step 1: Identify Dynamic Dependencies (`DT_NEEDED`)
First, let’s see which libraries our libnative.so depends on. The -d flag for readelf displays the contents of the .dynamic section:
readelf -d libnative.so | grep NEEDED
Expected Output (may vary slightly based on NDK version):
0x0000000000000001 (NEEDED) Shared library: [libc.so]
0x0000000000000001 (NEEDED) Shared library: [libm.so]
0x0000000000000001 (NEEDED) Shared library: [libdl.so]
This shows that libnative.so requires libc.so (for strlen, snprintf, puts), libm.so, and libdl.so.
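What `readelf -d` prints here can be approximated in a few lines of C: walk the array of Elf64_Dyn entries until DT_NULL and, for each DT_NEEDED entry, look up the name at the given offset in the dynamic string table. This is a sketch only; `collect_needed` and its parameters are hypothetical, and the caller is assumed to have located both the dynamic array and the string table (DT_STRTAB) already.

```c
#include <elf.h>
#include <stddef.h>

/* Store pointers to the DT_NEEDED library names in `names` (at most
 * `max_names` of them) and return how many were stored.
 *   dyn    - start of the .dynamic array, terminated by DT_NULL
 *   strtab - start of the dynamic string table (.dynstr) */
static size_t collect_needed(const Elf64_Dyn *dyn, const char *strtab,
                             const char **names, size_t max_names)
{
    size_t n = 0;
    for (; dyn->d_tag != DT_NULL; dyn++) {
        /* For DT_NEEDED, d_val is a byte offset into the string table. */
        if (dyn->d_tag == DT_NEEDED && n < max_names)
            names[n++] = strtab + dyn->d_un.d_val;
    }
    return n;
}
```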
Step 2: Inspect Dynamic Symbols (`DT_SYMTAB`)
Next, we’ll examine the dynamic symbol table to find imported (undefined) symbols. The -sD flag for readelf shows dynamic symbols, and we’ll filter for `UND` (Undefined) symbols.
readelf -sD libnative.so | grep UND
Partial Output:
12: 0000000000000000 0 FUNC GLOBAL DEFAULT UND strlen@LIBC (2)
13: 0000000000000000 0 FUNC GLOBAL DEFAULT UND snprintf@LIBC (2)
14: 0000000000000000 0 FUNC GLOBAL DEFAULT UND puts@LIBC (2)
Here, we see `strlen`, `snprintf`, and `puts` are undefined, meaning they are imported from `libc.so`. The `0000000000000000` address signifies they are placeholders yet to be resolved.
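The filter applied by `grep UND` corresponds to a simple check on each symbol table entry: the section index is SHN_UNDEF, and for functions the type encoded in st_info is STT_FUNC. A minimal sketch, with hypothetical helper names `is_imported_function` and `count_imported_functions`:

```c
#include <elf.h>
#include <stddef.h>

/* An imported function is a FUNC symbol with no defining section. */
static int is_imported_function(const Elf64_Sym *sym)
{
    return sym->st_shndx == SHN_UNDEF &&
           ELF64_ST_TYPE(sym->st_info) == STT_FUNC;
}

/* Count imported functions in a .dynsym array of `count` entries.
 * Entry 0 is the reserved null symbol; it has type NOTYPE, so the
 * check above skips it without special handling. */
static size_t count_imported_functions(const Elf64_Sym *syms, size_t count)
{
    size_t n = 0;
    for (size_t i = 0; i < count; i++) {
        if (is_imported_function(&syms[i]))
            n++;
    }
    return n;
}
```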
Step 3: Analyze Relocations (`DT_REL`/`DT_RELA`/`DT_JMPREL`)
Now, let’s look at the relocation entries. These tell the linker exactly where to write the resolved addresses. The -r flag for `readelf` displays relocation sections.
readelf -r libnative.so
Look for entries related to `strlen`, `snprintf`, `puts` in the `.rela.plt` section:
Relocation section '.rela.plt' at offset 0x3d8 has 3 entries:
Offset Info Type Symbol's Value Symbol's Name + Addend
0000000000004000 000c00000402 R_AARCH64_JUMP_SLOT 0000000000000000 strlen@LIBC + 0
0000000000004008 000d00000402 R_AARCH64_JUMP_SLOT 0000000000000000 snprintf@LIBC + 0
0000000000004010 000e00000402 R_AARCH64_JUMP_SLOT 0000000000000000 puts@LIBC + 0
This output is critical. It shows:
- `Offset`: The address within the shared object (specifically in the `.got.plt` section) where the resolved address of the symbol will be written by the linker.
- `Type`: `R_AARCH64_JUMP_SLOT` indicates this is for a function call via the PLT/GOT mechanism.
- `Symbol’s Name`: The function being resolved (e.g., `strlen@LIBC`).
`objdump -R` gives a second view of the same dynamic relocation records. It does not show the contents of `.got.plt`; it lists the slots that the linker will patch at load time.
objdump -R libnative.so
Output showing the addresses being modified:
DYNAMIC RELOCATION RECORDS
OFFSET TYPE VALUE
0000000000004000 R_AARCH64_JUMP_SLOT strlen
0000000000004008 R_AARCH64_JUMP_SLOT snprintf
0000000000004010 R_AARCH64_JUMP_SLOT puts
This confirms that the slots at offsets 0x4000, 0x4008, and 0x4010 from the library’s load base will be patched by the linker to hold the actual addresses of `strlen`, `snprintf`, and `puts` in `libc.so`.
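When inspecting a running process, the slot's runtime address is the library's load base plus the relocation's offset. The sketch below searches a `.rela.plt` array for the jump slot of a given dynamic symbol index and returns that address. The helper name `jump_slot_address` and its parameters are hypothetical; the load base would come from a source such as /proc/<pid>/maps.

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>

/* Return the runtime address of the GOT slot that belongs to the dynamic
 * symbol with index `sym_index`, or 0 if no JUMP_SLOT entry refers to it.
 *   rela, count - the .rela.plt entries
 *   load_base   - address at which the library is mapped */
static uint64_t jump_slot_address(const Elf64_Rela *rela, size_t count,
                                  uint32_t sym_index, uint64_t load_base)
{
    for (size_t i = 0; i < count; i++) {
        if (ELF64_R_TYPE(rela[i].r_info) == R_AARCH64_JUMP_SLOT &&
            ELF64_R_SYM(rela[i].r_info) == sym_index)
            return load_base + rela[i].r_offset;
    }
    return 0;
}
```

With the three entries from the listing and a load base of 0x7000000000, symbol index 13 (`snprintf`) maps to the slot at 0x7000004008.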
Step 4: Tracing a Function Call Through PLT/GOT
When the `greet` function calls `strlen`, the compiler emits a direct branch to the function's PLT stub, and the stub performs the indirect jump through the GOT (simplified ARM64 assembly):
// Inside greet function
bl strlen@plt // Call the PLT stub for strlen
...
// PLT stub for strlen
adrp x16, 0x4000 // x16 = page that contains the strlen slot in .got.plt
ldr x17, [x16, #0x0] // x17 = value stored in the slot at 0x4000
add x16, x16, #0x0 // x16 = address of the slot itself
br x17 // Branch to whatever address the slot holds
The `ldr x17, [x16, #0x0]` instruction reads the GOT entry at `0x4000`. Under lazy binding, that entry initially routes back into the PLT's resolver code, which asks the dynamic linker to find `strlen`’s real address and write it to `0x4000`. Because Bionic applies `R_AARCH64_JUMP_SLOT` relocations at load time, on Android the entry already holds the real address, and every call, including the first, goes from the stub straight to `strlen`.
Conclusion
Understanding the intricacies of ELF dynamic linking and relocations in Android shared objects is a cornerstone of advanced native reverse engineering. By leveraging tools like `readelf` and `objdump`, analysts can statically uncover dependencies, identify imported symbols, and trace how external functions and data are resolved at runtime. This knowledge is indispensable for security researchers examining native malware, developers optimizing their applications, and anyone seeking a deeper insight into the execution environment of Android native code.