
Advanced Ghidra Sleigh: Handling Variable-Length Instructions and Conditional Execution in Android Firmware


Introduction to Advanced Sleigh for Android Firmware Analysis

Reverse engineering Android firmware often goes beyond standard ARM or AArch64 architectures. Modern Android devices increasingly integrate custom co-processors, DSPs, or specialized microcontrollers for tasks like power management, sensor fusion, or security. These components frequently employ proprietary instruction sets, making standard disassemblers ineffective. Ghidra’s Sleigh language provides a powerful, flexible framework to define new processor modules, enabling detailed analysis of these custom architectures. This article delves into advanced Sleigh techniques, specifically focusing on handling variable-length instructions (VLI) and modeling conditional execution, critical aspects for accurate decompilation in challenging Android firmware scenarios.

Why Custom Sleigh Modules for Android?

While Ghidra offers excellent support for common architectures, the fragmented nature of the Android ecosystem means encountering less common or custom instruction sets is a frequent reality for security researchers and firmware analysts. Examples include:

  • Proprietary DSPs: Used for audio processing, image signal processing (ISP), or AI acceleration.
  • Embedded Microcontrollers: Managing peripheral I/O, power states, or dedicated security functions.
  • Obscure Instruction Set Extensions: Vendor-specific additions to standard architectures.

Without a proper Sleigh specification, Ghidra will struggle to disassemble code correctly, leading to garbled listings and unusable decompilation. Mastering Sleigh empowers you to build the necessary tools for deep analysis.

Tackling Variable-Length Instructions (VLI)

Variable-length instruction sets, like ARM’s Thumb-2 or many DSP architectures, present a significant challenge for disassemblers. Instructions can vary in bit-length (e.g., 16-bit, 32-bit, or even longer), and their length often depends on specific opcode bits or prefixes. Sleigh handles this through careful definition of tokens and instruction patterns.

Defining Instruction Length in Sleigh

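In Sleigh, an instruction's length is not assigned in the semantic section. It is the number of bytes consumed by the tokens that the constructor's bit pattern matches. Fields of the same token are combined with `&`, and each additional token is appended with `;`. A variable-length instruction set is therefore described by constructors whose patterns span different numbers of tokens.

    # Sizes are in bits; fields are named bit ranges inside a token.
    define token opbyte(8)
        opcode   = (4,7)
        addr_reg = (0,3)
    ;
    define token databyte(8)
        data_val = (0,7)
    ;
    # opcode=0 & addr_reg             matches 1 byte  (one token)
    # opcode=1 & addr_reg ; data_val  matches 2 bytes (two tokens joined by ";")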

Consider a hypothetical 8-bit architecture where some instructions are 8-bit and others are 16-bit, with the length determined by the opcode bits in the first byte.

    # Hypothetical 8-bit ISA: the high nibble of the first byte is the opcode,
    # and the opcode alone decides whether a second byte follows.
    define token byte0(8)
        op    = (4,7)
        rd    = (2,3)
        imm2  = (0,1)
    ;
    define token byte1(8)
        addr8 = (0,7)
    ;
    # Assumes r0..r3 were declared earlier with "define register".
    attach variables [ rd ] [ r0 r1 r2 r3 ];

    # 8-bit instruction: MOVE rd, #imm      encoding: 0000 DD II             (1 byte)
    :MOVE rd, "#"imm2 is op=0x0 & rd & imm2 {
        rd = imm2;
    }

    # 16-bit instruction: LOAD rd, [addr]   encoding: 0001 DD 00  AAAAAAAA   (2 bytes)
    :LOAD rd, [addr8] is op=0x1 & rd & imm2=0 ; addr8 {
        local ea:2 = addr8;
        rd = *:1 ea;
    }

In the example above, MOVE matches fields of a single 8-bit token, so it is one byte long, while LOAD appends a second token with `;`, so it is two bytes long. Ghidra's disassembler uses the matched length to locate the next instruction, whose address is available in the semantic section as the read-only symbol `inst_next`. Sleigh has no `length = N;` statement. If a pattern omits a trailing token, the operand bytes are decoded as separate instructions and the listing loses synchronization.
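A small reference model can help when checking a Sleigh specification against real bytes. The Python sketch below is not Ghidra code; it only walks a byte stream and reports where each instruction starts, assuming the hypothetical rule that the high nibble of the first byte is the opcode and that opcode 0x0 is one byte long while opcode 0x1 is two bytes long. The function name and table are illustrative.

```python
from __future__ import annotations

# Hypothetical encoding (an assumption for illustration): the high nibble of
# the first byte is the opcode, and the opcode alone fixes the length.
LENGTH_BY_OPCODE = {
    0x0: 1,  # MOVE (1 byte)
    0x1: 2,  # LOAD (2 bytes)
}


def instruction_lengths(code: bytes) -> list[tuple[int, int]]:
    """Return (offset, length) pairs for a linear sweep over `code`."""
    result = []
    offset = 0
    while offset < len(code):
        opcode = code[offset] >> 4
        length = LENGTH_BY_OPCODE.get(opcode)
        if length is None:
            raise ValueError(f"unknown opcode {opcode:#x} at offset {offset}")
        if offset + length > len(code):
            raise ValueError(f"truncated instruction at offset {offset}")
        result.append((offset, length))
        offset += length
    return result
```

Comparing these offsets with the instruction boundaries in Ghidra's listing is a quick way to spot a constructor whose pattern consumes too few or too many tokens.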

Context for VLI

Sometimes, instruction length depends on a CPU state (e.g., ARM/Thumb mode). Sleigh’s context registers are crucial here. You can define a context register that stores the current mode and use it to select different instruction patterns.

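    # Mode-dependent decoding via a context variable.
    # Assumes a register named contextreg was declared with "define register",
    # plus fields hi2=(6,7) and op6=(0,5) in the first token and lo8=(0,7) in a second token.
    define context contextreg
        mode = (0,0)
    ;

    # mode=0: the instruction is one byte long
    :OP1 op6 is mode=0 & hi2=0 & op6 {
        # ... semantics of the 1-byte instruction ...
    }

    # mode=1: same leading bits, but a second byte follows
    :OP2 op6, lo8 is mode=1 & hi2=0 & op6 ; lo8 {
        # ... semantics of the 2-byte instruction ...
    }

The `mode=0` and `mode=1` constraints restrict each constructor to one value of the `mode` context variable, so the same leading bits decode to different lengths in each mode. The initial value of `mode` is normally set in the `.pspec` file, and an instruction that switches mode can update it for later addresses with `globalset` in its disassembly action.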
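The effect of a mode bit on instruction boundaries can be sketched outside Ghidra. The Python model below assumes, purely for illustration, that mode 0 means one-byte instructions, mode 1 means two-byte instructions, and that every first byte has its top two bits clear. The function name is hypothetical.

```python
from __future__ import annotations


def split_instructions(code: bytes, mode: int) -> list[bytes]:
    """Split `code` into instructions for the given decode mode.

    Assumed for illustration: mode 0 means 1-byte instructions, mode 1 means
    2-byte instructions, and every first byte has its top two bits clear.
    """
    if mode not in (0, 1):
        raise ValueError("mode must be 0 or 1")
    length = 1 if mode == 0 else 2
    instructions = []
    offset = 0
    while offset < len(code):
        if code[offset] >> 6 != 0:
            raise ValueError(f"no pattern matches byte at offset {offset}")
        if offset + length > len(code):
            raise ValueError(f"truncated instruction at offset {offset}")
        instructions.append(code[offset:offset + length])
        offset += length
    return instructions
```

The same four bytes split into four instructions in mode 0 and two instructions in mode 1, which is why a wrong initial context value tends to corrupt an entire listing rather than a single instruction.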

Modeling Conditional Execution

Conditional execution, where an instruction's effect depends on the state of CPU flags (e.g., Zero, Carry, Negative, Overflow), is common in processor design. In Sleigh, flags are modeled as ordinary registers that the semantic section reads and writes, not as context variables, and Ghidra's p-code expresses the resulting conditional behavior precisely.

Conditional P-Code Generation

The semantic section of a Sleigh constructor (the braces after the bit pattern) generates p-code. Conditional logic is written as `if (…) goto …;`, which produces a `CBRANCH` operation. The destination can be a branch target exported by an operand, a local label, or `inst_next` to skip the rest of the instruction.

    # CPU flags are run-time state, so model them as registers, not context variables.
    define register offset=0x30 size=1 [ CF ZF ];

    # Conditional branch: BEQ target (branch if the Zero flag is set)
    # Hypothetical encoding: 0010 0000  SSSSSSSS   (2 bytes, signed 8-bit displacement)
    # Assumes op=(4,7) in the first token and simm8=(0,7) signed in the second.
    rel8: addr is simm8 [ addr = inst_next + simm8; ] {
        export *:2 addr;
    }

    :BEQ rel8 is op=0x2 ; rel8 {
        if (ZF == 1) goto rel8;
    }

In this simplified example, `if (ZF == 1) goto rel8;` becomes a `CBRANCH` in Ghidra's p-code, with the comparison computed by a preceding `INT_EQUAL`. The pattern spans two tokens, so the instruction is two bytes long and falls through to `inst_next` when the branch is not taken. Ghidra's decompiler then analyzes this p-code and reconstructs high-level `if` statements or loops.
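To make the lowering concrete, the following Python sketch models how a conditional branch on the Zero flag (named `ZF` in this sketch) might look as p-code and how that p-code picks the next address. The tuple layout and the two-operation interpreter are illustrative assumptions, not Ghidra's internal representation; only the operation names `INT_EQUAL` and `CBRANCH` come from p-code.

```python
from __future__ import annotations


def lower_beq(target: int) -> list[tuple]:
    """P-code-like operations for `if (ZF == 1) goto target;`."""
    return [
        ("INT_EQUAL", "tmp", "ZF", 1),  # tmp = (ZF == 1)
        ("CBRANCH", target, "tmp"),     # jump to target if tmp is non-zero
    ]


def run(ops: list[tuple], regs: dict[str, int], fallthrough: int) -> int:
    """Interpret the operations and return the address executed next."""
    values = dict(regs)
    for op in ops:
        if op[0] == "INT_EQUAL":
            _, out, left, right = op
            values[out] = int(values[left] == right)
        elif op[0] == "CBRANCH":
            _, target, cond = op
            if values[cond]:
                return target
        else:
            raise ValueError(f"unsupported operation {op[0]}")
    # No branch was taken, so execution continues at the next instruction.
    return fallthrough
```

For a branch at 0x100 with a fall-through address of 0x102, `run(lower_beq(0x120), {"ZF": 1}, 0x102)` selects the target and the same call with `ZF` clear selects the fall-through address.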

Conditional Move Example

Many architectures feature conditional move instructions, where a register is loaded only if a certain condition is met.

    # Conditional move: CMOVZ rd, rs (copy rs to rd only if the Zero flag is set)
    # Hypothetical encoding: 0011 DD SS   (1 byte)
    # Assumes op=(4,7), rd=(2,3) and rs=(0,1) are fields of one 8-bit token,
    # rd and rs are attached to r0..r3, and ZF is a 1-byte register.
    :CMOVZ rd, rs is op=0x3 & rd & rs {
        if (ZF == 0) goto inst_next;
        rd = rs;
    }

Because the Sleigh semantic section has no block-structured `if { … }`, the constructor jumps to `inst_next` when the condition fails and otherwise falls through to the copy. This models the conditional move accurately and lets the decompiler generate equivalent high-level code.
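A conditional move can be modeled either by skipping the copy with a conditional jump or by computing the result as an arithmetic select with no control flow. The Python sketch below compares the two forms for 8-bit register values. It is an illustration of the semantics only, with hypothetical function names, and it does not claim that any particular Ghidra processor module uses one form or the other.

```python
def cmovz_branch_around(zero_flag: int, rd: int, rs: int) -> int:
    """New value of rd when the copy is skipped by a conditional jump."""
    if zero_flag == 0:  # condition fails: skip the rest of the instruction
        return rd
    return rs           # condition holds: the copy executes


def cmovz_select(zero_flag: int, rd: int, rs: int) -> int:
    """Same result computed without control flow, as an arithmetic select."""
    return (rs * zero_flag + rd * (1 - zero_flag)) & 0xFF
```

Both functions return the same value for every combination of an 8-bit `rd`, an 8-bit `rs`, and a flag of 0 or 1, so the choice between the two styles in a specification is about how the resulting p-code reads in the decompiler, not about correctness.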

Developing a Custom Android Processor Module in Ghidra

The process of integrating your Sleigh specification into Ghidra involves several steps:

  1. Create a Ghidra Processor Module Project:

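    Module projects are created in Eclipse with the GhidraDev plugin, not in Ghidra itself. In Eclipse, choose GhidraDev -> New -> Ghidra Module Project and pick a descriptive name (e.g., `AndroidCustomDSP`). This creates a directory structure for your module, with language files placed under `data/languages`.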

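  2. Write your language files (.slaspec, .sinc, .pspec, .cspec, .ldefs):

    • processor.slaspec: The top-level Sleigh file. Defines endianness, address spaces, registers, and context, and includes any .sinc files.
    • processor.sinc: A Sleigh include file, typically holding tokens and instruction constructors shared between variants.
    • processor.pspec: Processor specification, such as the program counter register and initial context values.
    • processor.cspec: Compiler specification, such as the stack pointer and calling convention used by the decompiler.
    • processor.ldefs: Language definition file that ties everything together and makes the language selectable in Ghidra.

  3. Compile the Sleigh Specification: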

    Ghidra ships with a Sleigh compiler, started through the `sleigh` script in the installation's `support` directory. You can typically compile your specification with a command like:

    <ghidra_install>/support/sleigh <path_to_slaspec>

    This generates a `.sla` file (the compiled processor specification).

  4. Install the Module:

    Place the compiled `.sla` file together with the `.ldefs`, `.pspec`, and `.cspec` files in your Ghidra installation's Ghidra/Processors/<your_processor_name>/data/languages directory, then restart Ghidra. Alternatively, you can build your Ghidra module project and install it via File -> Install Extensions.

  5. Load Firmware:

    Open Ghidra, create a new project, and import your Android firmware binary. When prompted to select a language, your custom processor module should now appear in the list. Select it.

  6. Debug and Refine:

    Initial specifications rarely work perfectly. Enable the P-code field in the Listing's field editor to see the p-code generated for each instruction, and use the Instruction Info window to check which constructor matched. Identify incorrect disassembly, miscalculated instruction lengths, or erroneous conditional logic. Iteratively refine your Sleigh files and recompile.

Conclusion

Mastering advanced Sleigh concepts like variable-length instructions and conditional execution is indispensable for reverse engineering the diverse and often custom architectures found within Android firmware. By meticulously defining instruction patterns, managing instruction lengths, and accurately modeling conditional logic through p-code, you can transform opaque binary blobs into understandable, decompilable code. This expert-level capability extends Ghidra’s reach, making it an even more formidable tool in the Android security researcher’s arsenal.
