Introduction: The Elusive Custom VM
In the evolving landscape of Android malware, sophisticated threat actors increasingly employ custom virtual machines (VMs) to obscure their malicious payloads. These VMs execute a unique, non-standard instruction set, making traditional static analysis and decompilation tools largely ineffective. Reverse engineers are often confronted with a stream of bytecodes that Ghidra’s powerful decompiler, by default, cannot interpret. This article delves into an advanced technique for conquering such obfuscation: leveraging Ghidra’s Sleigh language to define custom processor modules capable of understanding and decompiling these bespoke VM opcodes.
Why Custom VMs?
Custom VMs serve as potent anti-analysis mechanisms. By implementing their own instruction set, register model, and execution flow, malware authors achieve several goals:
- Obfuscation: Standard Android bytecode (DEX) or native ARM/x86 instructions are replaced with an unknown set, rendering off-the-shelf tools useless.
- Evasion: Signature-based detection systems struggle to identify patterns in an entirely new instruction set.
- Complexity: Analyzing a custom VM requires a deep understanding of its architecture, significantly increasing the time and effort for reverse engineers.
The Ghidra Advantage
While formidable, custom VMs are not insurmountable. Ghidra, the open-source software reverse engineering framework, provides a unique and powerful capability through its Processor Specification Language, Sleigh. Sleigh allows analysts to define new CPU architectures, instruction sets, and their semantic operations, effectively teaching Ghidra how to understand any arbitrary machine code.
Identifying Custom VM Opcodes
Before writing any Sleigh code, you must first identify and understand the custom VM’s instruction set.
Initial Reconnaissance and Pattern Recognition
The process often begins with dynamic analysis or meticulous static examination of the malware’s native libraries (e.g., .so files). Look for:
- A large interpreter function: Malware utilizing custom VMs typically has a central function that fetches, decodes, and executes instructions. This often involves a large switch-case statement or a series of conditional jumps based on the current opcode.
- Byte patterns: Observe the byte stream that feeds this interpreter. Are there repetitive structures? Consistent opcode lengths or operand patterns?
- Stack manipulations: Many custom VMs are stack-based. Look for pushes and pops to a custom stack.
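One quick way to start on the byte-pattern question is to count how often each byte value occurs in a suspected bytecode blob. The sketch below is illustrative only: `byte_histogram` and the sample bytes are hypothetical and not taken from any real malware sample.

```python
from collections import Counter


def byte_histogram(blob, top=5):
    """Return the `top` most frequent byte values as (value, count) pairs."""
    return Counter(blob).most_common(top)


# Hypothetical blob; in practice this would be dumped from the app's native library.
sample = bytes.fromhex("01 02 00 00 00 01 03 00 00 00 02 03")

for value, count in byte_histogram(sample):
    print(f"0x{value:02x}: {count}")
```

A small set of values that dominates the histogram can hint at opcodes, while runs of zero bytes often turn out to be the upper bytes of small little-endian immediates. Treat this as a lead to confirm against the interpreter code, not as proof.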
Pinpointing the Interpreter Loop
Using Ghidra, focus on the native code (ARM/AArch64) that initializes and executes the custom VM. Trace function calls and data accesses. A common pattern involves:
- Loading VM bytecode into memory.
- Initializing custom VM registers (e.g., program counter, stack pointer).
- Entering a loop that:
  - Fetches an opcode from the bytecode stream.
  - Decodes the opcode and its operands.
  - Executes the corresponding operation.
  - Updates the custom VM’s program counter.
Identifying this loop and the dispatch mechanism is crucial as it reveals the individual opcode handlers.
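To make the shape of such a loop concrete, here is a minimal Python model of a fetch-decode-execute dispatcher. It is a sketch under assumptions: the opcode values (0x10, 0xFF), the handler names, and the dictionary-based state are all hypothetical, and a real interpreter in native code will usually appear as a switch statement or jump table instead.

```python
def run(bytecode, handlers, max_steps=1000):
    """Fetch, decode, and execute until a handler clears state['running']."""
    state = {"pc": 0, "stack": [], "running": True}
    for _ in range(max_steps):
        if not state["running"] or state["pc"] >= len(bytecode):
            break
        opcode = bytecode[state["pc"]]            # fetch
        handler = handlers[opcode]                # decode: pick the opcode handler
        state["pc"] += handler(state, bytecode)   # execute, then advance the VM pc
    return state


def op_push8(state, code):
    """Hypothetical handler: push the byte that follows the opcode."""
    state["stack"].append(code[state["pc"] + 1])
    return 2  # instruction length in bytes


def op_stop(state, code):
    """Hypothetical handler: stop the interpreter."""
    state["running"] = False
    return 1


HANDLERS = {0x10: op_push8, 0xFF: op_stop}
final = run(bytes([0x10, 0x2A, 0xFF]), HANDLERS)
print(final["stack"], final["pc"])
```

Each entry in `HANDLERS` corresponds to one opcode handler you would expect to find behind the dispatch in the native code. An opcode missing from the table raises a `KeyError` here, which mirrors the default branch of a real dispatcher.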
Demystifying Sleigh: Ghidra’s Language for CPU Definition
Sleigh is a description language for specifying a processor’s instruction set architecture. A Ghidra language module built on it centers on two main files:
- .pspec (Processor Specification): Declares processor-level details such as which register acts as the program counter and any default context values. Calling conventions are described separately in a .cspec file.
- .slaspec (Sleigh Specification): Defines the endianness, memory spaces, and registers, along with the instructions themselves: their opcodes, operands, and semantic effects on the processor state (registers, memory).
Key Sleigh Components
A Sleigh source file (.slaspec) typically includes:
- Tokens: Define the bit patterns that make up an instruction.
- Constructors: Map tokens to instruction mnemonics and define how operands are parsed.
- Semantics: Describe the effects of each instruction using P-code operations, Ghidra’s intermediate representation.
Crafting a Custom Processor Module with Sleigh
Let’s consider a hypothetical custom VM with a few simple instructions to illustrate the Sleigh development process.
Step 1: Setting up Your Ghidra Development Environment
You’ll need a Ghidra installation and access to its processor development tools. Create a new directory for your custom processor module (e.g., MyCustomVM/data/languages/), which will hold the Sleigh source file MyCustomVM.slaspec.
Step 2: Analyzing the Custom VM Instruction Set
Suppose our hypothetical custom VM is stack-based and has the following instructions, each consisting of a 1-byte opcode followed by any operands:
- 0x01 [VAL]: PUSH_IMM – Push the 4-byte immediate value VAL onto the custom stack.
- 0x02: ADD – Pop two values, add them, push the result.
- 0x03: HALT – Stop execution.
We’ll also assume our VM has a program counter (pc_vm) and a stack pointer (sp_vm) within its custom register context.
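Before encoding these instructions in Sleigh, it can help to write down the decoding rules in a few lines of Python and check them against real bytes. The sketch below is a linear disassembler for the three opcodes listed above; the function name is hypothetical, and it assumes the 4-byte immediate is stored little-endian.

```python
import struct


def disassemble(code):
    """Yield (offset, text) pairs for the hypothetical three-opcode VM."""
    pc = 0
    while pc < len(code):
        op = code[pc]
        if op == 0x01:
            # Assumption: the 4-byte immediate is little-endian.
            (val,) = struct.unpack_from("<I", code, pc + 1)
            yield pc, f"PUSH_IMM 0x{val:x}"
            pc += 5
        elif op == 0x02:
            yield pc, "ADD"
            pc += 1
        elif op == 0x03:
            yield pc, "HALT"
            pc += 1
        else:
            raise ValueError(f"unknown opcode 0x{op:02x} at offset {pc}")


program = bytes.fromhex("01 02000000 01 03000000 02 03")
listing = list(disassemble(program))
for offset, text in listing:
    print(f"{offset:04x}: {text}")
```

The listing this produces is what the finished Sleigh module should reproduce in Ghidra's disassembly view, which gives you a simple cross-check while developing the specification.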
Step 3: Writing the .pspec and .slaspec Files
First, a simplified MyCustomVM.pspec (placed in the module's data/languages folder, next to the Sleigh source):
    <?xml version="1.0" encoding="UTF-8"?>
    <!-- Processor spec for the hypothetical custom VM. The processor name,
         endianness, and address size are declared in the .ldefs and Sleigh
         files, and pc_vm and sp_vm are defined in the Sleigh file. -->
    <processor_spec>
      <programcounter register="pc_vm"/>
    </processor_spec>
Next, the MyCustomVM.slaspec file:
    # Minimal Sleigh sketch for the hypothetical stack-based VM
    define endian=little;
    define alignment=1;

    define space ram      type=ram_space      size=4 default;
    define space register type=register_space size=4;

    # pc_vm sits at register offset 0 and sp_vm at offset 4 (4 bytes each)
    define register offset=0 size=4 [ pc_vm sp_vm ];

    # Token sizes and field ranges are given in bits
    define token instr(8)
        opcode = (0,7)
    ;
    define token imm32(32)
        imm = (0,31)
    ;

    # User-defined p-code operation that marks where the VM stops
    define pcodeop halt;

    # Macros cannot return values, so pop_val writes into its argument
    macro push_val(val) {
        *:4 sp_vm = val;
        sp_vm = sp_vm + 4;
    }
    macro pop_val(dest) {
        sp_vm = sp_vm - 4;
        dest = *:4 sp_vm;
    }

    # Sleigh advances the program counter by the instruction length
    # automatically, so the constructors do not update pc_vm themselves.
    :PUSH_IMM imm is opcode=0x01; imm {
        local val:4 = imm;
        push_val(val);
    }

    :ADD is opcode=0x02 {
        local val1:4 = 0;
        local val2:4 = 0;
        pop_val(val1);
        pop_val(val2);
        local sum:4 = val1 + val2;
        push_val(sum);
    }

    :HALT is opcode=0x03 {
        halt();
        goto inst_start;
    }
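To exercise a specification like this, it is handy to have a small, known-good bytecode file to import. The following sketch assembles one from the opcodes in Step 2; the `assemble` helper and the `OPCODES` table are hypothetical names, and the little-endian immediate encoding is an assumption that must match the interpreter you are analyzing.

```python
import struct

# Opcode values from the hypothetical instruction set in Step 2.
OPCODES = {"PUSH_IMM": 0x01, "ADD": 0x02, "HALT": 0x03}


def assemble(program):
    """Encode (mnemonic, *operands) tuples into VM bytecode."""
    out = bytearray()
    for mnemonic, *operands in program:
        out.append(OPCODES[mnemonic])
        if mnemonic == "PUSH_IMM":
            # Assumption: 4-byte little-endian immediate follows the opcode.
            out += struct.pack("<I", operands[0])
    return bytes(out)


blob = assemble([("PUSH_IMM", 2), ("PUSH_IMM", 3), ("ADD",), ("HALT",)])
print(blob.hex())
# To keep the bytes for import, write them out yourself, for example:
# pathlib.Path("test_vm.bin").write_bytes(blob)
```

Once the module is compiled and installed, a file like this can be imported as a raw binary with the custom language selected, and the disassembly should show the same four instructions.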
Step 4: Compiling and Integrating the Processor Module
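- Place the files: Put MyCustomVM.pspec and MyCustomVM.slaspec into Ghidra/Processors/MyCustomVM/data/languages/ (create the MyCustomVM and data/languages directories if they don’t exist). Ghidra also needs a .ldefs language definition, and the .cspec it references, in the same directory before the language shows up in its list; those files are not shown here.
- Compile Sleigh: Navigate to the Ghidra root directory in your terminal and run support/sleigh Ghidra/Processors/MyCustomVM/data/languages/MyCustomVM.slaspec. This compiles the .slaspec file into a .sla file.
- Launch Ghidra: Start Ghidra. When importing a new binary, you should now see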