Mapping Android Malware Execution Flow: Leveraging Ghidra’s Graph View and Data Tracking

Introduction: Unraveling Android Malware with Ghidra

The Android ecosystem, while vast and innovative, remains a prime target for malicious actors. Analyzing Android malware often requires a deep dive into its compiled code to understand its true intent. Static analysis, performed without executing the application, is a critical first step in this process. Ghidra, the open-source software reverse engineering (SRE) suite developed by the NSA, has become an indispensable tool for security researchers. Its powerful decompilation capabilities, coupled with robust visualization features like the Graph View and sophisticated data tracking mechanisms, empower analysts to dissect complex malware execution flows and identify malicious behaviors.

This article provides an expert-level guide on utilizing Ghidra’s static analysis features to map the execution flow of Android malware. We will focus on how to effectively use the Graph View to visualize control flow and how to track data through functions to uncover sensitive information leakage or manipulation.

Setting the Stage: Prerequisites and Initial Setup

Tools You’ll Need

Ghidra: The latest stable release.
Android Application Package (APK): A sample Android malware APK for analysis.
d2j-dex2jar: Converts Dalvik bytecode (DEX) to a Java Archive (JAR) so that Ghidra can load it as Java bytecode. It ships with the `dex2jar` project.
APKTool: Decodes the APK’s `AndroidManifest.xml` and resources. With the `-s` option it also keeps the original `classes.dex` file(s) instead of disassembling them to smali.

Importing Your APK into Ghidra

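Ghidra loads native binaries as well as Java class files and JARs. Recent releases also include a Dalvik processor module that can import DEX files directly, but a common approach, and the one followed here, is to convert the DEX bytecode into a JAR first. Here’s the typical workflow: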

Decode the APK with APKTool: First, obtain a readable `AndroidManifest.xml`, which is crucial for identifying potential entry points, and the `classes.dex` file(s) for conversion. By default APKTool disassembles the DEX code into smali; the `-s` (`--no-src`) option skips that step and copies the `classes.dex` file(s) unchanged.
```
apktool d -s malicious.apk
```
This command creates a directory named `malicious` containing the decoded manifest and resources, with the `classes.dex` file(s) in its root.
Convert DEX to JAR using dex2jar: Navigate to your `dex2jar` directory and convert the `classes.dex` file(s) to a JAR. If there are multiple `classesX.dex` files, convert each of them.
```
./d2j-dex2jar.sh malicious/classes.dex -o malicious.jar
```
This will generate `malicious.jar`.
Import the JAR into Ghidra:
- Launch Ghidra and create a new project.
- Go to `File > Import File…` and select your generated `malicious.jar`.
- Ghidra will recognize it as a Java bytecode file. Accept the default import options.
- After import, double-click the `malicious.jar` entry in the project tree to open it in the Code Browser.
- When prompted, run auto-analysis. The available analyzers depend on the loaded format (JVM or Dalvik) and on your Ghidra version, so review the list rather than relying on fixed analyzer names. The defaults are a reasonable starting point, and enabling the optional “Decompiler Parameter ID” analyzer, if offered, can improve recovered function signatures.

Navigating the Control Flow: Ghidra’s Graph View

Identifying Entry Points

Before diving into the graph view, it’s essential to identify potential starting points for execution. For Android applications, these are typically defined in the `AndroidManifest.xml`:

`Activity` components (especially those with `android.intent.action.MAIN` and `android.intent.category.LAUNCHER` filters)
`Service` components (`onStartCommand`, `onCreate`)
`BroadcastReceiver` components (`onReceive`)
`ContentProvider` components (`onCreate`, which runs before `Application.onCreate`)
The application’s `Application` class (`attachBaseContext`, `onCreate`)

Once the manifest gives you a component class (e.g., `MainActivity`), its lifecycle callback (e.g., `onCreate`) is the method to start from. Search for the class in Ghidra’s Symbol Tree, or type the method name into the filter field of the Functions window (`Window > Functions`), to locate its entry point.
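Because APKTool decodes the manifest into plain XML, the entry points listed above can also be collected programmatically before you open Ghidra. The following is a minimal sketch using only the JDK’s DOM parser. It assumes the decoded file is at `malicious/AndroidManifest.xml` (the directory created by the `apktool d` step), and the class name `ManifestEntryPoints` is hypothetical. Relative names such as `.MainActivity` are expanded against the manifest’s `package` attribute, and components with a MAIN/LAUNCHER intent filter are flagged.

```java
import java.io.StringReader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ManifestEntryPoints {
    static final String ANDROID_NS = "http://schemas.android.com/apk/res/android";
    static final String[] COMPONENT_TAGS = {"activity", "activity-alias", "service", "receiver", "provider"};

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for getAttributeNS(ANDROID_NS, ...)
            // Refuse DTDs: the manifest comes from an untrusted (malicious) APK.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse manifest", e);
        }
    }

    // Expand relative names (".Foo" or a dotless "Foo") using the manifest package.
    static String resolve(String pkg, String name) {
        if (name.startsWith(".")) return pkg + name;
        return name.contains(".") ? name : pkg + "." + name;
    }

    // True if some <tag> below parent has android:name equal to value.
    static boolean declares(Element parent, String tag, String value) {
        NodeList nodes = parent.getElementsByTagName(tag);
        for (int i = 0; i < nodes.getLength(); i++) {
            if (value.equals(((Element) nodes.item(i)).getAttributeNS(ANDROID_NS, "name"))) return true;
        }
        return false;
    }

    // A launcher component has an intent filter with both MAIN and LAUNCHER.
    static boolean isLauncher(Element component) {
        NodeList filters = component.getElementsByTagName("intent-filter");
        for (int i = 0; i < filters.getLength(); i++) {
            Element filter = (Element) filters.item(i);
            if (declares(filter, "action", "android.intent.action.MAIN")
                    && declares(filter, "category", "android.intent.category.LAUNCHER")) return true;
        }
        return false;
    }

    static List<String> findEntryPoints(Document doc) {
        Element manifest = doc.getDocumentElement();
        String pkg = manifest.getAttribute("package");
        List<String> entries = new ArrayList<>();
        NodeList apps = manifest.getElementsByTagName("application");
        if (apps.getLength() > 0) { // custom Application subclass, if one is declared
            String app = ((Element) apps.item(0)).getAttributeNS(ANDROID_NS, "name");
            if (!app.isEmpty()) entries.add("application: " + resolve(pkg, app));
        }
        for (String tag : COMPONENT_TAGS) {
            NodeList nodes = manifest.getElementsByTagName(tag);
            for (int i = 0; i < nodes.getLength(); i++) {
                Element component = (Element) nodes.item(i);
                String entry = tag + ": " + resolve(pkg, component.getAttributeNS(ANDROID_NS, "name"));
                entries.add(isLauncher(component) ? entry + " (launcher)" : entry);
            }
        }
        return entries;
    }

    public static void main(String[] args) throws Exception {
        // Assumed default path: the directory produced by "apktool d malicious.apk".
        Path path = Path.of(args.length > 0 ? args[0] : "malicious/AndroidManifest.xml");
        findEntryPoints(parse(Files.readString(path))).forEach(System.out::println);
    }
}
```

Each printed line names a class to look up in the Symbol Tree, together with the component type that tells you which lifecycle callback to open first.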

Visualizing Execution Paths

Ghidra’s Graph View provides a visual representation of a function’s control flow, which makes branches and loops easier to follow than in a linear listing of assembly or decompiled code. Each node in the graph represents a basic block (a straight-line sequence of instructions with a single entry point and a single exit point), and edges represent possible transitions between blocks.

Locate a target function: In the Listing window or the Function Call Trees window, find a function of interest (e.g., `onCreate` or a suspicious utility function).
Display Function Graph: Place the cursor inside the function and open `Window > Function Graph`. This opens a window showing the Control Flow Graph (CFG) of the function at the current location.
Analyze the graph’s nodes and edges:

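Entry block: The block that begins at the function’s entry address, drawn at the top of the default layout.
Exit blocks: Blocks that end in a return (or throw) instruction.
Other nodes: Intermediate basic blocks.
Edges: Arrows indicating the flow of execution. Conditional branches show two outgoing edges, and edge colors distinguish edge types such as conditional jumps, fall-through, and unconditional jumps. Color conventions depend on the graph view and the tool options, so confirm entry and exit blocks from their addresses and instructions rather than by color alone.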

Navigate and explore: You can zoom, pan, and rearrange nodes for better readability. Clicking on a node in the graph will synchronize the Listing view to that basic block. Pay attention to long paths, numerous conditional branches, or loops, as these can hide complex logic. Look for paths leading to sensitive API calls or data manipulation.
Explore called functions: If a block contains a call to another function, double-click the called function’s name to navigate to it; the Function Graph window follows the current location and redraws for the callee, effectively letting you traverse the call graph. `Window > Function Call Trees` gives an overview of a function’s callers and callees.
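To make the basic-block idea behind these graphs concrete, the sketch below builds a CFG from a hypothetical, simplified instruction list; it is not Ghidra’s internal model. It marks leaders (the first instruction, every branch target, and every instruction after a branch or return), cuts blocks at the leaders, and adds taken and fall-through edges. A block without successors is an exit block. `ToyCfg` and its instruction kinds are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

class ToyCfg {
    enum Kind { NORMAL, GOTO, IF, RETURN }

    // Hypothetical instruction: a kind, a branch target index (GOTO/IF only), and display text.
    static final class Insn {
        final Kind kind; final int target; final String text;
        Insn(Kind kind, int target, String text) { this.kind = kind; this.target = target; this.text = text; }
        static Insn op(String text) { return new Insn(Kind.NORMAL, -1, text); }
    }

    // A basic block covers instructions start..end (inclusive); successors hold block start indices.
    static final class Block {
        final int start, end;
        final List<Integer> successors = new ArrayList<>();
        Block(int start, int end) { this.start = start; this.end = end; }
        boolean isExit() { return successors.isEmpty(); }
    }

    // Leaders: the first instruction, every branch target, and every instruction after a branch or return.
    static TreeSet<Integer> leaders(List<Insn> code) {
        TreeSet<Integer> leaders = new TreeSet<>();
        leaders.add(0);
        for (int i = 0; i < code.size(); i++) {
            Insn in = code.get(i);
            if (in.kind == Kind.GOTO || in.kind == Kind.IF) leaders.add(in.target);
            if (in.kind != Kind.NORMAL && i + 1 < code.size()) leaders.add(i + 1);
        }
        return leaders;
    }

    static Map<Integer, Block> build(List<Insn> code) {
        List<Integer> starts = new ArrayList<>(leaders(code));
        Map<Integer, Block> blocks = new TreeMap<>();
        for (int b = 0; b < starts.size(); b++) {
            int start = starts.get(b);
            int end = (b + 1 < starts.size() ? starts.get(b + 1) : code.size()) - 1;
            Block block = new Block(start, end);
            Insn last = code.get(end);
            switch (last.kind) {
                case GOTO:
                    block.successors.add(last.target);                       // unconditional jump
                    break;
                case IF:
                    block.successors.add(last.target);                       // branch taken
                    if (end + 1 < code.size()) block.successors.add(end + 1); // fall-through
                    break;
                case RETURN:
                    break;                                                   // exit block
                default:
                    if (end + 1 < code.size()) block.successors.add(end + 1); // runs into the next block
            }
            blocks.put(start, block);
        }
        return blocks;
    }

    // Made-up method body: send an SMS unless v1 == 0, then return.
    static List<Insn> sample() {
        return List.of(
            Insn.op("v0 = SmsManager.getDefault()"),
            new Insn(Kind.IF, 4, "if v1 == 0 goto 4"),
            Insn.op("v0.sendTextMessage(...)"),
            new Insn(Kind.GOTO, 5, "goto 5"),
            Insn.op("log(\"skipped\")"),
            new Insn(Kind.RETURN, -1, "return"));
    }

    public static void main(String[] args) {
        List<Insn> code = sample();
        for (Block b : build(code).values()) {
            System.out.println("block " + b.start + "-" + b.end + " -> " + b.successors
                    + (b.isExit() ? " (exit)" : "") + "  first: " + code.get(b.start).text);
        }
    }
}
```

For the sample, the conditional block at instructions 0-1 has two outgoing edges, which is the two-edge shape you will see at every conditional branch in the Function Graph.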

Following the Data: Static Taint Analysis and Cross-References

Understanding Data Flow

Beyond execution flow, understanding how data is created, manipulated, and used is paramount in malware analysis. Malicious applications often collect sensitive data (contacts, SMS, location) and then exfiltrate it. Ghidra helps track the flow of this data statically.

Tracking Sensitive Information

Consider a scenario where you suspect that a malware sample sends SMS messages to a premium-rate number. You might start by searching for known SMS-sending APIs in the Android SDK, such as `android.telephony.SmsManager.sendTextMessage`.

Let’s assume the decompiled code shows something like this (simplified):

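```java
// Simplified decompiler output; SmsManager is android.telephony.SmsManager.
public void sendSecretSMS(String recipient, String message) {
    SmsManager smsManager = SmsManager.getDefault();
    // Arguments: destination, service center (null = default), text, sent and delivery intents.
    smsManager.sendTextMessage(recipient, null, message, null, null);
}
```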

To understand what `recipient` and `message` contain, you would follow these steps in Ghidra:

Locate the sensitive API call: Search for `sendTextMessage` in Ghidra’s Symbol Tree, or use `Search > For Strings` to find SMS-related string constants. Then follow the references to the API to reach each of its call sites in the Listing view.
Examine its arguments: Note the values, registers, or variables passed as arguments to `sendTextMessage`. The Decompiler window usually makes this easier than the Listing, because it shows the call and its arguments as pseudo-code.
Trace argument origins: Within a function, right-click a variable holding an argument (e.g., `recipient` or `message`) in the Decompiler and use its slice highlighting (“Highlight Def-use” or “Highlight Backward Slice”) to see where the value is defined and what feeds it. To cross function boundaries, right-click a parameter, field, or function in the Listing and select “References” -> “Show References To.” This lists the cross-references (X-Refs), that is, the locations that read, write, or call it.
Follow the data backwards: Navigate through the `X-Refs` window. You will likely see assignments to the variable from other variables, method return values, or hardcoded strings. Keep tracing backwards through these references.

If the data comes from a method call, investigate that method’s implementation.
If it’s a field access, check where that field is initialized or modified.
If it’s a hardcoded string, note its value.
If it’s from user input, identify the UI component (e.g., `EditText`) or input method.

Identify data sinks and sources: Continue tracing until you reach the ultimate source of the data (e.g., `SharedPreferences`, `SQLite` database, network input, or hardcoded values) and understand its path to the sensitive API call (the sink).
Document findings: Use Ghidra’s bookmarks (`Ctrl+D` adds one at the current location in the default key bindings) and comments (`;` in the Listing) to mark critical data sources, sinks, and points of interest in the data flow. This is crucial for later reporting and understanding.
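The backward walk described in these steps is essentially a backward slice over def-use chains. The sketch below models it on a hypothetical, SSA-style list of definitions (each variable assigned once) that mirrors the caller side of the `sendSecretSMS` scenario; every value, including the short code `"7151"`, is made up for illustration, and `BackwardTrace` is a hypothetical name. Definitions without inputs (constants, API return values) are reported as sources, while a variable with no local definition, such as a method parameter, is flagged so that you continue the trace in its callers.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class BackwardTrace {
    // Hypothetical SSA-style definition: var = op(detail), reading the variables in 'uses'.
    static final class Def {
        final String var, op, detail;
        final List<String> uses;
        Def(String var, String op, String detail, String... uses) {
            this.var = var; this.op = op; this.detail = detail; this.uses = List.of(uses);
        }
    }

    // Made-up caller code: recipient is a hardcoded short code, message embeds the phone number.
    static List<Def> sample() {
        return List.of(
            new Def("v1", "const", "\"7151\""),
            new Def("recipient", "move", "v1", "v1"),
            new Def("prefix", "const", "\"REG \""),
            new Def("line", "call", "TelephonyManager.getLine1Number()"),
            new Def("message", "call", "String.concat(prefix, line)", "prefix", "line"));
    }

    // Returns the sources that feed 'var' and appends an indented trace to 'log'.
    static List<String> sourcesOf(String var, List<Def> defs, StringBuilder log) {
        Map<String, Def> byVar = new LinkedHashMap<>();
        for (Def d : defs) byVar.put(d.var, d);
        List<String> sources = new ArrayList<>();
        walk(var, byVar, sources, new HashSet<>(), 0, log);
        return sources;
    }

    private static void walk(String var, Map<String, Def> byVar, List<String> sources,
                             Set<String> seen, int depth, StringBuilder log) {
        if (!seen.add(var)) return; // already traced (shared inputs or cycles)
        String indent = "  ".repeat(depth);
        Def d = byVar.get(var);
        if (d == null) { // no local definition: likely a parameter or field, so continue in callers
            log.append(indent).append(var).append(" <- undefined here (check callers)\n");
            sources.add(var + " (unresolved)");
            return;
        }
        log.append(indent).append(var).append(" <- ").append(d.op).append(' ').append(d.detail).append('\n');
        if (d.uses.isEmpty()) sources.add(d.op + " " + d.detail); // leaf: a data source
        for (String use : d.uses) walk(use, byVar, sources, seen, depth + 1, log);
    }

    public static void main(String[] args) {
        for (String sinkArg : List.of("recipient", "message")) {
            StringBuilder log = new StringBuilder();
            List<String> sources = sourcesOf(sinkArg, sample(), log);
            System.out.print(log);
            System.out.println("sources of " + sinkArg + ": " + sources + "\n");
        }
    }
}
```

In this model the trace for `recipient` ends at a hardcoded constant and the trace for `message` ends at a device-identifier API, which is the kind of source-to-sink path worth bookmarking.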

Advanced Techniques and Best Practices

Scripting for Automation

Ghidra supports scripting in Java and Python. For repetitive tasks, such as finding all calls to a specific API across multiple classes or automating data flow analysis for common patterns, scripting can significantly speed up your analysis. For example, you could write a script to find all occurrences of `SmsManager.sendTextMessage` and print the arguments passed to it.
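Inside Ghidra, such a script would typically extend `GhidraScript` and walk the program’s references to the API. The sketch below is a tool-independent stand-in rather than a Ghidra script: a standalone Java program that scans decompiled source text, assuming you have exported it to a file (for example with `File > Export Program` and the C/C++ exporter). The default file name `malicious_decompiled.c` and the class name `CallSiteScanner` are hypothetical. Because it matches calls textually, treat its hits as leads: it only handles calls that fit on one line, ignores character literals, and would also match a method declaration with the same name.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class CallSiteScanner {
    // One textual call site: 1-based line number plus its top-level arguments.
    static final class CallSite {
        final int line;
        final List<String> args;
        CallSite(int line, List<String> args) { this.line = line; this.args = args; }
        @Override public String toString() { return "line " + line + ": " + args; }
    }

    // Index of the ')' matching the '(' at 'open', or -1 if the call spans several lines.
    static int matchingParen(String s, int open) {
        int depth = 0;
        boolean inString = false;
        for (int i = open; i < s.length(); i++) {
            char c = s.charAt(i);
            if (inString) {
                if (c == '\\') i++; // skip the escaped character
                else if (c == '"') inString = false;
            } else if (c == '"') inString = true;
            else if (c == '(') depth++;
            else if (c == ')' && --depth == 0) return i;
        }
        return -1;
    }

    // Split an argument list at top-level commas, ignoring commas in nested calls or strings.
    static List<String> splitArgs(String s) {
        List<String> args = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        int depth = 0;
        boolean inString = false;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (inString) {
                if (c == '\\' && i + 1 < s.length()) { cur.append(c).append(s.charAt(++i)); continue; }
                if (c == '"') inString = false;
            } else if (c == '"') inString = true;
            else if (c == '(' || c == '[' || c == '{') depth++;
            else if (c == ')' || c == ']' || c == '}') depth--;
            else if (c == ',' && depth == 0) { args.add(cur.toString().trim()); cur.setLength(0); continue; }
            cur.append(c);
        }
        if (!args.isEmpty() || !cur.toString().isBlank()) args.add(cur.toString().trim());
        return args;
    }

    static List<CallSite> scan(List<String> lines, String method) {
        List<CallSite> sites = new ArrayList<>();
        String needle = method + "(";
        for (int n = 0; n < lines.size(); n++) {
            String line = lines.get(n);
            for (int at = line.indexOf(needle); at >= 0; at = line.indexOf(needle, at + 1)) {
                // Skip longer identifiers that merely end with the method name.
                if (at > 0 && Character.isJavaIdentifierPart(line.charAt(at - 1))) continue;
                int open = at + method.length();
                int close = matchingParen(line, open);
                if (close > 0) sites.add(new CallSite(n + 1, splitArgs(line.substring(open + 1, close))));
            }
        }
        return sites;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical inputs: a decompiled-source export and the API method to look for.
        Path export = Path.of(args.length > 0 ? args[0] : "malicious_decompiled.c");
        String method = args.length > 1 ? args[1] : "sendTextMessage";
        for (CallSite site : scan(Files.readAllLines(export), method)) System.out.println(site);
    }
}
```

Each reported argument expression is a starting point for the backward tracing described earlier.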

Annotations and Bookmarks

As you navigate complex code, utilize Ghidra’s annotation features. Bookmarks allow you to quickly return to specific locations. Comments help document your understanding of functions, variables, and code blocks, preventing redundant analysis and aiding collaborative efforts.

Conclusion: Empowering Your Malware Analysis

Ghidra provides an incredibly powerful and flexible platform for static analysis of Android malware. By mastering its Graph View, you gain an intuitive understanding of execution flow. Leveraging cross-references and data tracking techniques allows you to pinpoint exactly how sensitive information is being handled, from its origin to its potential exfiltration. These core capabilities, combined with good analytical practices and the potential for automation, make Ghidra an essential tool in any cybersecurity professional’s arsenal for dissecting the intricate workings of malicious Android applications.
