Troubleshooting Android Kernel Panics: Automated Ftrace Scripting for Post-Mortem Analysis

Introduction: The Elusive Android Kernel Panic

Android devices, despite their robustness, can suffer from kernel panics—critical failures that halt the operating system. Diagnosing these low-level issues post-mortem is notoriously challenging. When a panic occurs, the system often reboots, wiping away transient debug information. This article delves into an advanced technique: leveraging Ftrace with automated scripting to capture critical kernel events just before a panic, providing invaluable insights for post-mortem analysis.

Understanding Ftrace: The Linux Kernel Tracer

Ftrace is a tracing framework built directly into the Linux kernel, offering deep visibility into kernel activity. It can trace function calls, scheduling events, system calls, and much more, making it an indispensable tool for performance analysis and debugging. Because Android runs on a Linux kernel, Ftrace provides a detailed window into its inner workings.

Key Ftrace Concepts:

  • Trace Buffers: In-memory buffers where trace events are recorded.
  • Tracers: Different mechanisms for tracing (e.g., function, function_graph, irqsoff). The available_tracers file lists the ones built into the running kernel.
  • Events: Specific kernel events (e.g., sched_switch, irq_handler_entry) that can be enabled for tracing.
  • Debugfs Interface: Ftrace is primarily controlled via files in the tracing directory of the debugfs pseudo-filesystem, typically /sys/kernel/debug/tracing. Newer kernels also expose the same files through tracefs at /sys/kernel/tracing.

The Post-Mortem Debugging Challenge

When an Android kernel panics, the device usually reboots immediately. The volatile trace buffers holding Ftrace data are lost, making it impossible to see what led up to the crash. Traditional methods like pstore can capture kernel logs, but they often lack the granular, time-series event data that Ftrace provides.

Our goal is to proactively capture Ftrace data, flushing it to persistent storage frequently or on trigger, so that even if a panic occurs, we have a recent snapshot of kernel activity.

Pre-requisites for Automated Ftrace Scripting

Before proceeding, ensure you have the following:

  • A rooted Android device.
  • ADB (Android Debug Bridge) installed and configured on your host machine.
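  • A kernel built with Ftrace support. Most modern Android kernels include event tracing, but the function tracer may be disabled in production builds; running cat available_tracers in the tracing directory shows what the running kernel offers.
  • debugfs mounted. You can check this with mount | grep debugfs. If it is not mounted, mount it at /sys/kernel/debug with mount -t debugfs debugfs /sys/kernel/debug.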
  • Basic understanding of Linux shell scripting.
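
The prerequisites above can be checked from a shell before any tracing is configured. The following is a minimal sketch, not a definitive check: the helper names find_tracing_dir and has_tracer are hypothetical, and it assumes the control files live at one of the two conventional mount points unless other directories are passed in.

```shell
# Sketch: locate the ftrace control directory and check for a tracer.
# find_tracing_dir and has_tracer are hypothetical helper names.

# Print the first candidate directory that contains current_tracer.
# With no arguments, the two conventional mount points are tried.
find_tracing_dir() {
    [ "$#" -gt 0 ] || set -- /sys/kernel/tracing /sys/kernel/debug/tracing
    for dir in "$@"; do
        if [ -e "$dir/current_tracer" ]; then
            echo "$dir"
            return 0
        fi
    done
    return 1
}

# Succeed if tracer $2 is listed in $1/available_tracers.
has_tracer() {
    [ -r "$1/available_tracers" ] || return 1
    tr ' ' '\n' < "$1/available_tracers" | grep -qx "$2"
}

# Example use (run as root on the device):
#   TRACE_DIR=$(find_tracing_dir) || echo "tracing directory not found"
#   has_tracer "$TRACE_DIR" function || echo "function tracer not built in"
```

If has_tracer fails for function, event tracing may still be available, so the event-based configuration later in this article is the likelier route on that kernel.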

Setting Up Ftrace for Pre-Panic Data Capture

The core idea is to continuously capture Ftrace data and periodically dump it. We’ll use a function tracer for broad coverage, but specific events can also be enabled.

Step 1: Accessing the Tracing Interface

Connect to your Android device via ADB shell:

adb shell

The tracing control files are normally accessible only to root, so obtain root privileges first:

su

Then navigate to the tracing directory:

cd /sys/kernel/debug/tracing

Step 2: Configuring the Tracer

First, clear any previous trace data and disable tracing:

echo 0 > tracing_on
echo > trace
echo nop > current_tracer

Choose your tracer. For general debugging, function is a good start. For more detailed call graph analysis, function_graph might be useful, but it has higher overhead.

echo function > current_tracer

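Set the trace buffer size. The value is in kilobytes and applies to each CPU, so 102400 reserves 100 MB per CPU (800 MB in total on an eight-core device). Adjust it to the device's memory and the retention you need:

echo 102400 > buffer_size_kb # 100 MB per CPU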
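
Because buffer_size_kb is a per-CPU value, it can help to start from a total memory budget and divide it across the cores. The sketch below shows that arithmetic; per_cpu_kb is a hypothetical helper, and the 256 MB budget and eight-core count are illustrative assumptions rather than recommendations.

```shell
# per_cpu_kb TOTAL_MB NCPU: print the per-CPU buffer size in KB that keeps
# the combined ring buffers within TOTAL_MB. (Hypothetical helper.)
per_cpu_kb() {
    echo $(( $1 * 1024 / $2 ))
}

# Example: a 256 MB budget on an eight-core device.
per_cpu_kb 256 8    # prints 32768

# On the device, the CPU count could be taken from
#   grep -c '^processor' /proc/cpuinfo
# (which may list only the CPUs that are currently online), and the result
# written with: echo 32768 > buffer_size_kb
```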

You can filter functions to trace only specific modules or functions. For broad panics, it’s often better to start wide.

# Example: Trace only functions containing "msm_bus"
# echo "*msm_bus*" > set_ftrace_filter
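
Before writing a pattern to set_ftrace_filter, it can be useful to see how many traceable functions it would select. The sketch below counts matches in a list such as available_filter_functions, which has one function name per line; count_filter_matches is a hypothetical helper, and shell glob matching is assumed to be close enough to the filter's own wildcard handling for a quick sanity check.

```shell
# count_filter_matches LIST_FILE GLOB: count function names in LIST_FILE
# (one per line, optionally followed by a [module] tag) that match GLOB.
# Hypothetical helper for checking a filter pattern before applying it.
count_filter_matches() {
    n=0
    while read -r name _rest; do
        case "$name" in
            $2) n=$((n + 1)) ;;   # unquoted on purpose: $2 is a glob
        esac
    done < "$1"
    echo "$n"
}

# Example use on the device, from the tracing directory:
#   count_filter_matches available_filter_functions '*msm_bus*'
```

A count of zero suggests the pattern would not select anything on this kernel and should be revised before tracing starts.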

Enable specific events if needed. This is powerful for targeted issues (e.g., scheduling latency, IRQ problems):

# Example: Enable scheduler and IRQ events
# echo 1 > events/sched/sched_switch/enable
# echo 1 > events/irq/irq_handler_entry/enable
# echo 1 > events/irq/irq_handler_exit/enable
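
Not every kernel exposes every event, so a setup script may want to enable events defensively. The following sketch wraps the write in an existence check; enable_event is a hypothetical helper, and the tracing directory is passed in as an argument rather than assumed.

```shell
# enable_event TRACE_DIR SUBSYSTEM/EVENT: turn on one trace event if the
# running kernel exposes it; otherwise report it and return non-zero.
# Hypothetical helper.
enable_event() {
    ctl="$1/events/$2/enable"
    if [ -w "$ctl" ]; then
        echo 1 > "$ctl"
    else
        echo "event not available: $2" >&2
        return 1
    fi
}

# Example use on the device, from a root shell:
#   for ev in sched/sched_switch irq/irq_handler_entry irq/irq_handler_exit; do
#       enable_event /sys/kernel/debug/tracing "$ev"
#   done
```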

Step 3: Enabling Tracing

echo 1 > tracing_on

Automated Ftrace Data Capture Script

To ensure we capture data right before a panic, we need a script that periodically reads the Ftrace buffer and saves it to persistent storage. This script should ideally run in the background on the Android device.

Create a file, e.g., /data/local/tmp/ftrace_watcher.sh, with the following content:

#!/system/bin/sh
# Periodically save the Ftrace buffer to persistent storage.

TRACE_DIR="/sys/kernel/debug/tracing"
OUTPUT_DIR="/data/local/tmp/ftrace_logs"
INTERVAL=10   # Dump trace every 10 seconds
MAX_FILES=10  # Keep last 10 trace dumps

mkdir -p "$OUTPUT_DIR"

# Initial Ftrace setup (optional, can be done manually before running script)
echo 0 > "$TRACE_DIR/tracing_on"
echo > "$TRACE_DIR/trace"
echo function > "$TRACE_DIR/current_tracer"
echo 102400 > "$TRACE_DIR/buffer_size_kb"  # 100 MB per CPU
echo 1 > "$TRACE_DIR/tracing_on"

log_dump() {
    TIMESTAMP=$(date +%Y%m%d_%H%M%S)
    OUTPUT_FILE="$OUTPUT_DIR/trace_$TIMESTAMP.log"
    echo "Dumping trace to $OUTPUT_FILE..."
    cat "$TRACE_DIR/trace" > "$OUTPUT_FILE"
    echo > "$TRACE_DIR/trace"  # Clear buffer after dumping
    sync  # Flush the dump to storage so it survives a sudden panic

    # Prune old files, keeping the newest MAX_FILES
    OLD_FILES=$(ls -t "$OUTPUT_DIR"/trace_*.log | tail -n +$(($MAX_FILES + 1)))
    for f in $OLD_FILES; do
        rm "$f"
        echo "Removed old trace file: $f"
    done
}

# Main loop to dump traces
while true; do
    log_dump
    sleep $INTERVAL
done

Make the script executable and run it:

chmod +x /data/local/tmp/ftrace_watcher.sh
/data/local/tmp/ftrace_watcher.sh &

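The script dumps the current Ftrace buffer to a log file in /data/local/tmp/ftrace_logs every 10 seconds and keeps only the 10 most recent files. Because the buffer is saved at fixed intervals, up to one interval of events immediately before a panic may be missing from the last file; lower INTERVAL to narrow that window at the cost of more I/O. If the script stops when the adb session closes, starting it with nohup may keep it alive. After a panic, retrieve the last few log files for analysis.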
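
A panic can strike while a dump is still being written, leaving a truncated file that looks like a complete one. One way to guard against that, sketched below under the assumption that the temporary and final files sit on the same filesystem, is to write to a temporary name and rename it only after the data has been synced. dump_atomically is a hypothetical helper, not part of the script above.

```shell
# dump_atomically SRC DEST: copy SRC to DEST so that DEST only appears once
# the copy has finished. A panic mid-copy leaves DEST.partial behind instead.
# Hypothetical helper.
dump_atomically() {
    cat "$1" > "$2.partial" || return 1
    sync                      # push the data to storage before renaming
    mv "$2.partial" "$2"
    sync                      # flush the rename as well
}

# Example use inside a dump loop:
#   dump_atomically /sys/kernel/debug/tracing/trace "$OUTPUT_FILE"
```

Files ending in .partial do not match the trace_*.log pattern, so a pruning step based on that pattern would leave them alone and they remain easy to tell apart from complete dumps.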

Analyzing the Captured Ftrace Data

After a panic and device reboot, retrieve the log files:

adb pull /data/local/tmp/ftrace_logs .
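
Once the logs are on the host, the dumps closest to the panic are usually the ones worth opening first. Since the file names embed a sortable timestamp, a reverse lexical sort is enough to find them. The sketch below assumes the trace_YYYYmmdd_HHMMSS.log naming used by the watcher script; newest_dumps is a hypothetical helper.

```shell
# newest_dumps DIR [N]: print the N most recent trace dumps in DIR
# (default 3), newest first, based on the timestamp in the file name.
# Hypothetical helper.
newest_dumps() {
    ls "$1" | grep '^trace_.*\.log$' | sort -r | head -n "${2:-3}"
}

# Example use on the host after adb pull:
#   newest_dumps ftrace_logs 2
```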

The trace_*.log files contain Ftrace's human-readable text output, so they can be read directly with a pager or text-processing tools. Note that trace-cmd report (available for Linux hosts) does not parse this text; it expects the binary trace.dat format written by trace-cmd record:

# On your host machine, for a trace.dat captured with trace-cmd record
trace-cmd report -i trace.dat

If you have a pre-built trace-cmd binary for Android, consider running trace-cmd record directly on the device to generate a binary trace.dat file, which makes analysis with trace-cmd much easier.

For text files, look for patterns:

  • Last functions called: These are often crucial. What was the kernel doing just before the system went down?
  • Unexpected loops or high frequency calls: Could indicate a stuck thread or resource contention.
  • Interrupt handling: Anomalies in IRQ entry/exit could point to driver issues.
  • Scheduler activity: Frequent context switches or a lack thereof could signal CPU starvation or a frozen task.

Manual grep and text processing tools (awk, sort, uniq) can also be invaluable for sifting through large trace files.
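
As a starting point for that kind of sifting, the sketch below extracts two of the patterns listed above from a function-tracer text dump: the most frequently called functions and the last lines before the dump ended. It assumes the usual function-tracer line layout, in which each entry ends with "function <-caller"; top_functions and last_calls are hypothetical helper names.

```shell
# top_functions FILE [N]: list the N most frequently traced functions
# (default 10) in a function-tracer text dump, most frequent first.
# The traced function is taken to be the field just before "<-caller".
top_functions() {
    awk '!/^#/ { for (i = 2; i <= NF; i++) if ($i ~ /^<-/) { print $(i - 1); break } }' "$1" |
        sort | uniq -c | sort -rn | head -n "${2:-10}"
}

# last_calls FILE [N]: show the final N trace lines (default 20), i.e. what
# the kernel was doing just before the dump was taken.
last_calls() {
    grep -v '^#' "$1" | tail -n "${2:-20}"
}

# Example use on the host:
#   top_functions trace_20231027_103000.log 5
#   last_calls trace_20231027_103000.log
```

A function that dominates the frequency list in the final dump but not in earlier ones is a reasonable first suspect for a loop or contention problem.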

Advanced Ftrace Techniques (Briefly)

  • Event Tracing: More specific than function tracing. Useful when you suspect a particular subsystem (e.g., block for storage, ext4 for filesystem).
  • ftrace_dump_on_oops kernel parameter: When set, the kernel prints the Ftrace buffer to the kernel console on an oops or panic. If pstore is configured to preserve console output, the dump can survive the reboot. This requires kernel configuration and may not work reliably for every panic, so the scripting approach remains the more proactive option.
  • Filtering: Use set_ftrace_filter and set_ftrace_notrace to focus on specific functions or exclude noisy ones.
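  • Stack Tracing: The func_stack_trace option records a stack trace for every traced function and should be combined with a narrow set_ftrace_filter because of its cost; the stacktrace option does the same for trace events. Both give deeper context about how a call was reached.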
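
On kernels that support it, the dump-on-oops behavior can also be switched on at runtime through the kernel.ftrace_dump_on_oops sysctl rather than the boot command line. The sketch below is a cautious wrapper around that write; set_dump_on_oops is a hypothetical helper, and whether the dump actually survives a reboot still depends on how the device preserves console output.

```shell
# set_dump_on_oops [MODE] [CTL]: MODE 1 dumps every CPU's buffer, 2 only the
# CPU that hit the oops, 0 disables. CTL defaults to the sysctl file.
# Hypothetical helper.
set_dump_on_oops() {
    ctl="${2:-/proc/sys/kernel/ftrace_dump_on_oops}"
    [ -w "$ctl" ] || { echo "cannot write $ctl" >&2; return 1; }
    echo "${1:-1}" > "$ctl"
}

# Example use on the device, from a root shell:
#   set_dump_on_oops 2
```

Printing a large buffer to the console can take a long time, so a smaller buffer_size_kb is usually a better fit when relying on this mechanism.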

Conclusion

Automated Ftrace scripting provides a robust, proactive approach to collecting critical kernel event data leading up to an Android kernel panic. By continuously flushing the Ftrace buffer to persistent storage, developers and advanced users gain invaluable diagnostic information that would otherwise be lost. This technique, combined with careful analysis of the trace logs, significantly enhances the ability to identify the root causes of elusive kernel panics, making the challenging task of post-mortem debugging more manageable and effective.
