The silent eBPF revolution is well underway. Extended Berkeley Packet Filter (eBPF) is used across the cloud-native world to enable faster and more customizable computing. eBPF is a virtual machine within the Linux kernel that allows for extending the kernel’s functionality safely and maintainably. As more logic moves into the kernel, ensuring systems stay performant is crucial.
Profiling eBPF Code
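Profiling eBPF code helps developers find where performance optimizations are needed. Different techniques expose different problems: a sampling profiler shows where time is spent, while an instrumenting profiler measures specific events such as heap allocations. Used together, they help pinpoint the root cause of a performance problem.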
Getting Started with eBPF
eBPF allows you to extend the kernel’s functionality without developing a kernel module. It ensures safety by verifying code at load time. eBPF bytecode is loaded into the eBPF virtual machine and executed within the kernel to perform tasks like tracing syscalls, probing user or kernel space, capturing perf events, instrumenting Linux Security Modules (LSM), and filtering packets.
Building an eBPF Profiler
We will create a basic eBPF sampling profiler in Rust using Aya. This profiler will periodically get a snapshot of the stack of a target application.
Setting Up the Development Environment
First, set up your Aya development environment and create a new project called profiler. The eBPF program that runs in the kernel looks like this:
// In eBPF, we can’t use the Rust standard library.
#![no_std]
// The kernel calls our `perf_event`, so there is no `main` function.
#![no_main]

use aya_ebpf::{
    helpers::gen::{bpf_get_stack, bpf_ktime_get_ns},
    macros::{map, perf_event},
    maps::ring_buf::RingBuf,
    programs::PerfEventContext,
    EbpfContext,
};
use profiler_common::{Sample, SampleHeader};

// Create a global variable that will be set by user space.
#[no_mangle]
static PID: u32 = 0;

// Use the Aya `map` procedural macro to create a ring buffer eBPF map.
#[map]
static SAMPLES: RingBuf = RingBuf::with_byte_size(4_096 * 4_096, 0);

#[perf_event]
pub fn perf_profiler(ctx: PerfEventContext) -> u32 {
    // Reserve room for one sample in the ring buffer; give up if it is full.
    let Some(mut sample) = SAMPLES.reserve::<Sample>(0) else {
        aya_log_ebpf::error!(&ctx, "Failed to reserve sample.");
        return 0;
    };

    unsafe {
        // Copy the user-space stack into the entry, just past the header.
        let stack_len = bpf_get_stack(
            ctx.as_ptr(),
            sample.as_mut_ptr().byte_add(SampleHeader::SIZE) as *mut core::ffi::c_void,
            Sample::STACK_SIZE as u32,
            aya_ebpf::bindings::BPF_F_USER_STACK as u64,
        );

        // A negative return value is an error, so the conversion fails.
        let Ok(stack_len) = u64::try_from(stack_len) else {
            aya_log_ebpf::error!(&ctx, "Failed to get stack.");
            sample.discard(aya_ebpf::bindings::BPF_RB_NO_WAKEUP as u64);
            return 0;
        };

        // Write the header at the start of the reserved entry.
        core::ptr::write_unaligned(
            sample.as_mut_ptr() as *mut SampleHeader,
            SampleHeader {
                ktime: bpf_ktime_get_ns(),
                pid: ctx.tgid(),
                tid: ctx.pid(),
                stack_len,
            },
        );
    }

    sample.submit(0);
    0
}

#[panic_handler]
fn panic(_info: &core::panic::PanicInfo) -> ! {
    unsafe { core::hint::unreachable_unchecked() }
}
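The program above imports Sample and SampleHeader from profiler_common, but their definitions are not shown here. The following is a minimal sketch of what such shared types could look like. The header field names come from the SampleHeader literal above; MAX_FRAMES, the field order, the stack field, and the frames helper are assumptions made for illustration, not the project's actual definitions.

```rust
use core::mem::size_of;

/// Hypothetical cap on the number of frames captured per sample.
pub const MAX_FRAMES: usize = 127;

/// Fixed-size metadata written at the start of every ring buffer entry.
/// `repr(C)` keeps the layout identical in the kernel and in user space.
#[repr(C)]
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
pub struct SampleHeader {
    pub ktime: u64,
    pub pid: u32,
    pub tid: u32,
    /// Number of bytes that `bpf_get_stack` copied into the stack area.
    pub stack_len: u64,
}

impl SampleHeader {
    /// Byte offset at which the stack starts inside a `Sample`.
    pub const SIZE: usize = size_of::<SampleHeader>();
}

/// One profiler sample: a header followed by raw instruction pointers.
#[repr(C)]
#[derive(Clone, Copy, Debug)]
pub struct Sample {
    pub header: SampleHeader,
    pub stack: [u64; MAX_FRAMES],
}

impl Sample {
    /// Size in bytes of the stack area handed to `bpf_get_stack`.
    pub const STACK_SIZE: usize = MAX_FRAMES * size_of::<u64>();

    /// Returns only the frames that were actually captured.
    pub fn frames(&self) -> &[u64] {
        let captured = self.header.stack_len as usize / size_of::<u64>();
        &self.stack[..captured.min(MAX_FRAMES)]
    }
}

// Arrays this large do not derive `Default`, so it is implemented by hand.
impl Default for Sample {
    fn default() -> Self {
        Self {
            header: SampleHeader::default(),
            stack: [0; MAX_FRAMES],
        }
    }
}
```

With this layout the header occupies the first SampleHeader::SIZE bytes of an entry, which matches the byte_add offset used when the eBPF program writes the stack.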
Loading eBPF Code into the Kernel
Next, set up user-space code to load the eBPF program into the kernel.
use aya::{include_bytes_aligned, maps::ring_buf::RingBuf, programs::perf_event, BpfLoader};

#[tokio::main]
async fn main() -> Result<(), anyhow::Error> {
    env_logger::init();

    // The PID of the process to profile is the last command-line argument.
    let pid: u32 = std::env::args().last().unwrap().parse()?;

    // Load the eBPF object file and set the `PID` global before loading.
    #[cfg(debug_assertions)]
    let mut bpf = BpfLoader::new()
        .set_global("PID", &pid, true)
        .load(include_bytes_aligned!(
            "../../target/bpfel-unknown-none/debug/profiler"
        ))?;
    #[cfg(not(debug_assertions))]
    let mut bpf = BpfLoader::new()
        .set_global("PID", &pid, true)
        .load(include_bytes_aligned!(
            "../../target/bpfel-unknown-none/release/profiler"
        ))?;

    aya_log::BpfLogger::init(&mut bpf)?;

    // Load the program and attach it to a CPU clock event sampled at 100 Hz.
    let program: &mut perf_event::PerfEvent =
        bpf.program_mut("perf_profiler").unwrap().try_into()?;
    program.load()?;
    program.attach(
        perf_event::PerfTypeId::Software,
        perf_event::perf_sw_ids::PERF_COUNT_SW_CPU_CLOCK as u64,
        perf_event::PerfEventScope::OneProcessAnyCpu { pid },
        perf_event::SamplePolicy::Frequency(100),
        true,
    )?;

    // Drain the ring buffer in the background whenever it becomes readable.
    tokio::spawn(async move {
        let samples = RingBuf::try_from(bpf.take_map("SAMPLES").unwrap()).unwrap();
        let mut poll = tokio::io::unix::AsyncFd::new(samples).unwrap();
        loop {
            let mut guard = poll.readable_mut().await.unwrap();
            let ring_buf = guard.get_inner_mut();
            while let Some(sample) = ring_buf.next() {
                log::info!("{sample:?}");
            }
            guard.clear_ready();
        }
    });

    tokio::signal::ctrl_c().await?;
    Ok(())
}
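The loop above only logs each ring buffer item as raw bytes. To do anything useful with a sample, user space has to decode those bytes. Here is a minimal sketch of that step, assuming the header is laid out as ktime (u64), pid (u32), tid (u32), and stack_len (u64) in native byte order, followed by the stack as u64 instruction pointers. That layout and the names DecodedSample and decode_sample are assumptions for illustration.

```rust
use std::convert::TryFrom;

/// Hypothetical user-space view of one sample.
#[derive(Debug, PartialEq, Eq)]
pub struct DecodedSample {
    pub ktime: u64,
    pub pid: u32,
    pub tid: u32,
    pub frames: Vec<u64>,
}

/// Assumed header size: u64 + u32 + u32 + u64.
const HEADER_SIZE: usize = 24;

/// Reads a native-endian u64 at `at`, or `None` if out of bounds.
fn read_u64(bytes: &[u8], at: usize) -> Option<u64> {
    let mut buf = [0u8; 8];
    buf.copy_from_slice(bytes.get(at..at.checked_add(8)?)?);
    Some(u64::from_ne_bytes(buf))
}

/// Reads a native-endian u32 at `at`, or `None` if out of bounds.
fn read_u32(bytes: &[u8], at: usize) -> Option<u32> {
    let mut buf = [0u8; 4];
    buf.copy_from_slice(bytes.get(at..at.checked_add(4)?)?);
    Some(u32::from_ne_bytes(buf))
}

/// Decodes one ring buffer entry. Returns `None` when the entry is
/// shorter than the header or than the stack length the header claims.
pub fn decode_sample(bytes: &[u8]) -> Option<DecodedSample> {
    let ktime = read_u64(bytes, 0)?;
    let pid = read_u32(bytes, 8)?;
    let tid = read_u32(bytes, 12)?;
    let stack_len = usize::try_from(read_u64(bytes, 16)?).ok()?;

    // Only the first `stack_len` bytes after the header hold valid frames.
    let end = HEADER_SIZE.checked_add(stack_len)?;
    let stack = bytes.get(HEADER_SIZE..end)?;
    let frames = stack
        .chunks_exact(8)
        .map(|chunk| {
            let mut buf = [0u8; 8];
            buf.copy_from_slice(chunk);
            u64::from_ne_bytes(buf)
        })
        .collect();

    Some(DecodedSample { ktime, pid, tid, frames })
}
```

Returning an Option rather than panicking keeps a malformed or truncated entry from taking down the polling task.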
Profiling the Profiler
Users of our profiler report sluggishness. Let’s use sampling and instrumenting profilers to pinpoint the issue.
Sampling Profiler
Install flamegraph to visualize the stack traces and use perf to sample the profiler. Generate a flame graph to identify the bottleneck.
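Flame graph tools typically consume "folded" stacks: one line per unique stack, with frames joined by semicolons from root to leaf, followed by a space and the number of times that stack was sampled. To make that aggregation step concrete, here is a minimal sketch. The fold_stacks function and its input shape are hypothetical, and the frames are assumed to be symbolized already.

```rust
use std::collections::BTreeMap;

/// Collapses sampled stacks (root frame first) into folded lines of the
/// form `root;child;leaf count`, sorted by stack for stable output.
pub fn fold_stacks(samples: &[Vec<&str>]) -> Vec<String> {
    let mut counts: BTreeMap<String, u64> = BTreeMap::new();
    for frames in samples {
        // A sample without frames carries no information; skip it.
        if frames.is_empty() {
            continue;
        }
        *counts.entry(frames.join(";")).or_insert(0) += 1;
    }
    counts
        .into_iter()
        .map(|(stack, count)| format!("{stack} {count}"))
        .collect()
}
```

The wider a frame appears in the resulting graph, the more samples contained it, which is what makes a bottleneck stand out visually.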
Instrumenting Profiler
Use dhat-rs to measure heap allocations.
#[cfg(feature = "dhat-heap")]
#[global_allocator]
static ALLOC: dhat::Alloc = dhat::Alloc;

#[tokio::main]
async fn main() -> Result<(), anyhow::Error> {
    // Heap profiling runs for as long as this value stays alive.
    #[cfg(feature = "dhat-heap")]
    let _profiler = dhat::Profiler::new_heap();
    ...
}
Run the profiler with --features dhat-heap and analyze the results.
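An instrumenting heap profiler works by sitting in the allocation path and recording every allocation. The sketch below shows that idea using only the standard library. It is not how dhat-rs is implemented; CountingAlloc and its counters are hypothetical names, and it records only the number of allocations and the bytes requested.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical allocator wrapper that counts before delegating to `System`.
pub struct CountingAlloc;

/// Total number of allocations made so far.
pub static ALLOCATIONS: AtomicUsize = AtomicUsize::new(0);
/// Total number of bytes requested so far.
pub static ALLOCATED_BYTES: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        // Record the event, then hand the request to the system allocator.
        ALLOCATIONS.fetch_add(1, Ordering::Relaxed);
        ALLOCATED_BYTES.fetch_add(layout.size(), Ordering::Relaxed);
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

// Route every heap allocation in the program through the wrapper.
#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;
```

Reading the two counters before and after a piece of code gives a rough allocation count for it. A real heap profiler also records where each allocation came from, which is what makes its output actionable.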
Benchmarking the Profiler
Use Criterion to benchmark the process_sample function:
pub fn process_sample(sample: profiler_common::Sample) -> Result<(), anyhow::Error> {
    // Don't look at me!
    let _oops = Box::new(std::thread::sleep(std::time::Duration::from_millis(
        u64::from(chrono::Utc::now().timestamp_subsec_millis()),
    )));
    log::info!("{sample:?}");
    Ok(())
}
Add benchmarks using Criterion.
fn bench_process_sample(c: &mut criterion::Criterion) {
    c.bench_function("process_sample", |b| {
        b.iter(|| {
            profiler::process_sample(profiler_common::Sample::default()).unwrap();
        })
    });
}

criterion::criterion_main!(benchmark_profiler);
criterion::criterion_group!(benchmark_profiler, bench_process_sample);
Run the benchmarks with cargo bench.
Continuous Benchmarking
Implement continuous benchmarking using Bencher to catch performance regressions in CI.
bencher run \
    --project simple-profiler \
    --token $BENCHER_API_TOKEN \
    cargo bench
Track results over time and compare them across dimensions such as branches and testbeds.
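One simple way to think about catching a regression is to compare each new result against a baseline built from previous runs and to fail when it exceeds an allowed margin. The sketch below uses a fixed percentage over the historical mean. This is an illustrative assumption, not a description of how Bencher computes its thresholds; is_regression and max_increase are hypothetical names.

```rust
/// Returns `Some(true)` when `latest_ns` is more than `max_increase`
/// (for example 0.10 for 10%) above the mean of `history_ns`.
/// Returns `None` when there is no history to compare against.
pub fn is_regression(history_ns: &[f64], latest_ns: f64, max_increase: f64) -> Option<bool> {
    if history_ns.is_empty() {
        return None;
    }
    let mean = history_ns.iter().sum::<f64>() / history_ns.len() as f64;
    let limit = mean * (1.0 + max_increase);
    Some(latest_ns > limit)
}
```

A fixed margin is easy to reason about but is sensitive to noisy benchmarks, which is why a statistical test over the history is often a better fit in practice.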
Conclusion
eBPF allows adding custom capabilities to the Linux kernel. Using Rust and Aya, we built a simple profiler, identified performance regressions using sampling and instrumenting profilers, and verified optimizations with benchmarks. Continuous benchmarking ensures performance regressions are caught before merging changes.
By following these steps, you can ensure your eBPF programs remain performant and maintainable.
All the source code for this guide is available on GitHub.