Symbolic Execution with KLEE: Automated Test Generation and Bug Finding
Introduction
Introduction
Symbolic execution is a powerful program analysis technique that executes programs with symbolic values instead of concrete inputs. Unlike traditional testing, which explores a single path with specific inputs, symbolic execution systematically explores multiple paths through a program simultaneously, generating inputs that exercise different execution paths.
In this post, we’ll explore KLEE (KLEE Symbolic Virtual Machine), a state-of-the-art symbolic execution engine built on top of the LLVM compiler infrastructure. KLEE has proven remarkably effective at finding bugs and generating high-coverage test suites for real-world software.
What is KLEE?
KLEE is a symbolic virtual machine that can systematically explore program paths by treating inputs as symbolic expressions rather than concrete values. Developed as part of research at Stanford University (and presented in the seminal OSDI 2008 paper), KLEE operates on LLVM bitcode and can:
- Automatically generate test cases that achieve high code coverage
- Find bugs such as assertion violations, null pointer dereferences, and buffer overflows
- Explore paths through complex code with mathematical precision
- Produce concrete inputs that trigger specific program behaviors
The key insight behind KLEE is that instead of running a program with x = 5, it runs with x = symbolic, tracking constraints on x as the program executes. When the program reaches a branch (like if (x > 0)), KLEE forks execution, exploring both paths while recording the constraints: {x > 0} for one path and {x <= 0} for the other.
Getting Started with KLEE
Installation
KLEE can be installed via several methods:
Using Docker (Recommended)
docker pull klee/klee:latest
docker run -it --rm klee/klee:latest
Using Package Managers
For macOS with Homebrew:
brew install klee
For Ubuntu/Debian via Snap:
snap install klee
Building from Source
KLEE is typically built against LLVM 16 (the currently recommended version):
git clone https://github.com/klee/klee.git
cd klee
mkdir build && cd build
cmake .. -DLLVM_CONFIG_BINARY=/path/to/llvm-config
make
Basic Workflow
The KLEE workflow consists of three main steps:
- Compile source to LLVM bitcode
- Run KLEE on the bitcode
- Replay generated test cases
Tutorial: Testing a Simple Function
Let’s walk through KLEE’s canonical example. Consider this simple function:
// get_sign.c
int get_sign(int x) {
if (x == 0)
return 0;
else if (x < 0)
return -1;
else
return 1;
}
To test this with KLEE, we need a driver that marks inputs as symbolic:
// test_get_sign.c
#include "klee/klee.h"
int get_sign(int x);
int main() {
int a;
klee_make_symbolic(&a, sizeof(a), "a");
return get_sign(a);
}
Step 1: Compile to LLVM Bitcode
clang -I /path/to/klee/include -g -O0 -Xclang -disable-O0-optnone \
-emit-llvm -c test_get_sign.c -o test_get_sign.bc
Important compilation notes:
- Use
-emit-llvmto generate LLVM bitcode -gadds debug information for source-level statistics- Avoid
-O0alone in newer LLVM versions (> 5.0); use-O0 -Xclang -disable-O0-optnoneinstead - Don’t optimize the bitcode—KLEE has its own optimization pipeline (
--optimizeflag)
Step 2: Run KLEE
klee test_get_sign.bc
KLEE outputs:
KLEE: output directory is "klee-out-0"
KLEE: done: total instructions = 42
KLEE: done: completed paths = 3
KLEE: done: generated tests = 3
This tells us KLEE explored all three paths through get_sign() (x = 0, x < 0, x > 0) and generated three test cases.
Step 3: Examine Generated Test Cases
Test cases are stored in klee-out-0/ as .ktest files:
ktest-tool klee-out-0/test000001.ktest
Output:
ktest file : 'klee-out-0/test000001.ktest'
args : ['test_get_sign.bc']
num objects: 1
object 0: name: 'a'
data (int): 0
(hex): 0x00000000
The three test cases contain:
a = 0(path where x == 0)a = 16843009(some positive value)a = -2147483648(a negative value)
Step 4: Replay Test Cases
KLEE provides a replay library to run test cases on native code:
clang test_get_sign.c -lkleeRuntest -o test_get_sign_native
export KTEST_FILE=klee-out-0/test000001.ktest
./test_get_sign_native
echo "Exit code: $?"
Key KLEE Intrinsics
KLEE provides special intrinsic functions declared in klee/klee.h:
klee_make_symbolic()
klee_make_symbolic(&var, sizeof(var), "var_name");
Marks a memory location as symbolic. KLEE will treat var as an unconstrained symbolic value and explore paths based on its use.
klee_assume()
klee_assume(condition);
Adds constraints to the current path. KLEE will only explore states where condition is true. Conceptually equivalent to wrapping remaining code in if(condition) {}.
Warning with short-circuit operators:
// This may create unexpected extra paths due to compiler behavior
klee_assume(a > 0 && b < 10);
// Better: use bitwise operators or separate calls
klee_assume(a > 0);
klee_assume(b < 10);
klee_prefer_cex()
klee_prefer_cex(&var, condition);
Tells KLEE to prefer certain concrete values when generating test cases. Useful for readability:
char input[4];
klee_make_symbolic(input, sizeof(input), "input");
for (int i = 0; i < 4; i++)
klee_prefer_cex(&input[i], 32 <= input[i] && input[i] <= 126);
This prefers printable ASCII characters in test outputs.
Symbolic Environment Options
KLEE’s POSIX runtime (-posix-runtime) provides options for symbolic environments:
Symbolic Arguments
klee -posix-runtime program.bc -sym-arg 10 # One symbolic arg of length 10
klee -posix-runtime program.bc -sym-args 1 3 5 # 1-3 args, each max length 5
Symbolic Files
klee -posix-runtime program.bc -sym-files 2 100 -sym-stdin 50
This creates:
- 2 symbolic files (‘A’, ‘B’) each of size 100 bytes
- Symbolic stdin of size 50 bytes
Example: Password Checker
// password.c
#include <string.h>
#include <stdio.h>
int main(int argc, char **argv) {
if (argc < 2) return 1;
if (strcmp(argv[1], "secret123") == 0) {
printf("Access granted!\n");
return 0;
} else {
printf("Access denied!\n");
return 1;
}
}
Compile and run with symbolic arguments:
clang -emit-llvm -c password.c -o password.bc
klee -posix-runtime password.bc -sym-arg 10
KLEE will explore paths and discover the password “secret123” in one of the test cases!
Search Heuristics
KLEE provides several search strategies for exploring paths:
| Heuristic | Description |
|---|---|
| Depth-First Search (DFS) | Traverses states depth-first |
| Random State Search | Randomly selects a state |
| Random Path Selection | Probabilistic path selection (from OSDI’08 paper) |
| NURS (Non-Uniform Random Search) | Selects based on distributions (query cost, distance to uncovered code, etc.) |
Interleave heuristics with multiple --search options:
klee --search=random-path --search=nurs:query program.bc
Practical Applications
Testing GNU Coreutils
KLEE has been used to test GNU Coreutils, finding numerous bugs. The process involves:
- Compiling Coreutils with
wllvm(whole-program LLVM) - Extracting bitcode for specific utilities
- Running KLEE with appropriate symbolic environment options
- Achieving 95%+ line coverage with generated tests
Bug Finding
KLEE excels at finding:
- Assertion violations: Reaching
assert()failures - Null pointer dereferences: Invalid memory access
- Buffer overflows: Out-of-bounds array access
- Division by zero: Undefined behavior detection
When KLEE finds a bug, it generates a test case that reproduces the error, making debugging straightforward.
Advanced Features
Controlling Path Explosion
Real programs have exponentially many paths. KLEE provides limits:
klee -max-time=10min -max-memory=4096 -max-forks=1000 program.bc
-max-time: Halt after specified time-max-memory: Limit memory usage (in MB)-max-forks: Stop forking after N symbolic branches
Coverage-Guided Execution
klee --only-output-states-covering-new program.bc
This generates test cases only for states covering new code, dramatically reducing output while maintaining high coverage.
Solver Chain
KLEE uses SMT solvers (like STP, Z3) to solve path constraints. Configure via:
klee -solver-backend=z3 program.bc
Using KLEE with Existing Test Infrastructure
The klee-stats tool extracts statistics from KLEE runs:
klee-stats klee-out-0/
Output includes:
- Total instructions executed
- Number of completed paths
- Number of generated test cases
- Time spent in solver
For live monitoring, klee-stats can serve as a Grafana data source.
Tools in the KLEE Ecosystem
| Tool | Purpose |
|---|---|
ktest-tool |
Read and display .ktest files |
klee-stats |
Extract statistics from runs |
ktest-replay |
Replay test cases on native code |
klee-exec-tree |
Visualize execution tree (requires --write-exec-tree) |
ktest-gen |
Generate .ktest from concrete inputs |
ktest-randgen |
Generate random .ktest files |
Limitations and Considerations
-
Path Explosion: Symbolic execution suffers from exponential path growth. Large programs may not be fully explorable.
-
Constraint Solving Overhead: SMT solvers are powerful but can be slow for complex constraints.
-
Environment Modeling: KLEE’s POSIX emulation may not perfectly match real system behavior.
-
Floating-Point: Symbolic execution of floating-point code is challenging and partially supported.
-
External Calls: Handling of external library calls can be configured (
--external-calls=concrete|all|none).
Conclusion
KLEE represents a powerful approach to automated testing and bug finding. By systematically exploring program paths with symbolic inputs, it can achieve high coverage and discover subtle bugs that traditional testing might miss.
For software verification and validation, KLEE offers:
- Automated test generation with guaranteed coverage properties
- Concrete test cases that reproduce bugs
- Mathematical rigor through SMT-based constraint solving
As symbolic execution technology continues to mature, tools like KLEE are becoming increasingly practical for real-world software verification tasks.
References
-
Cadar, C., Dunbar, D., & Engler, D. (2008). KLEE: Unassisted and Automatic Generation of High-Coverage Tests for Complex Systems Programs. OSDI 2008.
-
KLEE Official Documentation: https://klee.github.io/docs/
-
KLEE GitHub Repository: https://github.com/klee/klee
-
LLVM Compiler Infrastructure: https://llvm.org/
-
Tutorials and Examples: https://klee.github.io/tutorials/