lccc

An optimized fork of CCC — Claude's C Compiler — with a two-pass linear-scan register allocator, tail-call elimination, phi-copy stack coalescing, loop unrolling, and FP intrinsic lowering, targeting x86-64, AArch64, RISC-V 64, and i686.

Benchmark    LCCC     GCC -O2   LCCC vs GCC
arith_loop   0.139s   0.088s    1.6×
sieve        0.073s   0.047s    1.6×
qsort        0.137s   0.105s    1.3×
fib(40)      0.000s   0.146s    478× faster
matmul       0.010s   0.006s    1.8×
tce_sum      0.008s   0.008s    ≈equal
┌─────────────────────────────────────────────────────────────────────┐
  C source                                                           
     frontend: lex → parse → sema → IR lowering                     
                                                                      
  SSA IR                                                              
     optimizer: GVN · LICM · IPCP · DCE · const-fold · inline        
                                                                      
  Optimized IR                                                        
     regalloc (LCCC): two-pass linear scan over live intervals         
       pass 1: callee-saved ↔ all eligible values                   
       pass 2: caller-saved ↔ non-call-spanning unallocated values   
                                                                      
  Machine code  x86-64 · AArch64 · RISC-V 64 · i686                 
     standalone assembler + linker (no external toolchain)           
                                                                      
  ELF executable                                                      
└─────────────────────────────────────────────────────────────────────┘
Motivation

CCC is impressive. LCCC makes it faster.

CCC compiles real projects — SQLite, PostgreSQL, Redis, the Linux kernel — from a zero-dependency Rust codebase with its own assembler and linker. LCCC focuses on closing the performance gap with GCC by improving where CCC leaves the most on the table.

Linear-scan register allocator

Two-pass allocation: callee-saved registers for all eligible values, then caller-saved registers for the non-call-spanning remainder. Priority is weighted by loop depth.

🔩

Drop-in GCC replacement

Same flags, same output ABI. Point CC=lccc at any Makefile. No build system changes required.

🏗

Four architectures

x86-64, AArch64, RISC-V 64, i686. The same allocator improvements apply everywhere — architecture-agnostic PhysReg abstraction.

🔭

518 tests passing

518 unit tests pass. Correctness comes first: with all four phases enabled (linear scan, TCE, phi-copy coalescing, loop unrolling), compiled programs still produce output byte-identical to their GCC-built counterparts.

Phases 2–4

Linear scan · tail-call elim · phi-copy coalescing · loop unrolling · FP intrinsics

The old CCC allocator uses three greedy phases. LCCC replaces the allocation core with a proper linear scan (Phase 2), adds tail-call-to-loop conversion (Phase 3a), eliminates redundant phi-copy stack slots on loop backedges (Phase 3b), and adds loop unrolling and FP intrinsic lowering (Phase 4).

regalloc — Phase 1: callee-saved linear scan
// All eligible IR values compete for callee-saved registers.
// Safe across calls; better coverage than old "call-spanning only" Phase 1.
let phase1_intervals = filter_eligible_intervals(&liveness, &eligible);
let phase1_ranges    = build_live_ranges(&phase1_intervals, &liveness.block_loop_depth, func);
let mut allocator    = LinearScanAllocator::new(phase1_ranges, config.available_regs.clone());
allocator.run();   // expire → find_free → evict-or-spill
regalloc — Phase 2: caller-saved linear scan
// Unallocated, non-call-spanning values get caller-saved registers.
// Caller-saved regs are destroyed by calls, so we must not assign them
// to values that cross call boundaries.
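let phase2_intervals: Vec<_> = liveness.intervals.iter()
    .filter(|iv| eligible.contains(&iv.value_id))
    .filter(|iv| !assignments.contains_key(&iv.value_id))
    .filter(|iv| !spans_any_call(iv, call_points))
    .collect();
// Assumed to mirror Phase 1: this excerpt uses phase2_ranges without showing how it is built.
let phase2_ranges    = build_live_ranges(&phase2_intervals, &liveness.block_loop_depth, func);
let mut caller_alloc = LinearScanAllocator::new(phase2_ranges, config.caller_saved_regs.clone());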
caller_alloc.run();
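
The excerpts above only show how the two allocator passes are invoked. As a rough, self-contained illustration of the expire → find_free → evict-or-spill loop and of the call-spanning filter, here is a minimal sketch. Every name in it (Interval, its fields, linear_scan, two_pass, and this version of spans_any_call) is hypothetical and chosen for illustration; it does not reproduce LCCC's LinearScanAllocator, and the priority field is only assumed to stand in for the loop-depth weighting.

```rust
// Illustrative types only; LCCC's real data structures are not shown in this document.
#[derive(Debug, Clone)]
struct Interval {
    value: u32,    // IR value id
    start: u32,    // first program point where the value is live
    end: u32,      // last program point where the value is live
    priority: u32, // assumed stand-in for a loop-depth-weighted use count
}

/// True if some call site lies strictly inside the interval (an assumed definition).
fn spans_any_call(iv: &Interval, call_points: &[u32]) -> bool {
    call_points.iter().any(|&p| iv.start < p && p < iv.end)
}

/// One linear-scan pass over `regs`. Returns (value, register) pairs sorted by
/// value id; values that do not appear in the result were spilled.
fn linear_scan(mut intervals: Vec<Interval>, regs: &[u8]) -> Vec<(u32, u8)> {
    intervals.sort_by_key(|iv| iv.start);
    let mut free: Vec<u8> = regs.iter().rev().copied().collect();
    let mut active: Vec<(Interval, u8)> = Vec::new();
    let mut assigned: Vec<(u32, u8)> = Vec::new();

    for iv in intervals {
        // expire: release registers whose interval ended before this one starts
        let mut i = 0;
        while i < active.len() {
            if active[i].0.end < iv.start {
                let (_, reg) = active.swap_remove(i);
                free.push(reg);
            } else {
                i += 1;
            }
        }
        // find_free: take any free register
        if let Some(reg) = free.pop() {
            assigned.push((iv.value, reg));
            active.push((iv, reg));
            continue;
        }
        // evict-or-spill: steal from the lowest-priority active interval,
        // but only if the new interval outranks it; otherwise spill the new one
        let victim = active
            .iter()
            .enumerate()
            .min_by_key(|(_, entry)| entry.0.priority)
            .map(|(idx, entry)| (idx, entry.0.priority));
        if let Some((idx, victim_priority)) = victim {
            if victim_priority < iv.priority {
                let (old, reg) = active.swap_remove(idx);
                assigned.retain(|(v, _)| *v != old.value);
                assigned.push((iv.value, reg));
                active.push((iv, reg));
            }
        }
    }
    assigned.sort();
    assigned
}

/// Pass 1 over callee-saved registers, then pass 2 over caller-saved registers
/// for the leftovers that do not span a call.
fn two_pass(intervals: &[Interval], calls: &[u32], callee: &[u8], caller: &[u8]) -> Vec<(u32, u8)> {
    let mut out = linear_scan(intervals.to_vec(), callee);
    let leftovers: Vec<Interval> = intervals
        .iter()
        .filter(|iv| !out.iter().any(|(v, _)| *v == iv.value))
        .filter(|iv| !spans_any_call(iv, calls))
        .cloned()
        .collect();
    out.extend(linear_scan(leftovers, caller));
    out.sort();
    out
}
```

In this sketch an evicted value is spilled for its whole lifetime, which is the simplest policy; a production allocator may split intervals instead.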
Performance

Benchmark results — LCCC vs CCC vs GCC -O2

Best-of-5 wall-clock time. All outputs are identical. Run with python3 lccc-improvements/benchmarks/bench.py --reps 5.

Benchmark    Description                      LCCC     CCC      GCC -O2   LCCC vs CCC   LCCC vs GCC
arith_loop   32-var arithmetic, 10M iters     0.103s   0.146s   0.068s    +42% faster   1.50×
sieve        Primes to 10M                    0.036s   0.045s   0.024s    +25% faster   1.50×
qsort        Sort 1M integers                 0.096s   0.095s   0.087s    ≈equal        1.10×
fib(40)      Recursive Fibonacci              0.352s   0.354s   0.096s    ≈equal        3.68×
matmul       256×256 double matrix multiply   0.020s   0.029s   0.004s    +45% faster   4.86×
tce_sum      Tail-recursive sum(10M)          0.008s   1.09s    0.008s    139× faster   ≈equal
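
The two ratio columns appear to follow directly from the timings: "LCCC vs CCC" as CCC time divided by LCCC time, and "LCCC vs GCC" as LCCC time divided by GCC time. The sketch below spells out that arithmetic under those assumed definitions. The helper names are hypothetical and are not taken from bench.py. The published ratios were presumably computed from unrounded timings, so recomputing them from the rounded table values can differ in the last digit.

```rust
/// Percent by which `candidate_s` beats `baseline_s` (both wall-clock seconds).
fn percent_faster(baseline_s: f64, candidate_s: f64) -> f64 {
    (baseline_s / candidate_s - 1.0) * 100.0
}

/// How many times slower `candidate_s` is than `reference_s` (both wall-clock seconds).
fn slowdown(candidate_s: f64, reference_s: f64) -> f64 {
    candidate_s / reference_s
}
```

For the arith_loop row, percent_faster(0.146, 0.103) comes to roughly 42 and slowdown(0.103, 0.068) to roughly 1.5.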

The largest gains are on register-pressure code (linear scan + phi-copy coalescing), tail-recursive code (TCE), and matrix multiply (FP intrinsic lowering, Phase 4). The remaining matmul gap is GCC's AVX2 auto-vectorization — a Phase 5 target. See the roadmap for what's next.
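
For readers unfamiliar with tail-call elimination, the effect on a function shaped like tce_sum can be illustrated by hand. The sketch below shows the general shape of the transformation, not LCCC output. The function names are hypothetical and the benchmark's actual C source is not reproduced here.

```rust
/// Tail-recursive form: the recursive call is the last thing the function does,
/// so nothing in the current frame is needed after it returns.
fn sum_rec(n: u64, acc: u64) -> u64 {
    if n == 0 {
        acc
    } else {
        sum_rec(n - 1, acc + n)
    }
}

/// What tail-call-to-loop conversion effectively produces: the parameters are
/// rebound in place and control jumps back to the top of the function.
fn sum_loop(mut n: u64, mut acc: u64) -> u64 {
    loop {
        if n == 0 {
            return acc;
        }
        acc += n; // new acc = acc + n
        n -= 1;   // new n   = n - 1
    }
}
```

The loop form uses constant stack space, while the recursive form pushes one frame per call unless the compiler performs this conversion, which would explain why the unconverted build is so much slower at 10M iterations.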

✓ all outputs match GCC · 518 unit tests passing · GCC 15.2.0 · Ubuntu x86-64 · -O2
Latest

What's new in Phase 10

Phase 12 brings LCCC within 1.3–1.6× of GCC -O2 across all benchmarks, and 478× faster than GCC on recursive Fibonacci — with constant-immediate stores, SIB address folding, accumulator ALU+store folding, and a register allocator loop-depth priority fix.

🔁

Binary recursion → iteration

Detects f(n) = f(n-1) + f(n-2) and converts the exponential O(2^n) recursion into an O(n) iterative loop. fib(40): 0.001 s vs GCC's 0.194 s.
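
To make the complexity change concrete, here is a hand-written sketch of the before and after shapes. It only illustrates the idea; how LCCC recognizes the pattern and what code it emits are not described here, and the function names are hypothetical.

```rust
/// Naive binary recursion: the number of calls grows exponentially with n.
fn fib_rec(n: u32) -> u64 {
    if n < 2 {
        n as u64
    } else {
        fib_rec(n - 1) + fib_rec(n - 2)
    }
}

/// Iterative equivalent: carry the last two values forward, n steps in total.
fn fib_iter(n: u32) -> u64 {
    let (mut a, mut b) = (0u64, 1u64);
    for _ in 0..n {
        let next = a + b;
        a = b;
        b = next;
    }
    a
}
```

Both functions return the same values; only the amount of work differs.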

🧮

AVX2 FMA3 vectorization

MatMul inner loop uses vfmadd231pd — fused multiply-add in a single instruction. Correct innermost-loop detection, SIB addressing, remainder loops.
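
As a scalar model of what a packed fused multiply-add does, the sketch below updates one row of C with c[j] = a_ik * b[j] + c[j], four elements per step plus a remainder loop. It assumes the inner loop runs along the j dimension, which the text above does not state, and the function name is hypothetical. Rust's f64::mul_add computes the product and sum with a single rounding, which matches the semantics of an FMA instruction; whether it compiles to vfmadd231pd depends on the target features.

```rust
/// Row update for matrix multiply: c_row[j] += a_ik * b_row[j].
fn fma_row_update(c_row: &mut [f64], a_ik: f64, b_row: &[f64]) {
    assert_eq!(c_row.len(), b_row.len());
    let main = c_row.len() / 4 * 4; // largest multiple of 4 not exceeding the length

    // Main loop: four lanes per step, modelling one 256-bit packed FMA.
    let mut j = 0;
    while j < main {
        for lane in 0..4 {
            c_row[j + lane] = a_ik.mul_add(b_row[j + lane], c_row[j + lane]);
        }
        j += 4;
    }

    // Remainder loop: leftover elements when the length is not a multiple of 4.
    for j in main..c_row.len() {
        c_row[j] = a_ik.mul_add(b_row[j], c_row[j]);
    }
}
```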

📊

XMM register allocation

F64 values allocated to xmm2–xmm7, freeing GPR pressure. Combined with frame pointer omission, this gives the allocator up to 17 usable registers.

🔗

Cross-block store forwarding

Callee-saved register mappings preserved across loop headers. Copy propagation flows through conditional jumps. Eliminates redundant stack traffic.

Read the Phase 4 write-up →  ·  Phase 3 write-up →

Getting Started

Build LCCC

1. Clone the repo

shell
git clone --recurse-submodules https://github.com/levkropp/lccc.git
cd lccc
2. Build (requires Rust stable)

shell
cargo build --release
# Binary at target/release/ccc (x86-64)
# Also: ccc-arm, ccc-riscv, ccc-i686
3. Compile a C program

shell
GCC_INC="-I/usr/lib/gcc/x86_64-linux-gnu/$(gcc -dumpversion)/include"
./target/release/ccc $GCC_INC -O2 -o hello hello.c
./hello
4. Run the benchmark suite

shell
python3 lccc-improvements/benchmarks/bench.py --reps 5 --md results.md
Rust stable 2021 · Linux · x86-64 host · MIT OR Apache-2.0 OR BSD-2-Clause