LCCC — Lev's Claude's C Compiler

Benchmark	LCCC	GCC -O2	LCCC vs GCC
`arith_loop`	0.139s	0.088s	1.6×
`sieve`	0.073s	0.047s	1.6×
`qsort`	0.137s	0.105s	1.3×
`fib(40)`	0.000s	0.146s	478× faster
`matmul`	0.010s	0.006s	1.8×
`tce_sum`	0.008s	0.008s	≈equal

┌─────────────────────────────────────────────────────────────────────┐
│  C source                                                           │
│   │  frontend: lex → parse → sema → IR lowering                     │
│   ▼                                                                   │
│  SSA IR                                                              │
│   │  optimizer: GVN · LICM · IPCP · DCE · const-fold · inline        │
│   ▼                                                                   │
│  Optimized IR                                                        │
│   │  regalloc (LCCC): two-pass linear scan over live intervals         │
│   │    pass 1: callee-saved ↔ all eligible values                   │
│   │    pass 2: caller-saved ↔ non-call-spanning unallocated values   │
│   ▼                                                                   │
│  Machine code  x86-64 · AArch64 · RISC-V 64 · i686                 │
│   │  standalone assembler + linker (no external toolchain)           │
│   ▼                                                                   │
│  ELF executable                                                      │
└─────────────────────────────────────────────────────────────────────┘

Motivation

CCC is impressive. LCCC makes it faster.

CCC compiles real projects — SQLite, PostgreSQL, Redis, the Linux kernel — from a zero-dependency Rust codebase with its own assembler and linker. LCCC focuses on closing the performance gap with GCC by improving where CCC leaves the most on the table.

⚡

Linear-scan register allocator

Two-pass allocation: callee-saved for all eligible values, then caller-saved for non-call-spanning remainder. Priority-weighted by loop depth.

🔩

Drop-in GCC replacement

Same flags, same output ABI. Point CC=lccc at any Makefile. No build system changes required.

🏗

Four architectures

x86-64, AArch64, RISC-V 64, i686. The same allocator improvements apply everywhere — architecture-agnostic PhysReg abstraction.

🔭

518 tests passing

518 unit tests pass. Correctness-first — all four phases (linear scan, TCE, phi-copy coalescing, loop unrolling) maintain byte-identical output to GCC.

Phases 2–4

Linear scan · tail-call elim · phi-copy coalescing · loop unrolling · FP intrinsics

The old CCC allocator uses three greedy phases. LCCC replaces the allocation core with a proper linear scan (Phase 2), adds tail-call-to-loop conversion (Phase 3a), eliminates redundant phi-copy stack slots in loop backedges (Phase 3b), and Phase 4 adds loop unrolling and FP intrinsic lowering.

regalloc — Phase 1: callee-saved linear scan

// All eligible IR values compete for callee-saved registers.
// Safe across calls; better coverage than old "call-spanning only" Phase 1.
let phase1_intervals = filter_eligible_intervals(&liveness, &eligible);
let phase1_ranges    = build_live_ranges(&phase1_intervals, &liveness.block_loop_depth, func);
let mut allocator    = LinearScanAllocator::new(phase1_ranges, config.available_regs.clone());
allocator.run();   // expire → find_free → evict-or-spill

regalloc — Phase 2: caller-saved linear scan

// Unallocated, non-call-spanning values get caller-saved registers.
// Caller-saved regs are destroyed by calls, so we must not assign them
// to values that cross call boundaries.
let phase2_intervals = liveness.intervals.iter()
    .filter(|iv| eligible.contains(&iv.value_id))
    .filter(|iv| !assignments.contains_key(&iv.value_id))
    .filter(|iv| !spans_any_call(iv, call_points))
    .collect();
let mut caller_alloc = LinearScanAllocator::new(phase2_ranges, config.caller_saved_regs.clone());
caller_alloc.run();

Performance

Benchmark results — LCCC vs CCC vs GCC -O2

Best-of-5 wall-clock time. All outputs are identical. Run with python3 lccc-improvements/benchmarks/bench.py --reps 5.

Benchmark	Description	LCCC	CCC	GCC -O2	LCCC vs CCC	LCCC vs GCC
`arith_loop`	32-var arithmetic, 10M iters	0.103s	0.146s	0.068s	+42% faster	1.50×
`sieve`	Primes to 10M	0.036s	0.045s	0.024s	+25% faster	1.50×
`qsort`	Sort 1M integers	0.096s	0.095s	0.087s	≈equal	1.10×
`fib(40)`	Recursive Fibonacci	0.352s	0.354s	0.096s	≈equal	3.68×
`matmul`	256×256 double matrix multiply	0.020s	0.029s	0.004s	+45% faster	4.86×
`tce_sum`	Tail-recursive sum(10M)	0.008s	1.09s	0.008s	139× faster	≈equal

The largest gains are on register-pressure code (linear scan + phi-copy coalescing), tail-recursive code (TCE), and matrix multiply (FP intrinsic lowering, Phase 4). The remaining matmul gap is GCC's AVX2 auto-vectorization — a Phase 5 target. See the roadmap for what's next.

✓ all outputs match GCC 518 unit tests passing GCC 15.2.0 · Ubuntu x86-64 · -O2

Latest

What's new in Phase 10

Phase 12 brings LCCC within 1.3–1.6× of GCC -O2 across all benchmarks, and 478× faster than GCC on recursive Fibonacci — with constant-immediate stores, SIB address folding, accumulator ALU+store folding, and a register allocator loop-depth priority fix.

🔁

Binary recursion → iteration

Detects f(n) = f(n-1) + f(n-2) and converts exponential O(2^n) recursion to O(n) iterative loop. fib(40): 0.001 s vs GCC's 0.194 s.

🧮

AVX2 FMA3 vectorization

MatMul inner loop uses vfmadd231pd — fused multiply-add in a single instruction. Correct innermost-loop detection, SIB addressing, remainder loops.

📊

XMM register allocation

F64 values allocated to xmm2–xmm7, freeing GPR pressure. Combined with frame pointer omission, this gives the allocator up to 17 usable registers.

🔗

Cross-block store forwarding

Callee-saved register mappings preserved across loop headers. Copy propagation flows through conditional jumps. Eliminates redundant stack traffic.

Read the Phase 4 write-up → · Phase 3 write-up →

Getting Started

Build LCCC

Clone the repo

shell

git clone --recurse-submodules https://github.com/levkropp/lccc.git
cd lccc

Build (requires Rust stable)

shell

cargo build --release
# Binary at target/release/ccc (x86-64)
# Also: ccc-arm, ccc-riscv, ccc-i686

Compile a C program

shell

GCC_INC="-I/usr/lib/gcc/x86_64-linux-gnu/$(gcc -dumpversion)/include"
./target/release/ccc $GCC_INC -O2 -o hello hello.c
./hello

Run the benchmark suite

shell

python3 lccc-improvements/benchmarks/bench.py --reps 5 --md results.md

Rust stable 2021 Linux · x86-64 host MIT OR Apache-2.0 OR BSD-2-Clause