Your First ZK Program
This page walks you through one ZK program and grows it, piece by piece. By the end you'll have a proof that says, "I know a 4 MB blob whose XOR checksum is T" — around 80 lines of Rust, under two seconds on a laptop, verification in tens of milliseconds.
We start as minimal as it gets: three columns, one transition constraint, no chiplets. The whole thing works because of one quirk of the field Hekate runs on — in GF(2^k), addition is XOR. That's the hook. The two chapters that follow add one feature each, every time because the previous chapter hits a wall it can't climb.
The idea
Imagine a service hands you a 4 MB blob. You want to prove to someone else that you've seen every byte of it, and
that when you XOR the whole thing together you get a public value T — all without ever showing them the blob
itself.
Hekate's field is GF(2^k), where addition literally is XOR at the hardware level. The whole proof shrinks to a
one-line recurrence: acc_{i+1} = acc_i + data_i. No hash gadgets, no bit decomposition, no workarounds. That's
the quirk we're leaning on for the rest of this page.
The trace
An execution trace is just a table: rows are moments in time, columns are state. For this program we only need three columns and 2^20 rows. At 4 bytes of blob per row, that's the 4 MB we set out to process.
| Column | Type | What it holds |
|---|---|---|
DATA | B32 | One 4-byte chunk of the blob at this row |
ACC | B32 | Running XOR of everything we've seen so far |
Q | Bit | 1 on active rows, 0 on the final row to end the chain |
A quick word on column types. Hekate lets you store each column in whatever width actually fits — Bit for
booleans, B32 for u32s, B128 for full field elements. Since our data is just a stream of 4-byte words, B32
is the right choice: 4 bytes per cell instead of 16. When the prover needs to do real field arithmetic it lifts
those B32 values into the full Block128 field for you, so you keep the compact storage and still get
full-field security.
What "zero-knowledge" means in this proof
The trace is private. The verifier never sees DATA, never sees the running ACC, never sees the blob. What
the verifier learns is exactly two things: the public tag T (which they already knew, they're checking against
it) and the single bit "yes, the prover knew a 4 MB blob whose XOR is T". Nothing else. Two distinct blobs
with the same XOR produce indistinguishable proofs — the protocol erases everything except the claim itself.
Mechanically this works because Hekate's prover injects fresh AES-NI noise into every committed column before
hashing it into the Merkle tree, and an extra blinding column past the active rows absorbs whatever Sumcheck
would otherwise leak about specific cells. You don't write any of that. The sumcheck_blinding_factor field on
Config controls how much noise — 2 is the default for ZK builds, 0 turns blinding off and is only safe for
fully public traces.
The split between what's public and what's private gets a name in the API:
ProgramInstance<F>— the public side. Carriesnum_rows,public_inputs(the tag, in our case), and whatever else the verifier needs to recreate the statement they're verifying. Same instance goes to prover and verifier.ProgramWitness<F, T>— the private side. Wraps the trace plus any chiplet traces. Never crosses the network. The prover holds it, generates the proof, and the witness stops there.
Same Program<F> definition on both sides. Same instance on both sides. Witness only on the prover side. The
proof is the bridge.
The constraints
The AIR (Algebraic Intermediate Representation) is where you declare the rules the trace must obey.
use Config;
use errors;
use ;
use ;
use ;
use ConstraintSystem;
use define_columns;
use ;
use ;
use ;
type F = Block128;
define_columns!
What's the AIR actually saying? The heart of it is one transition: q * (next_acc + acc + data) = 0. Read it as
"between every pair of consecutive rows, next_acc must equal acc + data" — and remember, in GF(2^k)
subtracting and adding are the same thing, so next_acc = acc + data and next_acc + acc + data = 0 are two
ways of writing the same equation. The selector Q acts as an on/off switch so the constraint goes silent on
the final row and doesn't run off the end of the trace.
Then two boundary constraints pin the endpoints: the accumulator starts at zero, and at the last row it must
equal the public tag. One subtle thing to notice: the third argument of BoundaryConstraint::with_public_input
isn't a literal value, it's an index into the public inputs. So we pass the zero and the tag through
ProgramInstance::public_inputs at indices 0 and 1.
Why do we need the starting boundary at all? Because without it the program is actually unsound. An attacker
could start acc[0] at any value they like and still land exactly on tag at the last row by choosing the right
blob — the terminal boundary alone isn't enough. Pinning both endpoints is what forces the running XOR of the
data to really equal the tag.
Filling the trace
Nothing clever here. One row per 4-byte word of the blob, and a final row that just holds the accumulated tag
with Q = 0. Setting Q to zero on that last row makes the transition vacuously true — 0 * anything = 0 — so
we don't have to invent a phantom data byte for it. The terminal boundary still pins the value at that row, so
the math works out.
Proving and verifying
We have an AIR, a witness, and two public inputs. All that's left is to hand everything to the SDK. You don't have to know anything about Sumcheck or Brakedown to drive it — the SDK exposes three methods and each does one thing:
preflight— sanity-check the witness against your constraints before you generate a proof. Cheap, instant, and it'll save you from lots of painful debugging.prove— ship the program + instance + witness to the prover binary and get anInnerProofback.verify— check that proof locally. No binary needed, no network.
To run it, spin up a fresh cargo project, add the Hekate crates to Cargo.toml, drop the program into
src/main.rs, and:
Measured on Apple Silicon (release, 2^16 rows, single thread of work + Rayon for SpMV):
Trace Generation done in 315.92µs
Proving done in 29.10ms
Verifying done in 2.69ms
Total Proof size: 314.32 KB (321866 bytes)
Trace Commitment: 38 bytes
Main AIR ZeroCheck: 850 bytes
Eval Sumcheck: 578 bytes
Tensor Vector (q): 142667 bytes
Point Evaluations: 445 bytes
LDT Merkle Paths: 71841 bytes
LDT Opened Columns: 105441 bytes
The eval-batch components (tensor + LDT) dominate; everything else is in the noise. Bumping num_vars to 20 (the
4 MB blob this chapter set out to process) scales prove time roughly 16×.
Why preflight matters
The first time you write an AIR you will get a constraint wrong. That's normal. The rough part is what happens when you skip preflight: the prover will happily finish, and the verifier will reject, and the only thing you get back is "sumcheck failed at round N" — which tells you absolutely nothing about which row or column of your trace actually broke.
Preflight spares you all of that. It evaluates the constraint system row by row on your concrete trace before any
proving work starts. The report that comes back is structured (PreflightReport<F> with
constraint_violations, boundary_violations, lagrange_pin_violations, bus_diagnostics), and the Debug
output it formats reads like this:
A clean trace:
PREFLIGHT: 0 constraint, 0 boundary, 0 lagrange pin, 0 bus
A failed transition constraint (the AIR rule that pairs of rows must satisfy next_acc + acc + data = 0):
PREFLIGHT: 2 constraint, 0 boundary, 0 lagrange pin, 0 bus
[Main] Constraint 0 "fib_a" failed at row 1
[Main] Constraint 1 "fib_b" failed at row 2
A boundary mismatch (the value at a pinned row doesn't match the public input):
PREFLIGHT: 0 constraint, 2 boundary, 0 lagrange pin, 0 bus
Boundary #0: col=0 row=0 actual=Flat(Block128(0)) expected=Flat(Block128(2206...))
Boundary #1: col=1 row=0 actual=Flat(Block128(1)) expected=Flat(Block128(2654...))
A bus mismatch (the multiset on one side of the LogUp bus doesn't match the other):
PREFLIGHT: 0 constraint, 0 boundary, 0 lagrange pin, 1 bus
Bus "lookup_bus" (2 endpoints):
Main: 8 rows, 4 active, product=Flat(Block128(3339...))
Chiplet 0: 8 rows, 2 active, product=Flat(Block128(1795...))
Each line names the table (Main or Chiplet N), the constraint or boundary or bus that failed, and the row.
The constraint label ("fib_a") is whatever string you passed to cs.constrain_named(...); if you used plain
cs.constrain(...) you get the index alone. Debugging goes from hours to seconds. Get in the habit of calling it
first.
What you just proved, and what you didn't
Congratulations — you have a working zero-knowledge proof. The prover knows a 4 MB blob whose XOR fingerprint is
T, the verifier confirms it, and the blob itself stays private. Good enough for parity checks, secret-sharing
combiners, and any protocol whose primitive is XOR.
There's a hole, though, and it's the kind that only shows up once an adversary is in the picture. XOR is not
collision-resistant. Given any blob, an attacker can swap two bytes, flip eight bits in the right places, or
substitute an entirely different 4 MB payload — and as long as the new bytes XOR to the same T, your proof
accepts it without complaint. The fingerprint says "some blob with this XOR." It does not say "the blob you
were supposed to process." For a checksum that's fine. For anything where the content matters — a firmware
image, a deterministic transcript, a program a CPU is about to execute — the gap is a soundness break dressed
up as a feature.
That's what Chapter 2 closes.
Constraining what goes in
The fix is to publish the blob's positional content up front, not just its fingerprint. We'll commit to a
specific 4 MB byte stream — a "program image" — and add a second guarantee on top of the existing checksum: not
only does the blob XOR to T, but byte i of the blob equals byte i of the published image, for every i.
Two guarantees, one proof. The XOR check still runs unchanged; the new check rules out every blob except the
one image we sanctioned.
The Hekate primitive we'll use is positional equality: row i on the CPU side must match row i on the ROM side, on every active row. It's the exact contract you want whenever the CPU reads a pinned sequence in order — continuation-style rollups, streaming data pipelines, this checksum example.
A second table, linked by a bus
Hekate keeps side tables in chiplets. A chiplet is just an independent AIR — its own trace, its own commitment, its own sumcheck — running next to your main trace. The two are connected by a LogUp bus.
A bus spec carries a kind: Permutation or Lookup. The two do different jobs:
- Permutation — cross-bus cancellation is multiset equality. Both endpoints emit the same multiset of rows, any order. Used where one side reshuffles the rows of the other (Chapter 3's RAM sorts events by address; the order doesn't match the CPU side but the contents do).
- Lookup — cross-bus cancellation is pointwise equality on the padded hypercube. Row i on the CPU side must equal row i on the ROM side, on every active row. Used where the CPU is executing a pinned sequence: program images, deterministic tables, anything where position matters and not just contents.
For a committed program image we want Lookup. Both CpuFetchUnit::linking_spec() and
RomChiplet::linking_spec() return Lookup-kind specs; you don't pick the kind yourself, the library specs pick
it for you.
The bus contract under Lookup is: every active row on the CPU side emits (pc, opcode, args) into a ledger,
every active row on the ROM side emits the same tuple, and the verifier checks that the two sides agree row
for row. The moment CPU row 3 disagrees with ROM row 3 — a flipped bit in the opcode, a swapped argument, a PC
off by one — the bus leaves a stray term and the proof fails.
Reshaping ChecksumCols to match the bus
use ;
use ChipletDef;
use PermutationCheckSpec;
Before we can wire up the bus, there's a small piece of reality to deal with. The LogUp bus works at byte
granularity: each Source::Column(i) in a spec points at one column that holds one byte of the key. The shipped
RomChiplet has a fixed 8-byte tuple shape — four PC bytes, one opcode, three args — and the matching CPU side
is defined by CpuFetchUnit::linking_spec(), which hard-codes the column indices it expects. If we want to reuse
that off-the-shelf spec (and we do — hand-rolling your own is Chapter-4 territory), the first nine columns of our
main trace need to look exactly like CpuFetchColumns.
So we reshape ChecksumCols. The four data bytes move into the OPCODE and ARG* slots. We keep the four PC
slots as zero-filled columns because we don't actually need a PC, but we need the shape. Q is renamed to
SELECTOR to line up with the fetch unit's selector. The accumulator columns come after — they only matter for
the checksum logic, not the bus.
define_columns!
OPCODE and ARG0..ARG2 carry the four bytes of the data word now; PC_B0..B3 stay zero; SELECTOR does
the same job Q did in Chapter 1. The accumulator splits into four parallel lanes, one per data byte, each with
its own running XOR:
let sel = cs.col;
for in
The boundary constraints grow to match — every accumulator lane needs its own start-at-zero and its own
end-at-the-tag-byte. That means the public inputs vector grows from two entries to five: [ZERO, tb0, tb1, tb2, tb3], one zero everyone shares, plus the four bytes of the expected tag.
Wiring the bus
Now that the main trace has the right shape, wiring the bus is almost a formality. The CPU side is off-the-shelf
(CpuFetchUnit::linking_spec()), and we wrap the RomChiplet with ChipletDef::from_air(...) to expose it as a
proper independent table:
Notice what just happened: the main AIR didn't gain a single new transition constraint. The XOR checksum still
runs, just across four byte lanes instead of one. All the new validity logic lives in the chiplet and the bus
match. ChipletDef::from_air snapshots the chiplet's layout, constraints, and virtual expander in one call, and
the ROM table runs as a fully independent trace — its own commitment, its own sumcheck, sitting alongside the
main trace. The prover draws a per-bus challenge r_bus after every chiplet root has been absorbed into the
transcript; that challenge is what weights the Lookup bus-sum term so positional equality is actually
enforced at cryptographic soundness, not just at multiset level.
Building the program image
The chiplet side uses Instruction records with the same 8-byte layout the bus expects:
For a pure data channel we leave pc at zero and pack each 4-byte blob word into opcode and args. The
program image is the blob itself, instruction-for-instruction. Since the bus enforces row-for-row equality with
the main trace, the ROM needs the same number of rows as the main trace — 2^20:
let program_image: = blob
.chunks_exact
.map
.collect;
let rom_trace = generate_rom_trace?;
generate_rom_trace is a small helper in hekate-chiplets. It walks the Instruction list, fills the nine
RomChiplet columns (four PC bytes, one opcode, three args, one selector) via TraceBuilder, and leaves the
selector at zero on any unused rows so they stay quiet on the bus.
The height constraint is real, not cosmetic. A
Lookupbus pads both endpoints toN_max = max(CPU_rows, ROM_rows)and demands row-i-equals-row-i on every row of that padded hypercube, padding included. Padding rows all carryselector = 0on both sides — they match trivially. But that also means active CPU rows past the ROM's real height would be forced toselector = 0too (to match ROM's padding), which disables them. Practical consequence: the number of active CPU rows is capped at the number of active ROM rows, and the active positions on both sides must align 1:1. For a linear execution pattern like this checksum the fit is natural (one CPU row per program record, trace ≈ program). Programs with control flow that revisit the same opcode many times need a different primitive and don't belong in this chapter.
The witness gains a chiplet trace
let = generate_trace?;
let rom_trace = generate_rom_trace?;
let witness = new.with_chiplets;
with_chiplets is a builder on ProgramWitness. The order of chiplet traces has to match chiplet_defs(), but
that's the only rule — each chiplet has its own row count. For this program both are 2^20 because the
bus is positional, but nothing forces them to match a priori. Different heights are fine for Permutation buses
(Chapter 3) and for Lookup buses where the ROM is a short pinned table read from many execution rows —
the bus handles N_max padding automatically.
Everything else — preflight, prove, verify — is the same call as Chapter 1. Preflight now also reports bus diagnostics: if the blob diverges from the program image at any row, preflight points at the row and at both sides' bytes before any proving work starts.
Measured at 2^16 rows on the same hardware:
Trace Generation (main) done in 1.00ms
Trace Generation (ROM) done in 1.79ms
Proving done in 86.77ms
Verifying done in 6.65ms
Total Proof size: 789.02 KB (807960 bytes)
Trace Commitment: 38 bytes
Main AIR ZeroCheck: 850 bytes
Eval Sumcheck: 578 bytes
Tensor Vector (q): 142667 bytes
Point Evaluations: 657 bytes
LDT Merkle Paths: 71841 bytes
LDT Opened Columns: 207841 bytes
Chiplets (1):
Commitments: 39 bytes
ZeroChecks: 851 bytes
Eval Arguments: 382489 bytes
LogUp Bus Aux (2 specs):
Main: 54 bytes
Chiplets: 55 bytes
Cost over Chapter 1: ~3× prove time, ~2.5× proof size. Both deltas come from the new chiplet table — its eval argument (~382 KB) and the doubled LDT opened columns on the main side (now serving a bus consumer on every active row).
What you just proved, and what you didn't
Same ChecksumProgram as Chapter 1, now with a second guarantee alongside the original one: the blob is exactly
the program image you handed to generate_rom_trace (byte-for-byte, row-for-row), and its XOR fingerprint still
equals the tag. The fingerprint catches integrity bugs; the Lookup bus catches "wrong program" bugs. For a
linear execution pattern like this checksum — CPU row i always processes program word i — positional equality is
a tight fit.
There's still a structural gap, and it's the one that separates streaming programs from real programs. Try
to extend ChecksumProgram with anything that needs to hold state across distant rows — a histogram bin you
update on row 5 and again on row 1700, a stack you push to in one phase and pop from in another, an
intermediate result you compute early and consume late — and you discover the main trace can't help you. AIR
constraints only see the current row and the next. The Lookup bus only enforces row-i-equals-row-i. Neither
gives you an address you can write to at one moment in time and read back from at another. Pure forward-walking
arithmetic has no notion of memory.
That's the gap Chapter 3 closes — with a different chiplet that uses a different bus kind, doing a different job.
Remembering what you did
The new chiplet is RAM, and the contract it provides is exactly what the name suggests: the program writes
(addr, val) pairs to a scratch buffer at one row and reads them back at a later row, and the chiplet proves
that every read returned the value most recently written to that address. We'll use it to extend the running
example: treat the blob as an access log. Every word becomes a write of (addr_i, word_i). After the writes
finish, the program reads back a handful of addresses chosen by the verifier and claims the values match a
public expectation. The proof now carries a third guarantee:
Every write happened, every read-back returned the value that was actually written, and nothing tampered with the buffer in between.
You genuinely cannot build this with a Lookup bus. That bus proves "row i on side A equals row i on side B"
— it has no way to express reads and writes that need to agree across distant rows. Random-access memory needs
a different contract: one where the two sides hold the same multiset of events but the ordering differs (program
order on one side, sorted-by-address on the other), and the chiplet proves read-after-write consistency locally
on the sorted side. That's a Permutation bus on top of a sort — the second bus kind we skipped in Chapter 2.
How RAM verification works
The RAM chiplet in Hekate is what the literature calls an offline memory checker. The idea is clean once you
see it: the main trace emits a stream of memory events in execution order (one event per memory-op row), and the
RAM chiplet takes the exact same events, but sorts them by (address, clock) first. On that sorted side the
chiplet proves two simple local properties:
- The sort is correct — adjacent rows really do satisfy
(addr_i, clk_i) ≤ (addr_{i+1}, clk_{i+1}). - The values are consistent — within any run of rows at the same address, every read returns the value of the most recent write (or a zero default on first touch).
The two sides talk through the "ram_link" bus, which carries a 13-field tuple for every event:
(addr_b0..b3, clk_b0..b3, val_b0..b3, is_write). The CPU side adds events in program order, the RAM side adds
the same events in sorted order, and the LogUp bus proves the two multisets match. Local sorting + local
consistency + global multiset equality = a fully verified memory oracle. Nothing materialises a merged trace,
nothing forces the two sides to line up row for row.
Memory events
use ;
One MemoryEvent per memory op. The four CPU-side clock bytes clk_b0..b3 are not real columns. The bus
generates them virtually from the CPU's row index via Source::RowIndexByte(0) through
Source::RowIndexByte(3). On the RAM side the sorted trace does store CLK_B0..B3 as real columns, because the
sort has reshuffled the events and the chiplet can no longer rely on row-index-as-time.
Split the blob into 4-byte chunks, pick the public read-back schedule, and size the RAM table to cover every event (power of two):
let chunks: = blob
.chunks_exact
.map
.collect;
let public_reads: = /* (addr, expected_value) pairs known to the verifier */;
let ram_rows: usize = .next_power_of_two;
Build the event list from those inputs, writes first and then read-backs:
let mut events = Vecwith_capacity;
for in chunks.iter.enumerate
for in &public_reads
let ram_trace = generate_ram_trace?;
generate_ram_trace does a lot under the hood: it sorts the events by (addr, clk), checks that the sequence is
locally consistent (no read disagrees with its last write), fills the 20 physical RAM columns, and flat-out
refuses to build the trace if it finds a mismatch. It's your first line of defence — preflight would catch the
same problems, but generate_ram_trace catches them before the witness even leaves your process.
What grows on the main side
Everything from Chapter 2 stays exactly where it was. We just append ten new columns at the end of ChecksumCols
— four address bytes, four value bytes, IS_WRITE, and a memory selector MEM_SEL — to carry the CPU side of
the memory bus:
define_columns!
There's one small gotcha. CpuMemoryUnit::linking_spec() was written assuming CpuMemColumns starts at index 0
— but in our layout MEM_ADDR_B0 sits at index 13, after everything we already had. So before we hand the spec
to the bus, we shift its column indices by the right offset. That's what shift_column_indices does. The Air
impl gains one entry for the RAM bus, and Program<F> gains one entry for the RAM chiplet:
The public-input count stays at 5 from Chapter 2 (ZERO plus four tag bytes). Chapter 3 adds no boundary
constraints; the RAM chiplet does all the value-binding work through the bus.
Two chiplets, two buses, two independent traces, and the main AIR didn't gain a single new transition constraint for it. All the new logic lives in the RAM chiplet itself and in the bus match. This is the whole payoff of the multi-table architecture: features compose by adding tables, not by bloating your main AIR.
Keeping padding inert
We now have two selectors — SELECTOR for the checksum side and MEM_SEL for the memory side. Padding rows set
both to zero, and that's enough: the XOR transition goes silent whenever SELECTOR = 0, and the memory bus
contribution goes silent whenever MEM_SEL = 0 (both specs are gated by their selectors). So padding rows simply
vanish from the protocol — no extra AIR constraints required.
If you were building a program that could halt mid-trace (an early return, an interrupt), you'd also need the Ghost Protocol: an extra constraint
s_halt * MEM_SEL = 0that forces halted rows to stay off the memory bus even if an attacker tried to fill them with fake events. We don't halt here, so a plain boolean selector is all we need. The Cryptographic Chiplets page covers the full Ghost Protocol story.
The witness gains a third trace
generate_trace grows one argument: the public read-back schedule &[(addr, value)]. The body still does
everything it did in Chapter 2 (fill OPCODE/ARG*, ACC_B*, SELECTOR, run the XOR), and then layers on two
new phases for the memory columns: writes for the data words, reads for the public check-backs. The return type
is the same as before, so the tag boundary constraint keeps working without any downstream change:
Three traces in total, one main plus two chiplets:
let = generate_trace?;
let rom_trace = generate_rom_trace?;
let ram_trace = generate_ram_trace?;
let witness = new
.with_chiplets;
Three traces in, one witness out. The order has to match chiplet_defs() but row counts are fully independent —
the main trace at 2^20, ROM at 256, RAM at 2^20 (it must hold every event). Preflight now reports bus
diagnostics for every bus; if the read-back at address A expected value V but you actually wrote W,
preflight tells you the row, the address, and both values.
Measured at 2^16 rows on the same hardware:
Trace Generation (main) done in 4.32ms
Trace Generation (ROM) done in 1.75ms
Trace Generation (RAM) done in 105.89ms
Proving done in 388.49ms
Verifying done in 11.57ms
Total Proof size: 1371.88 KB (1404810 bytes)
Trace Commitment: 38 bytes
Main AIR ZeroCheck: 850 bytes
Eval Sumcheck: 578 bytes
Tensor Vector (q): 142667 bytes
Point Evaluations: 933 bytes
LDT Merkle Paths: 71841 bytes
LDT Opened Columns: 294881 bytes
Chiplets (2):
Commitments: 77 bytes
ZeroChecks: 2517 bytes
Eval Arguments: 890213 bytes
LogUp Bus Aux (4 specs):
Main: 106 bytes
Chiplets: 109 bytes
Cost over Chapter 2: ~4.5× prove time, ~1.7× proof size. RAM trace generation alone is 106 ms — the chiplet
sorts every event by (addr, clk) and verifies local read-after-write consistency before the prover ever
touches the trace. The RAM chiplet's eval argument adds ~510 KB; the LDT-opened-columns line on the main
side grows again because the new cpu_mem bus claims another column at every active row.
What you just proved
Same ChecksumProgram you started with — doing three things at once now:
- The blob's XOR fingerprint equals the public tag. (Chapter 1.)
- The blob is exactly the pinned program image, row-for-row. (Chapter 2.)
- The program wrote each word into a scratch buffer, and every public read-back returns the value that was actually written. (Chapter 3.)
That's the full zkVM trick in miniature: a streaming linear trace, a positional bus to bind the program image, a memory oracle for random access. Every larger Hekate program — AES, Keccak, ML-KEM, ML-DSA, rollup state transitions — is built out of exactly these primitives. You already know all of them.
Where to go next
- Chiplets, the full library: AES, Keccak, ROM, RAM, NTT, ML-KEM, ML-DSA, and how to write your own.
- Writing AIR Constraints, the constraint DSL in depth.
- Preflight, row-level diagnostics and what the report fields mean.
- System Architecture, Sumcheck + Brakedown + LogUp end to end.