MISSION

Chip design is bottlenecked by simulation. We are rebuilding it on the GPU.

Verifying a modern chip takes longer than designing it. Simulators, sold by a two-company duopoly, set the ceiling on how fast any chip team can iterate. We are rebuilding the simulator from the ground up for parallel hardware — and using the throughput it unlocks to train AI to design chips.

01

The bottleneck

Designing a chip starts in code. Engineers describe the logic in Verilog or SystemVerilog. Before anyone fabricates anything, the design has to be verified by running it in software, across millions of test inputs, looking for bugs that would otherwise become silicon defects costing tens of millions of dollars and six months of turnaround.

Verification dominates the timeline.

60–70%
of chip design effort is verification
75%
of ASIC projects miss tape-out
14%
achieve first-silicon success

Two companies own the simulator market: Synopsys (VCS) and Cadence (Xcelium). Both ship architectures introduced in the early 1990s — single-threaded, event-driven CPU kernels — at 50,000 to 150,000 US dollars per seat per year. Their combined FY26 guidance is roughly 15.5 billion US dollars: 90% recurring revenue, 45% operating margins. The seat model creates a structural lock: every minute of faster simulation cannibalises a paid seat. They will not rebuild themselves.

Verification engineers wait overnight, sometimes days, for regression runs to finish. Chip teams staff up specifically to buy back the throughput they cannot get from the simulator. This is the binding constraint on the industry.

02

Our bet

Circuits are inherently parallel. Existing simulators treat them as sequential CPU code.

We collapse a chip’s logic into one uniform primitive — an and-inverter graph, where every gate is either AND or NOT, trivially parallel. Then we run it on a GPU: thirty-two gates per CUDA instruction, thousands of stimulus streams per H100 in lockstep.
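The bit-parallel trick can be sketched in a few lines. This is an illustrative model, not arc's actual internals: it packs one bit per stimulus stream into a machine word, so a single bitwise AND evaluates a gate across 32 streams at once, and inversions become XORs with an all-ones mask. (Packing gates rather than streams into the lanes is the dual layout.)

```python
# Bit-parallel AIG evaluation: a minimal sketch, not arc's actual internals.
# Each net holds a 32-bit word; bit i is that net's value under stimulus
# stream i, so one bitwise AND evaluates a gate for 32 streams at once.

MASK = 0xFFFFFFFF  # 32 lanes per word

def simulate_aig(nodes, inputs):
    """Evaluate an and-inverter graph in topological order.

    nodes:  list of (a, inv_a, b, inv_b); a and b index earlier nets
            (primary inputs first, then previously computed nodes).
    inputs: one 32-bit word per primary input.
    Returns the full net table; the last entries are node outputs.
    """
    nets = list(inputs)
    for a, inv_a, b, inv_b in nodes:
        va = nets[a] ^ (MASK if inv_a else 0)  # optional inversion on fanin a
        vb = nets[b] ^ (MASK if inv_b else 0)  # optional inversion on fanin b
        nets.append(va & vb)                   # the AND node itself
    return nets

# Two inputs across four example streams (low four bits shown):
x, y = 0b1100, 0b1010
nets = simulate_aig([(0, False, 1, True)], [x, y])  # computes x AND NOT y
assert nets[-1] == 0b0100
```

On a GPU the same loop runs per thread with no data-dependent branching between gates, so thousands of stimulus words advance in lockstep.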

Verilator (1-thread)
290 KHz
single-thread baseline
Verilator (multi-threaded)
1.27 MHz
multi-threaded, same machine
arc (GPU, N=1024)
3.85 MHz
13× over 1-thread · 3× over multi-threaded

The result is arc, our simulator. On a Modal H100, a 1,024-stimulus regression against the full VexRiscv RISC-V CPU aggregates to 3.85 MHz — thirteen times faster than single-thread Verilator and three times faster than multi-threaded Verilator on the same workload.

See the demo for how the pipeline works end-to-end.

03

Why now

Four things have changed.

01

GPU economics crossed parity

H100s rent on spot at two US dollars an hour. The same compute would have cost half a million dollars in 2018. The economic window for GPU-substrate EDA just opened.

02

AI-assisted code production

A simulator rebuild that took Verilator’s author twenty years is now possible for a small team. The parts that were prohibitive solo — parsers, compilers, kernels — have become tractable in the last few months.

03

AI chip design as a real workload

LLMs already generate basic RTL. RL on silicon is moving from research into production at Google and across a long tail of academic labs. The training loop needs millions of cheap, fast, deterministic simulations per day — an environment that does not yet exist.

04

Incumbents structurally cannot respond

Synopsys spent 35 billion on Ansys; Cadence is wrapping AI agents around the old toolchain. Neither is rebuilding the simulator: every minute of faster simulation cannibalises a paid seat — the classic innovator’s dilemma. There is a two-to-four-year window before consolidation closes it.

The window is open now. It will not stay open forever.

04

The wedge

The first paying customers are AI-chip startups — MatX, Tenstorrent, Etched, Fractile, and the next fifty after them. Small teams, aggressive tape-out timelines, zero incumbent inertia. They cannot buy enough VCS licences at any price to support the parallel test scenarios their architectures demand.

INCUMBENT
50–150K
USD per seat, per year
Annual commit. License servers. No transparency.
STANDARD MACHINES
3
USD per simulation-hour
Metered by the second. No seats. No annual commit.

A regression that takes hours on the commercial simulators finishes in minutes on an H100 — at the same compute price, billed by the second. The information asymmetry that the seat model depends on collapses the moment one buyer compares the receipts.
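A back-of-envelope version of that comparison, assuming the figures quoted above (every number here is an assumption for illustration, not a measurement):

```python
# Seat pricing vs metered GPU pricing, using the figures quoted above.
# All inputs are illustrative assumptions; real regressions and seat
# utilisation vary widely.

seat_cost_per_year = 100_000     # midpoint of the 50–150K USD seat range
engineer_hours_per_year = 2_000  # rough full-time utilisation
seat_cost_per_hour = seat_cost_per_year / engineer_hours_per_year  # 50 USD/h, idle or not

gpu_price_per_hour = 3           # metered simulation-hour, from the table above
speedup = 13                     # arc vs single-thread baseline quoted earlier

cpu_regression_hours = 8         # an assumed overnight regression
gpu_regression_hours = cpu_regression_hours / speedup  # about 37 minutes

cpu_regression_cost = cpu_regression_hours * seat_cost_per_hour  # 400 USD of seat time
gpu_regression_cost = gpu_regression_hours * gpu_price_per_hour  # under 2 USD, metered
```

Under these assumptions the metered run is both faster and two orders of magnitude cheaper per regression — which is the receipt comparison the seat model cannot survive.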

From the simulator we expand outward: the testbench runtime, the waveform viewer, the synthesis flow — the rest of the design stack, on the same GPU substrate, on the same pricing model.

05

Where this goes

Chip design will be done by AI. The bottleneck is not generation — LLMs can already produce RTL. The bottleneck is the feedback loop: an AI designer needs a faithful, fast, deterministic environment to iterate against, with millions of designs evaluated per day and every reward signal grounded in actual silicon behaviour rather than approximation.

We are building that environment. The simulator is the first piece — the wedge that funds the rest, the substrate that everything else compiles to, and the missing component for any serious attempt at closed-loop AI chip design.