Article 07

We routed HC-1 in under 10 seconds with a GPU sidecar.

May 16, 2026 Physical design

Routing speed changes how you think. When placement, global routing, and detailed routing take hours, every experiment becomes expensive. You wait on the machine, lose the shape of the problem, and start making fewer design moves.

For HC-1, that delay was getting in the way. We needed routing close enough to the design loop that a change could be tested while the layout problem was still fresh. So we built a CUDA sidecar for OpenROAD-style placement, global routing, and detailed routing.

On the HC-1 proxy design, the GPU sidecar completed placement, global routing, and detailed routing in under 10 seconds.

What the sidecar does

The sidecar sits beside an OpenROAD-compatible flow. It exports DEF/LEF data into a compact ORAX interchange format, runs GPU kernels for placement and routing work, and writes artifacts that can be imported back into the flow.

The current package includes CUDA kernels for net-force and density-force placement updates, batched L-shape global-route scoring on a congestion grid, and guide-constrained detailed routing track scoring. It also includes Python adapters for DEF/LEF export, placement import, and route-segment import.
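To make the global-routing step concrete, here is a minimal pure-Python sketch of L-shape scoring on a congestion grid. It is not the sidecar's CUDA kernel: the function names, the grid layout (`grid[y][x]`), and the cost model (sum of congestion values along the path) are all assumptions chosen for illustration. Each two-pin net has at most two L-shaped candidates, one bending after the horizontal leg and one after the vertical leg, and the cheaper one wins. Because every net is scored independently, the same loop maps naturally onto one GPU thread per net.

```python
from typing import List, Tuple

Grid = List[List[float]]   # assumed layout: grid[y][x] holds a congestion value
Point = Tuple[int, int]    # (x, y) grid cell


def _segment_cells(a: Point, b: Point) -> List[Point]:
    """Cells on an axis-aligned segment from a to b, both ends included."""
    (ax, ay), (bx, by) = a, b
    if ay == by:
        step = 1 if bx >= ax else -1
        return [(x, ay) for x in range(ax, bx + step, step)]
    if ax == bx:
        step = 1 if by >= ay else -1
        return [(ax, y) for y in range(ay, by + step, step)]
    raise ValueError("segment must be axis-aligned")


def l_shape_cost(grid: Grid, src: Point, dst: Point, bend: Point) -> float:
    """Sum congestion along src -> bend -> dst, counting the bend cell once."""
    cells = _segment_cells(src, bend) + _segment_cells(bend, dst)[1:]
    return sum(grid[y][x] for x, y in cells)


def score_l_shapes(grid: Grid, nets: List[Tuple[Point, Point]]):
    """For each two-pin net, return (best_bend, best_cost)."""
    results = []
    for src, dst in nets:
        # Candidate 0 routes horizontally first, candidate 1 vertically first.
        candidates = [(dst[0], src[1]), (src[0], dst[1])]
        costs = [l_shape_cost(grid, src, dst, bend) for bend in candidates]
        best = 0 if costs[0] <= costs[1] else 1
        results.append((candidates[best], costs[best]))
    return results


# Hypothetical 3x3 grid with a congested cell at x=2, y=0.
demo_grid = [
    [1.0, 1.0, 4.0],
    [1.0, 9.0, 1.0],
    [1.0, 1.0, 1.0],
]
print(score_l_shapes(demo_grid, [((0, 0), (2, 2))]))  # [((0, 2), 5.0)]
```

In the demo, the horizontal-first route passes through the congested corner and costs 8.0, so the vertical-first bend at (0, 2) is selected at cost 5.0.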

The captured HC-1 run

On a RunPod-hosted NVIDIA GeForce RTX 4070 Ti, the captured HC-1 proxy replay produced these stage timings:

Placement: 1.00s.
Global routing: 0.93s.
Detailed routing: 5.10s.
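Summing the stage timings above gives the end-to-end figure behind the headline. The short Python snippet below does only that arithmetic; the dictionary keys are labels chosen for illustration, and the numbers are the three timings listed above.

```python
# Stage timings from the captured HC-1 proxy replay, in seconds.
stage_seconds = {
    "placement": 1.00,
    "global_routing": 0.93,
    "detailed_routing": 5.10,
}

total = sum(stage_seconds.values())
shares = {name: secs / total for name, secs in stage_seconds.items()}

for name, secs in stage_seconds.items():
    print(f"{name:>17}: {secs:.2f}s ({shares[name]:.0%})")
print(f"{'total':>17}: {total:.2f}s")
```

The three stages add up to 7.03s, and detailed routing accounts for roughly 73% of that total.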

In a profiled detailed-route run, wall time was 3.80s, while the CUDA-timed GPU work was measured in milliseconds. Most of that wall time was host-side I/O and assembly overhead; the routing kernels accounted for only a small fraction of the run.
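One way to see a split like this is to time each host-side phase separately and compare the sum against total wall time. The sketch below is an assumption about how such a measurement could be structured, not the sidecar's profiler: `phase`, `profile_run`, and the three callables are hypothetical stand-ins. Note that CUDA kernel launches are asynchronous, so a host timer around the launch phase is only meaningful if that phase also waits for the device to finish; device-side durations are usually taken with CUDA events instead.

```python
import time
from contextlib import contextmanager
from typing import Callable, Dict, Iterator


@contextmanager
def phase(timings: Dict[str, float], name: str) -> Iterator[None]:
    """Accumulate the wall time spent inside the with-block under `name`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[name] = timings.get(name, 0.0) + (time.perf_counter() - start)


def profile_run(load: Callable, launch: Callable, write: Callable) -> Dict[str, float]:
    """Time a load -> launch -> write pipeline and report per-phase wall time.

    `launch` is assumed to block until the device work has completed.
    """
    timings: Dict[str, float] = {}
    with phase(timings, "wall"):
        with phase(timings, "host_load"):
            data = load()
        with phase(timings, "gpu_launch_and_sync"):
            result = launch(data)
        with phase(timings, "host_write"):
            write(result)
    accounted = sum(v for k, v in timings.items() if k != "wall")
    timings["unaccounted"] = timings["wall"] - accounted
    return timings


# Stand-in callables; a real run would parse inputs, launch kernels, write segments.
report = profile_run(lambda: list(range(1000)), lambda d: sum(d), lambda r: None)
for name, secs in report.items():
    print(f"{name:>20}: {secs * 1e3:.3f} ms")
```

If the launch phase is small next to the load and write phases, the next optimization target is the host path rather than the kernels.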

Why this matters

The headline number matters because it changes the working rhythm. Routing that previously belonged to a multi-hour CPU loop can move into a seconds-scale GPU experiment loop. Going from hours to seconds is an orders-of-magnitude change in iteration speed for this stage of the HC-1 flow.

Faster physical-design loops matter because HC-1 is a system problem. The ASIC proxy, photonic layout, interface maps, readout assumptions, recovery path, and DRC checks all feed each other. Waiting hours at the routing stage slows down work on the whole architecture.

What is public now

We released the Acculux OpenROAD GPU Sidecar under the Apache 2.0 license.

This is experimental tooling from Acculux Systems. It has not been merged into OpenROAD and carries no OpenROAD endorsement. The numbers come from a captured HC-1 proxy replay, so results on other designs will vary with design size, GPU, export/import overhead, and quality-of-result requirements.

The direction is clear. When verification and routing loops move from hours to seconds, the hardware development process gets sharper. You try more, inspect more, and keep the system in your head while the machine catches up.