Benchmarks

This page tracks the canonical baseline numbers for OpenLithoHub's bundled ILT models, plus the differentiable forward models that drive them.

Headline baseline (synthetic-8)

Numbers are produced by:

python scripts/generate_baselines.py --synthetic --limit 8 --output baselines/

against eight hand-rolled 64×64 layouts (square, h-line, line/space, T, L, cross, contacts, dense lines). The synthetic suite is dataset-free and runs in seconds, which is why it is the published reference. Real-dataset numbers can be regenerated locally with --data-root <path>.

Model	Samples	EPE mean (nm)	EPE max (nm)	PVB mean (nm)	MRC pass
`dummy-identity`	8	0.000	0.000	2.140	0%
`rule-based-opc`	8	0.530	1.414	2.487	0%
`levelset-ilt` (Gaussian PSF)	8	0.036	0.250	2.128	0%
`neural-ilt` (untrained U-Net)	8	15.074	24.637	2.497	100%

Things worth knowing about these numbers:

dummy-identity copies the design straight through. Its EPE is zero by construction on a synthetic suite where design == target_mask. It exists as a smoke test of the metric pipeline, not as a real model.
rule-based-opc applies analytic per-edge bias OPC. It is the cheapest non-trivial baseline and a fair starting point when comparing AI methods.
levelset-ilt runs 200 iterations of gradient-descent ILT under the default Gaussian PSF forward model (sigma_px=2.0). Its 0% MRC pass rate reflects the small synthetic patterns being narrower than the default min_width_nm=40; this is expected on a 64-pixel-wide canvas and is not an indictment of the optimizer. For production-grade numbers, run with a real LithoBench layout or relaxed MRC thresholds.
neural-ilt uses a randomly initialized U-Net unless you load pretrained weights via the model hub. The EPE shown here is the honest "no-weights" baseline. It should improve substantially after training; we report it as-is rather than hide it.
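To see why narrow synthetic features trip a minimum-width rule, here is a minimal sketch of a one-dimensional width check. It is not OpenLithoHub's MRC implementation: the helper `min_run_nm`, the row-scan approach, and the 1 nm/pixel example row are illustrative assumptions. Only the `min_width_nm=40` default comes from the notes above.

```python
from itertools import groupby


def min_run_nm(row, pixel_nm=1.0):
    """Return the narrowest run of mask pixels (value 1) in a row, in nm.

    Returns None when the row contains no mask pixels.
    """
    runs = [sum(1 for _ in group) for value, group in groupby(row) if value == 1]
    return min(runs) * pixel_nm if runs else None


# Hypothetical 64-pixel row at 1 nm/pixel: a 16 nm line and a 44 nm line.
row = [0] * 2 + [1] * 16 + [0] * 2 + [1] * 44
narrowest = min_run_nm(row, pixel_nm=1.0)

# Compare against the default threshold quoted above (min_width_nm=40).
passes_min_width = narrowest is not None and narrowest >= 40.0
```

A single feature below the threshold is enough to fail the row, which is the situation the small synthetic layouts are in.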

Reproducing on real data

Once you have LithoBench cached locally:

python scripts/generate_baselines.py \
  --data-root /path/to/lithobench \
  --limit 16 \
  --pixel-nm 1.0 \
  --output baselines/lithobench/

The same results.json / results.md artifacts land under the chosen output directory. Submit them to the public leaderboard with openlithohub leaderboard submit --file <results.json>.

ORFS-routed ASAP7: RISC-V mock-alu (issue #4 Phase 3)

OpenLithoHub also supports real ASAP7-routed RTL→GDSII outputs via the OrfsArtifactDataset adapter. The first end-to-end target is mock-alu from OpenROAD-flow-scripts: the smallest RISC-V-style ALU design that exercises a complete flow (yosys → OpenROAD → routed GDS) in ~25 minutes on a Linux runner.

The adapter rasterizes one design layer of the routed block, then cuts it into fixed-size tiles. The default 2 µm × 2 µm and 5 µm × 5 µm windows match the AI-OPC inference scales used in the literature.
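The tiling step can be pictured with a short sketch. This is not the OrfsArtifactDataset code: the function `tile_windows`, the choice to drop partial tiles at the block edge, and the 10 µm example extent are all assumptions made for illustration.

```python
def tile_windows(width_nm, height_nm, tile_nm):
    """Yield (x0, y0, x1, y1) windows in nm for every full tile in the extent.

    Assumption: partial tiles at the right and top edges are dropped.
    """
    for row in range(int(height_nm // tile_nm)):
        for col in range(int(width_nm // tile_nm)):
            x0, y0 = col * tile_nm, row * tile_nm
            yield (x0, y0, x0 + tile_nm, y0 + tile_nm)


# Hypothetical 10 µm x 10 µm block (not the mock-alu extent).
windows_2um = list(tile_windows(10_000, 10_000, 2_000))
windows_5um = list(tile_windows(10_000, 10_000, 5_000))
```

In this toy extent the 2 µm window gives a 5 × 5 grid of tiles and the 5 µm window a 2 × 2 grid.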

Figure: ORFS mock-alu, 2 µm tile, showing design, rule-OPC mask, and resist contour.

To produce the GDS, trigger the build-asap7-mock-alu workflow (gh workflow run build-asap7-mock-alu.yml), download the artifact, and run:

.venv/bin/openlithohub eval run \
  --dataset orfs --node 7nm --accept-license \
  --data-root /path/to/6_final.gds \
  --tile-nm 2000 --pixel-nm 4.0 \
  --no-drc --no-mrc \
  --model dummy-identity

Window	Tiles total	PVB mean (nm)	PVB max (nm)
2 µm × 2 µm	729	15.073	29.600
5 µm × 5 µm	121	14.980	39.600

ORFS is pinned at 74b5f96; the layer is metal1 (20/0); pixel_nm=4.0, because a 1 nm grid would have 16× more pixels and the PV-band convolution scales as O(N²). Local builds are Linux-only; see scripts/build_riscv_alu.sh for the equivalent local commands.
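The 16× figure follows directly from the grid arithmetic. A quick check, using a hypothetical helper `pixels_per_tile` and the 2 µm tile and pixel sizes quoted above:

```python
def pixels_per_tile(tile_nm, pixel_nm):
    """Number of pixels in a square tile of side tile_nm sampled at pixel_nm."""
    side_px = round(tile_nm / pixel_nm)
    return side_px * side_px


coarse = pixels_per_tile(2000, 4.0)  # 500 x 500 pixels
fine = pixels_per_tile(2000, 1.0)    # 2000 x 2000 pixels
ratio = fine / coarse                # 4x finer per axis, squared
```

Shrinking the pixel by 4× per axis multiplies the pixel count by 4² = 16, before any super-linear cost in the convolution is counted.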

Hotspot detection: ICCAD 2016 Problem C

The ICCAD'16 EUV hotspot benchmark is wired in via openlithohub.data.Iccad16Dataset (klayout-based OASIS rasterizer) and the point-matching metric compute_hotspot_detection. Together they support a separate baseline track from the mask-optimization numbers above — the data has no reference mask, so EPE / PVB / MRC do not apply.

Dataset

Four published test cases (testcase{1..4}.oas + test{1..4}.csv), mirrored at https://github.com/phdyang007/ICCAD16-N7M2EUV.
Layer (1000, 0) is the design polygons; layer (10000, 0) is the hotspot-detection clip-site grid (16×16 nm windows, ~120 per case), exposed via LithoSample.metadata['clip_sites'].
Hotspot annotations live in metadata['hotspots'] as (hotspot_id, category_id, x_nm, y_nm) rows. The contest's category-id-to-defect-kind mapping (EPE / Bridging / Necking) is not published, so the loader preserves the raw integer.
LithoSample.mask is intentionally None for this dataset.

Metric

compute_hotspot_detection(predicted_points, ground_truth_points, match_radius_nm) does greedy point-matching: each predicted point counts as a TP iff an unmatched GT point lies within match_radius_nm. It returns {num_tp, num_fp, num_fn, recall, precision, f1}. In edge cases an undefined 0/0 ratio is scored as 1.0 (the behavior scikit-learn gives with zero_division=1): empty-vs-empty is a vacuous perfect score, and empty predictions against present GT give recall=0, precision=1.0.
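The matching rule can be sketched in plain Python. This is an illustration of the idea, not the source of compute_hotspot_detection: the name `greedy_point_match`, the use of plain `(x, y)` tuples, and the choice to claim the nearest unmatched GT point (rather than the first one in range) are assumptions.

```python
import math


def greedy_point_match(predicted, ground_truth, match_radius_nm):
    """Greedy one-to-one matching of predicted points to ground-truth points.

    Each prediction claims the nearest still-unmatched ground-truth point
    within match_radius_nm. Undefined 0/0 ratios are reported as 1.0.
    """
    unmatched = list(ground_truth)
    num_tp = 0
    for px, py in predicted:
        best_index, best_dist = None, match_radius_nm
        for i, (gx, gy) in enumerate(unmatched):
            dist = math.hypot(px - gx, py - gy)
            if dist <= best_dist:
                best_index, best_dist = i, dist
        if best_index is not None:
            unmatched.pop(best_index)  # a GT point can be matched only once
            num_tp += 1
    num_fp = len(predicted) - num_tp
    num_fn = len(unmatched)
    recall = num_tp / (num_tp + num_fn) if (num_tp + num_fn) else 1.0
    precision = num_tp / (num_tp + num_fp) if (num_tp + num_fp) else 1.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "num_tp": num_tp, "num_fp": num_fp, "num_fn": num_fn,
        "recall": recall, "precision": precision, "f1": f1,
    }


# Toy example: one prediction lands 10 nm from a GT point, two miss.
scores = greedy_point_match(
    predicted=[(0.0, 0.0), (50.0, 0.0), (1000.0, 1000.0)],
    ground_truth=[(10.0, 0.0), (500.0, 500.0)],
    match_radius_nm=100.0,
)
empty_vs_gt = greedy_point_match([], [(10.0, 0.0)], 100.0)
empty_vs_empty = greedy_point_match([], [], 100.0)
```

The second prediction is also within 100 nm of the first GT point, but that point is already taken, so it counts as a false positive. That is the one-to-one property the metric relies on.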

Baselines

Sanity baselines (not ML predictors) are produced by:

python scripts/run_hotspot_baseline.py \
  --data-root data/iccad16 \
  --output out/hotspot \
  --match-radius-nm 100.0

Numbers below are from testcase1 (18 GT hotspots) at match_radius_nm=100. The strict 1 nm radius, shown in the script's default output, gives zero TPs for all of these strawman predictors.

Model	GT	Predicted	TP	FP	FN	Recall	Precision	F1
`empty`	18	0	0	0	18	0.000	1.000	0.000
`grid-200nm`	18	80	2	78	16	0.111	0.025	0.041
`clip-centers`	18	120	1	119	17	0.056	0.008	0.014

Things worth knowing:

empty predicts nothing. It pins the recall floor (0.0) while scoring a vacuous precision=1.0, which makes it useful as a "metric is alive" check.
grid-200nm places predictions on a 200 nm lattice over the design bbox. It saturates the FP rate to expose the recall ceiling attainable by a brute-force "guess everywhere" predictor.
clip-centers treats the auxiliary clip-site layer as a predictor. It scores near zero, confirming our empirical finding that the clip-site grid is an inspection-window layer, not a hotspot mask. The 70+ nm separation between clip centers and CSV hotspots makes this baseline a useful regression check against anyone again mistaking layer 10000 for ground truth.

A real ML predictor (CNN, ViT, etc.) plugs into the same script by adding a function to the PREDICTORS dict that consumes a LithoSample and returns an (N, 2) tensor of nm-coordinates.
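As a rough picture of what such a predictor computes, here is a sketch of a lattice predictor in the spirit of grid-200nm. It is not the script's code: `grid_predictor` is a hypothetical name, it takes a bounding box directly instead of a LithoSample, and it returns a list of tuples rather than a tensor.

```python
def grid_predictor(bbox_nm, pitch_nm=200.0):
    """Return lattice points (x_nm, y_nm) covering an (x0, y0, x1, y1) bbox."""
    x0, y0, x1, y1 = bbox_nm
    nx = int((x1 - x0) // pitch_nm) + 1
    ny = int((y1 - y0) // pitch_nm) + 1
    return [
        (x0 + i * pitch_nm, y0 + j * pitch_nm)
        for j in range(ny)
        for i in range(nx)
    ]


# Hypothetical 1000 nm x 600 nm design bbox.
points = grid_predictor((0.0, 0.0, 1000.0, 600.0))
```

To match the interface described above, a real entry would derive the bbox from the LithoSample and wrap the result, for example with `torch.tensor(points)`, to obtain an (N, 2) tensor of nm-coordinates.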

GAN-OPC paired-mask dataset

openlithohub.data.GanOpcDataset exposes the ~4875 paired-PNG training set from Yang et al., GAN-OPC: Mask Optimization with Lithography-guided Generative Adversarial Nets (TCAD 2020). Source: https://github.com/phdyang007/GAN-OPC (multi-volume 7z archive, unpacks to ganopc-data/{artitgt,artimsk}/N.glp.png + N.glpOPC.png).

from openlithohub.data import GanOpcDataset

ds = GanOpcDataset("data/ganopc/extracted")  # parent of ganopc-data/
sample = ds[0]
sample.design  # (2048, 2048) torch.float32, {0., 1.}
sample.mask    # (2048, 2048) torch.float32, {0., 1.}

The pairs are (design_layout, OPC_mask) so this dataset is suitable for AI-OPC training and for evaluating mask-optimization models with the standard EPE / PVB / shot-count / MRC metric stack — though no canonical baseline numbers are published here yet.

Differentiable forward models

Two forward models ship in openlithohub._utils. Both are pure PyTorch and auto-differentiable, so they slot directly into ILT optimization loops, AI-OPC training, or any downstream gradient-based pipeline.

Gaussian PSF (default)

simulate_aerial_image(mask, sigma_px, dose=1.0) — a single Gaussian point spread function convolved with the mask. Fast, faithful enough for unit tests and small synthetic patterns, and used as the default in LevelSetILTModel.
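The underlying math is a Gaussian blur of the mask. The sketch below shows that operation in dependency-free Python so the arithmetic is easy to follow. It is not the library's implementation and it is not differentiable: the function names, the 3-sigma kernel truncation, the zero padding at the borders, and the treatment of `dose` as a plain multiplier are all assumptions.

```python
import math


def gaussian_kernel_1d(sigma_px, radius=None):
    """Normalized 1D Gaussian kernel; radius defaults to 3 sigma (assumption)."""
    if radius is None:
        radius = max(1, math.ceil(3 * sigma_px))
    weights = [math.exp(-0.5 * (k / sigma_px) ** 2) for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def convolve_rows(image, kernel):
    """Convolve each row with the kernel, treating out-of-range pixels as zero."""
    radius = len(kernel) // 2
    width = len(image[0])
    out = []
    for row in image:
        new_row = []
        for x in range(width):
            acc = 0.0
            for k, w in enumerate(kernel):
                src = x + k - radius
                if 0 <= src < width:
                    acc += w * row[src]
            new_row.append(acc)
        out.append(new_row)
    return out


def transpose(image):
    return [list(col) for col in zip(*image)]


def aerial_image_sketch(mask, sigma_px, dose=1.0):
    """Separable Gaussian blur of a 2D mask (list of lists), scaled by dose."""
    kernel = gaussian_kernel_1d(sigma_px)
    blurred = convolve_rows(mask, kernel)                           # blur along x
    blurred = transpose(convolve_rows(transpose(blurred), kernel))  # blur along y
    return [[dose * v for v in row] for row in blurred]


# A single bright pixel in the middle of a 21 x 21 mask.
size = 21
mask = [[0.0] * size for _ in range(size)]
mask[10][10] = 1.0
image = aerial_image_sketch(mask, sigma_px=2.0)
```

Blurring a single pixel returns the point spread function itself: the peak stays at the centre, the response is symmetric, and the total intensity is preserved because the kernel is normalized.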

Hopkins partial-coherent imaging (SOCS)

simulate_aerial_image_hopkins(mask, params) implements the Sum-of-Coherent-Systems decomposition of the Hopkins transmission cross coefficient. It captures partial coherence, off-axis illumination, and defocus, which the Gaussian model cannot.

Configurable via HopkinsParams:

Field	Default	Meaning
`wavelength_nm`	193.0	Exposure wavelength (193 nm = ArF, 13.5 nm = EUV)
`na`	1.35	Numerical aperture (image-side)
`sigma`	0.7	Partial-coherence factor; outer sigma for annular/dipole/quasar
`sigma_inner`	0.0	Inner sigma for off-axis illumination
`pixel_size_nm`	1.0	Physical mask pixel size
`num_kernels`	24	SOCS truncation order
`illumination`	`"circular"`	One of `circular`, `annular`, `dipole`, `quasar`
`dipole_angle_deg`	0.0	Pole-pair orientation for dipole/quasar
`defocus_nm`	0.0	Defocus offset (parabolic phase)
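For orientation, the table can be mirrored as a small frozen dataclass. This is a hypothetical stand-in, not the real HopkinsParams: the class name `HopkinsParamsSketch`, the validation rules, and the `min_pitch_nm` helper are assumptions. The helper uses the textbook cutoff pitch for conventional partially coherent illumination, λ / (NA · (1 + σ)), and the EUV example uses NA = 0.33 purely as an illustrative value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HopkinsParamsSketch:
    """Hypothetical stand-in mirroring the fields and defaults in the table."""

    wavelength_nm: float = 193.0
    na: float = 1.35
    sigma: float = 0.7
    sigma_inner: float = 0.0
    pixel_size_nm: float = 1.0
    num_kernels: int = 24
    illumination: str = "circular"
    dipole_angle_deg: float = 0.0
    defocus_nm: float = 0.0

    def __post_init__(self):
        # Validation rules are assumptions, not taken from the library.
        if self.illumination not in ("circular", "annular", "dipole", "quasar"):
            raise ValueError(f"unknown illumination: {self.illumination!r}")
        if not 0.0 <= self.sigma_inner < self.sigma:
            raise ValueError("sigma_inner must satisfy 0 <= sigma_inner < sigma")

    @property
    def min_pitch_nm(self):
        """Textbook cutoff pitch: wavelength / (NA * (1 + sigma))."""
        return self.wavelength_nm / (self.na * (1.0 + self.sigma))


arf = HopkinsParamsSketch()
euv = HopkinsParamsSketch(wavelength_nm=13.5, na=0.33)
```

With the ArF defaults the cutoff pitch comes out near 84 nm, which gives a feel for the feature scale the default parameters can resolve.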

Switch LevelSetILTModel to Hopkins:

from openlithohub._utils import HopkinsParams
from openlithohub.models.levelset_ilt import LevelSetILTModel

model = LevelSetILTModel(
    iterations=200,
    forward_model="hopkins",
    hopkins_params=HopkinsParams(
        wavelength_nm=193.0,
        na=1.35,
        sigma=0.7,
        num_kernels=24,
        pixel_size_nm=2.0,
    ),
)
result = model.predict(design)  # standard PredictionResult

The kernels are computed once and cached per (params, grid_size, device), so iterative ILT loops pay the SVD cost a single time.