OpenLithoHub v0.1 — Open computational lithography for the AI era¶

Published 2026-05-19 · Tags: lithography, EUV, OPC, ILT, ML-for-EDA, open-source

TL;DR — OpenLithoHub is an open, Apache-2.0-licensed benchmarking and workflow framework for computational lithography. It unifies the three things that have kept ML-for-OPC research irreproducible: the datasets (LithoBench, LithoSim, GAN-OPC, ICCAD'16), the metrics (EPE, PV-Band, shot count, EUV stochastic, MRC), and the forward physics (a differentiable Hopkins/SOCS model). Bring your own model, plug it into one interface, and get the same numbers everyone else gets.

Repo: https://github.com/OpenLithoHub/OpenLithoHub · Playground: https://huggingface.co/spaces/OpenLithoHub/playground · Docs: https://docs.openlithohub.com · Colab BYOM: https://colab.research.google.com/github/OpenLithoHub/OpenLithoHub/blob/main/notebooks/colab_byom.ipynb

Why this exists¶

ML-for-OPC and ML-for-ILT have been hot for five years, but the field has a reproducibility problem that nobody quite owns:

Datasets are scattered. LithoBench is on one GitHub repo, GAN-OPC on another, ICCAD'16 hotspot is on a benchmark page no one updates, LithoSim is a separate NeurIPS dataset. Every paper rolls its own loader, and a third of them have silent off-by-one bugs in coordinate conventions.
Metrics drift. "EPE" in one paper means edge-placement error measured at fragment endpoints; in the next it's measured along arc length; in the third it's RMS over a contour band. PV-Band is even worse.
There's no shared forward model. Half the papers compare against a Gaussian-PSF "imaging" stand-in that is not a lithographic model. The other half black-box a commercial simulator and cannot release the reference numbers.
MRC is treated as cosmetic. Plenty of beautiful AI-OPC results produce masks that no foundry would actually let through DRC. Without a hard MRC gate, the benchmark is fiction.
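To make the metric drift concrete, here is a minimal sketch of how the same printed edge can yield three different "EPE" numbers. The displacement samples and the three helper functions are hypothetical simplifications of the conventions described above; none of them is the framework's compute_epe or any specific paper's definition.

```python
import numpy as np

# Hypothetical signed displacement (nm) between printed and target edge,
# sampled uniformly along a single fragment.
displacement_nm = np.array([3.0, 1.0, 0.0, -1.0, -3.0])


def epe_endpoints(d: np.ndarray) -> float:
    """Mean absolute error at the two fragment endpoints only."""
    return float(np.mean(np.abs(d[[0, -1]])))


def epe_arc_length(d: np.ndarray) -> float:
    """Mean absolute error over all uniformly spaced samples along the edge."""
    return float(np.mean(np.abs(d)))


def epe_rms(d: np.ndarray) -> float:
    """Root-mean-square error over all samples."""
    return float(np.sqrt(np.mean(d ** 2)))


print(epe_endpoints(displacement_nm))   # 3.0
print(epe_arc_length(displacement_nm))  # 1.6
print(epe_rms(displacement_nm))         # 2.0
```

Same edge, three numbers between 1.6 nm and 3.0 nm, which is why a single canonical implementation per metric name matters.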

OpenLithoHub takes a strong position on each of these:

One adapter interface, every dataset behind it. Today: LithoBench, LithoSim, GAN-OPC, ICCAD'16. Tomorrow: synthetic generators against FreePDK45/ASAP7.
Metrics with one canonical implementation per name. If you call it EPE, it is computed by compute_epe in openlithohub.benchmark. Report another version, fine — but don't call it EPE.
A real, differentiable Hopkins/SOCS forward model. SVD-truncated, per-(params, grid) cache, supports circular / annular / dipole / quasar illumination plus defocus. Auto-differentiable end-to-end — drop it into your AI-OPC training loop. (See openlithohub._utils.hopkins.)
MRC is a hard gate. A mask that violates MRC fails the benchmark, no matter how good its EPE looks. We added curvilinear MRC checks (curvature radius + min area) for ILT outputs that aren't Manhattan.
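For readers new to Hopkins/SOCS, the following is a minimal 1D toy sketch of the idea, not the code in openlithohub._utils.hopkins. It builds a transmission cross-coefficient (TCC) matrix from an assumed three-point source and an ideal low-pass pupil, truncates its eigendecomposition (equivalent to an SVD for a Hermitian positive semidefinite TCC), and sums the coherent images. All function names and parameters are illustrative, and plain NumPy is used here, so this version is not differentiable.

```python
import numpy as np


def pupil(freqs: np.ndarray, cutoff: float) -> np.ndarray:
    """Ideal low-pass pupil: 1 inside the cutoff frequency, 0 outside."""
    return (np.abs(freqs) <= cutoff).astype(float)


def build_tcc(freqs, shifts, weights, cutoff):
    """TCC(f1, f2) = sum_s J(s) P(f1 + s) P(f2 + s) for a real pupil."""
    tcc = np.zeros((freqs.size, freqs.size))
    for s, w in zip(shifts, weights):
        p = pupil(freqs + s, cutoff)
        tcc += w * np.outer(p, p)
    return tcc


def socs_kernels(tcc, rank):
    """Keep the `rank` largest eigenpairs of the symmetric TCC."""
    eigvals, eigvecs = np.linalg.eigh(tcc)
    order = np.argsort(eigvals)[::-1][:rank]
    return eigvals[order], eigvecs[:, order]


def aerial_image_socs(mask, eigvals, eigvecs):
    """I(x) = sum_k lambda_k |IFFT(phi_k * FFT(mask))|^2."""
    spectrum = np.fft.fft(mask)
    image = np.zeros(mask.size)
    for lam, phi in zip(eigvals, eigvecs.T):
        image += lam * np.abs(np.fft.ifft(phi * spectrum)) ** 2
    return image


def aerial_image_abbe(mask, freqs, shifts, weights, cutoff):
    """Reference: incoherent sum of one coherent image per source point."""
    spectrum = np.fft.fft(mask)
    image = np.zeros(mask.size)
    for s, w in zip(shifts, weights):
        field = np.fft.ifft(pupil(freqs + s, cutoff) * spectrum)
        image += w * np.abs(field) ** 2
    return image


n = 64
freqs = np.fft.fftfreq(n) * n                      # signed frequencies, FFT order
shifts, weights = [-4, 0, 4], [0.25, 0.5, 0.25]    # toy 3-point source
mask = np.zeros(n)
mask[24:40] = 1.0                                  # one 16-sample line

tcc = build_tcc(freqs, shifts, weights, cutoff=8)
eigvals, eigvecs = socs_kernels(tcc, rank=3)
image_socs = aerial_image_socs(mask, eigvals, eigvecs)
image_abbe = aerial_image_abbe(mask, freqs, shifts, weights, cutoff=8)
print(np.allclose(image_socs, image_abbe))
```

With three source points the toy TCC has rank at most three, so three kernels reproduce the source-point sum; a realistic source needs many more points, which is where truncation saves work. Writing the same arithmetic in an autodiff framework would make it differentiable with respect to the mask.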

What's in v0.1¶

Datasets — LithoBench, LithoSim, GAN-OPC paired masks (~4875), ICCAD'16 Problem C hotspot, plus a hermetic dummy-layout generator that runs in CI without the workflow extras.
Models — dummy-identity, rule-based-opc (directional hammerheads, inner-corner serifs, iso/dense bias, MRC self-check), levelset-ilt, openilt (SimpleILT L2 + PVBand), neural-ilt. All conform to the same LithographyModel interface; bring-your-own in <50 lines.
Forward models — Hopkins/SOCS (default) and Gaussian PSF (legacy).
Metrics — EPE, PV-Band, shot count, EUV stochastic robustness, hotspot detection (recall/precision/F1).
MRC/DRC — Manhattan and curvilinear; hard-fails the run.
Workflow — tiling/stitching with deduplication, contour extraction, B-spline fitting, OASIS round-trip, EDA bridge templates for Calibre nmDRC and Synopsys IC Validator.
Visualization — paper_style context manager with IEEE_STYLE and SPIE_STYLE presets, vector PDF, Type-42 fonts, colorblind-safe.
Playground — HuggingFace Space with 3 preset designs, BYOM upload, and an EPE-error heatmap with red MRC violation overlay.
Colab BYOM notebook — install → register your model → eval against LithoBench → submit to leaderboard, end-to-end.
Auto-Leaderboard CI — open a PR with your numbers; the workflow re-runs your model on the standard test set and merges the PR if the numbers verify.
Docs — Lithography for AI Engineers translates the field's vocabulary (Mask → Input Image, OPC → Image-to-Image inverse problem, PV-Band → Robustness Margin, MRC → Output Constraint Satisfaction).
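As a rough illustration of how a bring-your-own model and a hard MRC gate fit together, here is a minimal sketch. Everything in it is a hypothetical stand-in: the Protocol is not the real LithographyModel interface, the min-width check is a crude Manhattan run-length test rather than the framework's MRC, and the score is a pixel mismatch rate, not EPE.

```python
from typing import Protocol

import numpy as np


class LithographyModelLike(Protocol):
    """Hypothetical stand-in for the framework's model interface."""

    def predict(self, target: np.ndarray) -> np.ndarray: ...


class IdentityModel:
    """Returns the target unchanged as the mask (no correction)."""

    def predict(self, target: np.ndarray) -> np.ndarray:
        return target.copy()


def min_run_length(binary: np.ndarray) -> int:
    """Shortest run of 1s along any row or column of a binary raster."""
    shortest = binary.size + 1  # sentinel: an empty mask has no runs
    for grid in (binary, binary.T):
        for row in grid:
            padded = np.concatenate(([0], row.astype(np.int8), [0]))
            edges = np.flatnonzero(np.diff(padded))  # rising, falling, ...
            runs = edges[1::2] - edges[0::2]
            if runs.size:
                shortest = min(shortest, int(runs.min()))
    return shortest


def evaluate(model: LithographyModelLike, target: np.ndarray, min_width_px: int) -> dict:
    """Score a model, but fail outright if the mask violates min width."""
    mask = (model.predict(target) > 0.5).astype(np.uint8)
    if min_run_length(mask) < min_width_px:
        return {"status": "fail", "reason": "MRC min width", "score": None}
    mismatch = float(np.mean(mask != target))  # placeholder score, not EPE
    return {"status": "pass", "reason": "", "score": mismatch}


target = np.zeros((8, 8), dtype=np.uint8)
target[2:6, 2:6] = 1  # a 4x4 square
print(evaluate(IdentityModel(), target, min_width_px=3))
```

The point of the design is ordering: the constraint check runs before any quality metric, so a violating mask never receives a score at all.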

Who this is for¶

ML researchers who want to publish ML-for-OPC results that other groups can actually reproduce.
Lithography engineers who want to evaluate ML approaches against the same metrics they care about in production (MRC compliance, EUV stochastic margin).
Foundry / IDM teams who want a vendor-neutral common ground when comparing internal tools.
Students entering the field who don't want to spend their first six months wiring up data loaders.

What's next¶

Synthetic layout generator — diffusion + rule-based against FreePDK45/ASAP7 to break the dataset-size ceiling.
Simulator hooks — pluggable adapters for Calibre nmOPC and Tachyon, so labs with commercial-tool access can use them as ground-truth oracles without forking the framework.
EUV 3D-mask + Monte Carlo stochastic eval — moving past thin-mask Hopkins toward what really matters at 3 nm and below.
Pre-trained base models — MAE-style self-supervised pretraining on polygon rasters, so users can fine-tune with a few thousand samples.
Layout tokenization — making polygons first-class for transformer research.

The full roadmap lives at https://github.com/OpenLithoHub/OpenLithoHub/blob/main/CHANGELOG.md.

How to get involved¶

Try it: pip install openlithohub or open the Colab notebook.
Submit a model to the leaderboard — see docs/leaderboard-submission.md.
Discord is launching 2026-Q3. Until then, GitHub Issues and Discussions are the main forum. Watch the repo or open an issue with the community label to be notified.
Cite via CITATION.cff if you use it in a paper.

Paste-ready posts¶

The sections below are short adaptations of the announcement above for specific platforms. Copy the relevant block, fill in your own handle / links / images, and post.

X / Twitter (thread, 6 posts)¶

1/ Today we're releasing OpenLithoHub — an open, Apache-2.0 benchmarking
& workflow framework for computational lithography (OPC, ILT, EUV).

The pitch: bring your own ML model, plug it into one interface, and get
the same numbers everyone else gets.

🔗 github.com/OpenLithoHub/OpenLithoHub

2/ Why? ML-for-OPC has a reproducibility problem that nobody owns.

Datasets are scattered. "EPE" means three different things in three
different papers. Half the field compares against a Gaussian PSF that
isn't a real lithographic model. MRC is treated as cosmetic.

3/ What we ship in v0.1:

- Unified loaders: LithoBench, LithoSim, GAN-OPC (~4875 paired masks),
  ICCAD'16 hotspot
- A real differentiable Hopkins/SOCS forward model — dipole, quasar,
  defocus, all auto-grad
- MRC as a HARD gate. Bad mask → benchmark fail.

4/ Plus:
- Curvilinear MRC checks (curvature radius + min area)
- OASIS round-trip + Calibre/IC Validator runset templates
- Paper-ready vis (IEEE / SPIE column-width, Type-42 PDFs)
- HF Space playground with EPE error heatmaps

5/ For ML researchers: bring your own model in <50 lines. Colab BYOM
notebook walks you from install → eval → leaderboard submit:
colab.research.google.com/github/OpenLithoHub/OpenLithoHub/blob/main/notebooks/colab_byom.ipynb

6/ Roadmap: synthetic layout gen, Calibre/Tachyon simulator hooks, EUV
3D-mask + Monte Carlo stochastic eval, pretrained base models.

Star the repo, try the playground, open an issue with feedback.

huggingface.co/spaces/OpenLithoHub/playground

LinkedIn (one long post)¶

After ~6 months of work, we're open-sourcing OpenLithoHub today — a
benchmarking and workflow framework for computational lithography that
takes a strong stance on the reproducibility crisis in ML-for-OPC.

The state of the field: every paper rolls its own dataset loaders, "EPE"
means different things in different publications, half the work compares
against a Gaussian-PSF stand-in that no lithographer would call a
forward model, and MRC compliance is too often a footnote rather than a
hard gate.

OpenLithoHub fixes the foundation:

✅ Unified adapters for LithoBench, LithoSim, GAN-OPC, ICCAD'16 hotspot
✅ One canonical implementation per metric (EPE, PV-Band, shot count,
   EUV stochastic, hotspot detection)
✅ A real differentiable Hopkins/SOCS forward model — dipole, quasar,
   defocus — auto-grad end-to-end so it drops into AI-OPC training
✅ MRC/DRC as a hard gate, including curvilinear checks
✅ OASIS round-trip + Calibre/IC Validator runset templates
✅ Paper-ready visualization (IEEE / SPIE / vector PDF)
✅ HuggingFace Space playground with EPE error heatmaps
✅ Colab BYOM notebook: install → register model → eval → submit, in
   one runtime.

It's Apache-2.0. Bring your own model, plug it into one interface, and
publish numbers other groups can reproduce.

If you work on OPC, ILT, AI-EDA, or computational lithography — please
try it and tell us where it falls short.

GitHub: https://github.com/OpenLithoHub/OpenLithoHub
Playground: https://huggingface.co/spaces/OpenLithoHub/playground
Docs: https://docs.openlithohub.com

#computationallithography #EUV #OPC #ILT #MachineLearning #EDA
#OpenSource #SemiconductorManufacturing

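Zhihu (知乎, long-form post)¶

Title: OpenLithoHub v0.1 released: a reproducible open-source foundation
for computational lithography

Over the past five years, ML-for-OPC / ML-for-ILT papers have multiplied,
but the field has a reproducibility problem that nobody wants to
acknowledge head-on:

- Datasets are scattered across different GitHub repos, every paper
  writes its own loader, and half of them have silent
  coordinate-convention bugs;
- "EPE" is measured at fragment endpoints in one paper, along arc length
  in another, and as RMS over a contour band in a third;
- In half of the work, the "lithography simulation" is a Gaussian PSF
  convolution, which is not a partially coherent imaging model at all;
- MRC is often treated as a cosmetic touch-up rather than the hard
  constraint that actually decides whether a mask can go into the fab.

We are open-sourcing OpenLithoHub today, and it takes a clear position on
each of these points:

1. **One unified DatasetAdapter interface** covering LithoBench /
   LithoSim / GAN-OPC (~4875 mask pairs) / ICCAD'16 Problem C hotspot
   detection.
2. **One canonical implementation per metric.** If it is called EPE, it
   can only be `compute_epe`; other names are fine, but stop mixing them.
3. **Differentiable Hopkins/SOCS forward model.** SVD truncation,
   per-(params, grid) cache, support for circular / annular / dipole /
   quasar sources plus defocus, and end-to-end auto-grad, so it drops
   straight into an AI-OPC training loop.
4. **MRC is a hard gate.** A mask that violates MRC fails the benchmark
   outright, however good its EPE looks. MRC for curvilinear masks
   (curvature radius + minimum area) is implemented too.

v0.1 also includes:
- 5 baseline models (dummy-identity / rule-based-opc / levelset-ilt /
  openilt / neural-ilt), all following the same LithographyModel
  interface; BYOM in under 50 lines.
- OASIS round-trip + Calibre nmDRC / Synopsys IC Validator runset
  templates.
- Paper-grade visualization (IEEE / SPIE column widths, vector PDF,
  Type-42 fonts, colorblind-safe palette).
- HuggingFace Space playground with an EPE error heatmap + red
  MRC-violation highlighting.
- A one-click Colab BYOM notebook.
- Auto-Leaderboard CI: after a model is submitted via PR, CI
  automatically reproduces the results on the standard test set.

Roadmap for the next steps:
- Synthetic layout generator (rule-based + diffusion model, targeting
  FreePDK45 / ASAP7);
- Calibre nmOPC / Tachyon simulator interfaces;
- EUV 3D mask + Monte Carlo stochastic-failure evaluation;
- Self-supervised pretrained base model (MAE-style);
- Layout tokenization for transformer research.

License: Apache-2.0. PRs, issues, and model submissions are welcome.
Project: https://github.com/OpenLithoHub/OpenLithoHub
Playground: https://huggingface.co/spaces/OpenLithoHub/playground
Docs: https://docs.openlithohub.com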

Hugging Face Forum (Show and Tell)¶

Title: OpenLithoHub — open benchmarking + Hopkins/SOCS forward model for
computational lithography (OPC, ILT, EUV)

Hey HF community,

We're shipping OpenLithoHub today — Apache-2.0, full ML-for-OPC stack
with HF Space playground, BYOM Colab, and a real differentiable
Hopkins/SOCS forward model.

For folks not in the lithography world: this is the "ML for chip mask
optimization" subfield. Imagine image-to-image inverse problems where
the input is your desired silicon pattern and the output is a mask
shape that, when imaged through a partially coherent optical system and
a nonlinear resist, will print correctly. EPE (edge-placement error) is
the ML-friendly measure of "how wrong is the printed contour vs. target."

What might interest this community specifically:

- Differentiable Hopkins/SOCS forward model — drop it into your training
  loop the same way you'd drop in a perceptual loss. Supports dipole /
  quasar / annular illumination + defocus.
- HF Space playground:
  https://huggingface.co/spaces/OpenLithoHub/playground
  3 presets (SRAM cell, contact array, random routing), upload your own
  GDS, get EPE heatmap + MRC overlay back.
- The model registry uses an interface compatible with HF Hub-style
  pretrained-weight downloads (`from_pretrained`-ish), making it easy
  to host model weights on HF.
- Colab BYOM notebook: install → register your `LithographyModel` →
  eval on LithoBench → submit to leaderboard, all in one runtime.
- 5 baselines included: identity, rule-based OPC, level-set ILT,
  OpenILT (SimpleILT / MOSAIC), neural-ILT (U-Net, with public v0.1
  seed weights on HuggingFace).

We'd love feedback on the BYOM interface from people who've built
analogous "bring your own model" pipelines on HF — the friction points
in our 50-LOC integration are exactly what we want to hear about.

GitHub: https://github.com/OpenLithoHub/OpenLithoHub
Docs: https://docs.openlithohub.com

Happy to answer questions in this thread.