Data Adapters
Authentication for gated datasets
Some adapters (currently LithoSim) load data from a gated
HuggingFace Hub repository. If a load() call raises a
RuntimeError mentioning HTTP 401, see
HuggingFace Authentication for the unblock
steps (request access → huggingface-cli login / HF_TOKEN).
openlithohub.data.base
Abstract base class for dataset adapters.
LithoSample
dataclass
A single lithography sample with unified tensor representation.
Source code in src/openlithohub/data/base.py
DatasetAdapter
Bases: ABC
Abstract adapter for lithography datasets.
Subclasses must implement __len__ and __getitem__ to provide unified PyTorch Tensor access regardless of the underlying format.
Source code in src/openlithohub/data/base.py
supports_random_access
property
Whether len() and integer indexing are well-defined.
Streaming adapters (e.g. LithoSimDataset(streaming=True)) lazily
consume an iterable and cannot answer len() or ds[i] without
materialising the whole stream; they raise TypeError
instead. Callers that branch between batched evaluation (random
access) and online consumption (iteration only) should query this
property rather than catching TypeError after the fact.
Defaults to True; streaming adapters override to False.
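The branching described above can be sketched as follows. This is a minimal illustration with two stand-in classes (`ListAdapter` and `StreamAdapter` are hypothetical and use a plain class attribute where the real adapters use a property); only the `supports_random_access` name comes from the documentation.

```python
from typing import Iterable, Iterator


class ListAdapter:
    """Stand-in for a random-access adapter (hypothetical, not the real class)."""

    supports_random_access = True

    def __init__(self, items: list) -> None:
        self._items = items

    def __len__(self) -> int:
        return len(self._items)

    def __getitem__(self, index: int):
        return self._items[index]


class StreamAdapter:
    """Stand-in for a streaming adapter: iteration only."""

    supports_random_access = False

    def __init__(self, source: Iterable) -> None:
        self._source = source

    def __len__(self) -> int:
        # Mirrors the documented behaviour: no length without draining the stream.
        raise TypeError("streaming adapter has no length")

    def __iter__(self) -> Iterator:
        return iter(self._source)


def collect(ds) -> list:
    """Index when random access is available, otherwise iterate once."""
    if ds.supports_random_access:
        return [ds[i] for i in range(len(ds))]
    return [sample for sample in ds]
```

Querying the flag up front keeps the streaming path from ever calling `len()`, so no `TypeError` has to be caught.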
download(root)
abstractmethod
croissant_name()
croissant_description()
Free-text description for Croissant metadata.
Subclasses should override with a one-paragraph dataset summary.
croissant_license_url()
croissant_citation()
croissant_url()
to_croissant()
Emit MLCommons Croissant 1.0 JSON-LD metadata.
Croissant is the de-facto ML dataset metadata schema (HuggingFace,
Google, Kaggle, MLCommons; published 2024-12). Producing it from
DatasetAdapter lets downstream MLPerf-style benchmarks
consume our datasets without bespoke adapters.
The output is a Python dict matching the JSON-LD shape; the caller
serialises it with json.dumps (default) or feeds it to a
Croissant validator. We emit the minimum compliant subset:
@context, @type, name, description, license,
url, citeAs, plus a single RecordSet describing the
sample shape (design / mask / resist tensors). Subclasses can
override hook methods (croissant_name / ..._description
/ ...) to enrich the output.
Source code in src/openlithohub/data/base.py
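To make the shape of that output concrete, here is a rough sketch of a dict carrying the fields listed above. The function name, the simplified `@context`, and the field layout inside the record set are assumptions made for illustration; the result has not been checked against a Croissant validator and is not the adapter's actual output.

```python
import json


def minimal_croissant(name, description, license_url, url, citation):
    """Build a Croissant-style JSON-LD dict (illustrative subset only)."""
    return {
        # Simplified context: the real Croissant context defines many more terms.
        "@context": {
            "@vocab": "https://schema.org/",
            "sc": "https://schema.org/",
            "cr": "http://mlcommons.org/croissant/",
        },
        "@type": "sc:Dataset",
        "name": name,
        "description": description,
        "license": license_url,
        "url": url,
        "citeAs": citation,
        # One record set describing the per-sample tensors.
        "recordSet": [
            {
                "@type": "cr:RecordSet",
                "name": "samples",
                "field": [
                    {"@type": "cr:Field", "name": tensor_name}
                    for tensor_name in ("design", "mask", "resist")
                ],
            }
        ],
    }


doc = minimal_croissant(
    name="example-dataset",
    description="Placeholder description.",
    license_url="https://example.org/license",
    url="https://example.org/dataset",
    citation="Placeholder citation.",
)
serialised = json.dumps(doc, indent=2)
```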
natural_sort_key(s)
Sort key that orders strings with embedded numbers numerically.
sample_2 < sample_10 < sample_100, instead of the lexical
sample_10 < sample_2.
Source code in src/openlithohub/data/base.py
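One common way to get this ordering is to split each string into alternating text and digit runs and compare the digit runs as integers. The sketch below (named `natural_key_sketch` to make clear it is not the library's own implementation) shows the idea:

```python
import re


def natural_key_sketch(s: str) -> list:
    """Split into text/digit runs; digit runs compare as integers."""
    parts = re.split(r"(\d+)", s)
    # re.split with one capture group puts the digit runs at odd indices.
    return [int(tok) if i % 2 else tok for i, tok in enumerate(parts)]


names = ["sample_10", "sample_2", "sample_100"]
ordered = sorted(names, key=natural_key_sketch)
```

Because text and digit runs always alternate, two keys compare like with like at every position and never mix `str` with `int`.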
openlithohub.data.lithobench
LithoBench dataset adapter (.npy format).
LithoBench (NeurIPS'23) organizes data as paired .npy arrays per sample:

    root/
      design/
        sample_0000.npy    # binary design layout (H, W)
        sample_0001.npy
        ...
      mask/
        sample_0000.npy    # optimized mask (H, W), may not exist for all samples
        ...
      resist/
        sample_0000.npy    # simulated resist contour (H, W), optional
        ...
      metadata.json        # optional: per-sample process parameters

Alternatively, a flat layout is supported:

    root/
      sample_0000_design.npy
      sample_0000_mask.npy
      sample_0000_resist.npy
      ...
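A loader that accepts both layouts has to probe two candidate paths per sample. The following sketch shows one way to do that; `find_sample_file` is a hypothetical helper, and trying the subdirectory layout first is an assumption rather than documented behaviour.

```python
import tempfile
from pathlib import Path
from typing import Optional


def find_sample_file(root: Path, sample_id: str, kind: str) -> Optional[Path]:
    """Return the .npy path for (sample_id, kind) in either layout, or None."""
    nested = root / kind / f"{sample_id}.npy"    # root/design/sample_0000.npy
    flat = root / f"{sample_id}_{kind}.npy"      # root/sample_0000_design.npy
    for candidate in (nested, flat):
        if candidate.is_file():
            return candidate
    return None


# Small self-contained demo on empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    demo_root = Path(tmp)
    (demo_root / "design").mkdir()
    (demo_root / "design" / "sample_0000.npy").touch()
    (demo_root / "sample_0001_mask.npy").touch()
    nested_hit = find_sample_file(demo_root, "sample_0000", "design")
    flat_hit = find_sample_file(demo_root, "sample_0001", "mask")
    missing = find_sample_file(demo_root, "sample_0001", "resist")
    found = (nested_hit.name, flat_hit.name, missing)
```

Returning `None` for a missing file matches the note above that mask and resist arrays may not exist for every sample.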
LithoBenchDataset
Bases: DatasetAdapter
Adapter for the LithoBench dataset (NeurIPS'23, 45nm baseline).
Supports two directory layouts:
1. Subdirectory layout: root/{design,mask,resist}/sample_XXXX.npy
2. Flat layout: root/sample_XXXX_{design,mask,resist}.npy
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| root | str \| Path | Path to the dataset directory. | required |
| split | str \| None | Optional split name (e.g. 'train', 'test'). If set, looks for root/split/. | None |
| pixel_nm | float | Pixel resolution in nanometers (default 1.0 for LithoBench 45nm node). | 1.0 |
Source code in src/openlithohub/data/lithobench.py
has_kind(sample_id, kind)
download(root, artifact='lithomodels.tar.gz')
Download a pinned LithoBench artifact via gdown and verify its SHA-256.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| root | str | Destination directory. Created if missing. The tar is streamed to | required |
| artifact | str | Canonical filename of the artifact to fetch. Must be a key in :data: | 'lithomodels.tar.gz' |
Raises:
| Type | Description |
|---|---|
| ValueError | |
| ImportError | |
| IntegrityError | bytes-on-disk don't match the pinned SHA-256. |
Source code in src/openlithohub/data/lithobench.py
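The SHA-256 check can be illustrated with the standard library alone. In this sketch `IntegrityError` is a local stand-in for the exception named in the table, and `sha256_of` / `verify_sha256` are hypothetical helper names; the pinned digests themselves live in the library and are not reproduced here.

```python
import hashlib
from pathlib import Path


class IntegrityError(Exception):
    """Local stand-in for the library's exception of the same name."""


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large archives never sit fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha256(path: Path, expected_hex: str) -> None:
    """Raise IntegrityError when the bytes on disk do not match the pin."""
    actual = sha256_of(path)
    if actual != expected_hex.lower():
        raise IntegrityError(
            f"{path.name}: expected {expected_hex}, got {actual}"
        )
```

Pinning a digest next to the artifact name means a silently replaced or truncated download fails loudly instead of producing subtly different data.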
openlithohub.data.lithosim
LithoSim dataset adapter (HuggingFace Parquet format).
LithoSim is a sub-28nm industrial lithography simulation dataset hosted on HuggingFace Hub. It stores design/mask/resist image pairs as Parquet rows with image columns and process metadata.
The upstream dataset (OpenLithoHub/LithoSim) is currently gated:
new users must request access on the Hub and authenticate with
huggingface-cli login before this adapter can fetch data. Calls
without auth fail with HTTP 401; the adapter detects that and raises
RuntimeError with the remediation steps.
Requires: pip install openlithohub[data] (adds datasets and pyarrow)
LithoSimDataset
Bases: DatasetAdapter
Adapter for the LithoSim dataset (sub-28nm industrial benchmark).
Loads data from HuggingFace Hub using the datasets library.
Images are stored as columns in Parquet format and decoded to tensors on access.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| split | str | Dataset split ('train', 'test', or 'all'). | 'test' |
| dataset_name | str | HuggingFace dataset identifier. Override for custom forks. | _HF_DATASET_NAME |
| cache_dir | str \| None | Local cache directory for downloaded data. | None |
| pixel_nm | float | Pixel resolution in nanometers. | 0.5 |
| streaming | bool | If True, use streaming mode (no full download). | False |
| revision | str \| None | Optional Git revision (commit SHA, tag, or branch) to pin for reproducible downloads. Defaults to | _DEFAULT_REVISION |
Source code in src/openlithohub/data/lithosim.py
openlithohub.data.iccad16
ICCAD 2016 Problem C — EUV hotspot detection benchmark adapter.
The benchmark is from the ICCAD 2016 CAD Contest (Problem C, EUV Simulation). The publicly mirrored copy lives at https://github.com/phdyang007/ICCAD16-N7M2EUV — four EUV designs at N7 / 16 nm CD plus simulated hotspot locations recorded under a process-window sweep.
The dataset is a hotspot detection benchmark, not a mask optimization benchmark — there is no OPC reference mask to compare against. Two pieces of evidence:
- The repo's references are both hotspot-detection papers (Chen et al., DAC'19; Yang et al., TCAD'20 — "Bridging the Gap Between Layout Pattern Sampling and Hotspot Detection via Batch Active Learning").
- The auxiliary layer (10000, 0) ships 120 small 16×16 nm boxes distributed on a regular grid, covering only ~1% of design pixels and located 70+ nm away from any CSV hotspot. This is consistent with detection clip / inspection-grid sites, not with an OPC mask.
Files per test case:
- testcaseN.oas: OASIS layout. The N7M2EUV stack is documented in [Yang2020_BatchAL, §III-A, p.4]; the per-layer mapping below applies to every test case in this distribution:
| GDS layer | Datatype | Meaning |
|---|---|---|
| 1000 | 0 | Design polygons (drawn metal-2 features at N7, 16 nm CD). |
| 10000 | 0 | Auxiliary clip-site grid (16×16 nm boxes, hotspot inspection sites). |
(layer=1000, datatype=0) is exposed as the loaded design
tensor. (layer=10000, datatype=0) is exposed under
metadata['clip_sites'].
- testN.csv: hotspot annotations with columns def, id,
category, x, y. Coordinates are in OASIS database units (1 dbu =
1 nm for these files); category is the contest's defect type
code (raw integers, per-testcase). The README promises three
semantic kinds (EPE / Bridging / Necking) but does not publish the
integer mapping, so the code preserves the raw id. The same physical
site can appear multiple times under different dose/focus
conditions.
The adapter returns LithoSample(design, mask=None, resist=None,
metadata). LithoSample.mask is intentionally left None —
this dataset does not provide a reference mask. Hotspot annotations
and clip-site centers live in metadata.
HotspotAnnotation
dataclass
One row from the testN.csv hotspot table.
x_nm / y_nm are the contest dbu coordinates converted to nm
using the OASIS layout's dbu (1 dbu = 1 nm for the published files,
but the conversion still goes through layout.dbu * 1000).
category_id preserves the raw contest code; the README only
promises three semantic kinds (EPE / Bridging / Necking) but does
not publish the integer mapping, so callers should treat the id as
an opaque label until they have the contest's category dictionary.
Source code in src/openlithohub/data/iccad16.py
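The dbu-to-nm conversion and the opaque handling of `category` can be sketched on a tiny in-memory CSV. Everything below is illustrative: `HotspotRow` and `parse_hotspots` are hypothetical names, the two data rows are invented, and the sketch assumes the file carries a header row with the column names listed above and that `dbu_um` is the layout's database unit in microns (as klayout reports it).

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class HotspotRow:
    """Illustrative stand-in for one annotation row (not the library's dataclass)."""

    design: str
    site_id: str
    category_id: int  # raw contest code, kept as an opaque label
    x_nm: float
    y_nm: float


def parse_hotspots(csv_text: str, dbu_um: float = 0.001) -> list:
    """Parse rows and convert dbu coordinates to nm via dbu_um * 1000."""
    nm_per_dbu = dbu_um * 1000.0
    rows = []
    for rec in csv.DictReader(io.StringIO(csv_text)):
        rows.append(
            HotspotRow(
                design=rec["def"],
                site_id=rec["id"],
                category_id=int(rec["category"]),
                x_nm=float(rec["x"]) * nm_per_dbu,
                y_nm=float(rec["y"]) * nm_per_dbu,
            )
        )
    return rows


# Invented sample rows; the repeated site mimics one location reported
# under two dose/focus conditions.
SAMPLE_CSV = "def,id,category,x,y\ntestcase1,0,3,120,450\ntestcase1,1,3,120,450\n"
rows = parse_hotspots(SAMPLE_CSV)
```

Keeping the conversion factor explicit means the same code stays correct if a future file uses a database unit other than 1 nm.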
Iccad16Dataset
Bases: DatasetAdapter
Adapter for the ICCAD 2016 Problem C — EUV hotspot benchmark.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| root | str \| Path | Directory containing | required |
| cases | list[int] \| None | Optional explicit list of case indices to expose, e.g. | None |
| design_layer | tuple[int, int] | | (1000, 0) |
| clip_layer | tuple[int, int] | | (10000, 0) |
| pixel_nm | float | Raster pixel size in nm. The published layouts are ~1.9 µm × 1.5 µm so 1 nm/px stays well under 2k×2k. | 1.0 |
The adapter reads each OASIS file lazily on first access and caches
the rasterized design tensor in memory. klayout is required and
is already pinned in pyproject.toml.
Source code in src/openlithohub/data/iccad16.py
openlithohub.data.ganopc
GAN-OPC training-data adapter.
GAN-OPC ships its training set as ~4875 paired binary PNGs at 2048×2048
resolution. The public mirror is https://github.com/phdyang007/GAN-OPC,
distributed as a 30-volume 7z archive (ganopc-data.7z.001 … .030).
The download_ganopc helper auto-fetches and unpacks it on first
use; until then the repo carries no upstream bytes (per
DATA-LICENSES.md, redistribution is not granted).
Reference: Yang et al., GAN-OPC: Mask Optimization with Lithography-guided Generative Adversarial Nets, DAC 2018 (doi:10.1145/3195970.3196056). A paywalled TCAD 2020 extension exists; the open DAC paper is the canonical citation for this adapter.
Once unpacked, the directory layout is:

    ganopc-data/
      artitgt/
        1.glp.png       # target design layout (binary)
        2.glp.png
        ...
        map.txt         # filename index (ignored by this loader)
      artimsk/
        1.glpOPC.png    # OPC-output mask paired with the target
        2.glpOPC.png
        ...
The two trees share sample IDs verbatim (N.glp.png ↔
N.glpOPC.png). Pixel pitch is not stored alongside the data; the
loader defaults to 1.0 nm/px (configurable). Both PNGs are 8-bit
grayscale with strictly {0, 255} content, which the loader thresholds
into a {0., 1.} float32 tensor.
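Because the two trees share sample IDs, pairing targets with masks reduces to matching filename stems. The sketch below works on empty placeholder files; `pair_ganopc_files` is a hypothetical helper, and ordering numeric IDs by value is an assumption about what a loader would want.

```python
import tempfile
from pathlib import Path

TARGET_SUFFIX = ".glp.png"
MASK_SUFFIX = ".glpOPC.png"


def pair_ganopc_files(root: Path) -> list:
    """Return (sample_id, target_path, mask_path) for IDs present in both trees."""
    targets = {
        p.name[: -len(TARGET_SUFFIX)]: p
        for p in (root / "artitgt").glob("*" + TARGET_SUFFIX)
    }
    masks = {
        p.name[: -len(MASK_SUFFIX)]: p
        for p in (root / "artimsk").glob("*" + MASK_SUFFIX)
    }
    shared = targets.keys() & masks.keys()

    def order(sample_id: str):
        # Numeric IDs sort by value (1, 2, 10); anything else sorts after them.
        if sample_id.isdecimal():
            return (0, int(sample_id), "")
        return (1, 0, sample_id)

    return [(sid, targets[sid], masks[sid]) for sid in sorted(shared, key=order)]


# Demo on placeholder files; sample 3 has no mask and is skipped.
with tempfile.TemporaryDirectory() as tmp:
    demo_root = Path(tmp)
    (demo_root / "artitgt").mkdir()
    (demo_root / "artimsk").mkdir()
    for sid in ("10", "1", "2", "3"):
        (demo_root / "artitgt" / f"{sid}{TARGET_SUFFIX}").touch()
    for sid in ("10", "1", "2"):
        (demo_root / "artimsk" / f"{sid}{MASK_SUFFIX}").touch()
    paired_ids = [sid for sid, _, _ in pair_ganopc_files(demo_root)]
```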
GanOpcDataset
Bases: DatasetAdapter
Adapter for the GAN-OPC paired-PNG training set.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| root | str \| Path | Either the directory containing | required |
| sample_ids | list[str] \| None | Optional explicit list of sample IDs to expose (e.g. | None |
| pixel_nm | float | Raster pixel size in nm. Defaults to 1.0; this is the convention OpenLithoHub uses elsewhere and matches the ~2 µm patch sizes typical of GAN-OPC layouts. Override via constructor if your downstream pipeline assumes a different scale. | 1.0 |
| threshold | int | Grayscale cutoff (0–255) above which a pixel is considered "on". Defaults to 127. The published PNGs are already strict binary, so the threshold only matters if a user supplies non-canonical files. | 127 |
Source code in src/openlithohub/data/ganopc.py
download(root)
Fetch the GAN-OPC training set from upstream on first use.
Mirrors the pattern in LithoBenchDataset.download: clones
the upstream repository, joins the 30-volume 7z archive, and
extracts the resulting tree so that
<root>/ganopc-data/{artitgt,artimsk}/ is populated.
Idempotent: if <root>/ganopc-data/artitgt already exists the
call is a no-op. The intermediate clone and joined archive are
kept on disk so a partial extraction can resume without re-cloning.
Requires git on PATH and the py7zr +
multivolumefile Python packages (declared optional under
the data extras).
Source code in src/openlithohub/data/ganopc.py
download_ganopc(root, *, revision=_DEFAULT_REVISION, repo_url=_UPSTREAM_REPO)
Clone GAN-OPC and extract the multi-volume 7z into <root>/ganopc-data.
Idempotent: if <root>/ganopc-data/artitgt already exists the
fetch is a no-op and the existing path is returned. Otherwise the
upstream repo is shallow-cloned into <root>/.ganopc-src, the 30
archive volumes (ganopc-data.7z.001 … .030) are joined via
multivolumefile, and the resulting tree is extracted in place
via py7zr.
Returns the path to the extracted ganopc-data directory.
Raises:
| Type | Description |
|---|---|
| ImportError | |
| FileNotFoundError | |
| RuntimeError | the upstream layout no longer matches the expected |
Source code in src/openlithohub/data/ganopc.py
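Two small pieces of the fetch logic are easy to show without touching the network: the idempotency check and the list of volume names. The helper names here are hypothetical, and the sketch deliberately stops short of cloning or extracting anything.

```python
from pathlib import Path


def volume_names(stem: str = "ganopc-data.7z", count: int = 30) -> list:
    """Names of the split-archive volumes: <stem>.001 through <stem>.<count>."""
    return [f"{stem}.{index:03d}" for index in range(1, count + 1)]


def needs_fetch(root: Path) -> bool:
    """True unless <root>/ganopc-data/artitgt already exists."""
    return not (root / "ganopc-data" / "artitgt").exists()


names = volume_names()
```

Checking for the extracted `artitgt` directory rather than for the clone is what makes a repeated call a no-op once extraction has succeeded.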
openlithohub.data.asap7
ASAP7 predictive PDK adapter (standard cells from a single GDS).
ASAP7 (Clark et al., ASU) ships its standard-cell library as a single GDSII
file containing every cell as a top-level cell in the layout — there is no
per-cell file to read. The canonical release lives at
https://github.com/The-OpenROAD-Project/asap7 under BSD-3-Clause; the
7.5-track regular-Vt cells are in submodule asap7sc7p5t_27 at
GDS/asap7sc7p5t_27_R_*.gds.
This adapter loads a small canonical list of cells (INVx1, NAND2x1,
NOR2x1, DFFHQNx1) by name and rasterizes one design layer per cell into a
LithoSample.design tensor. The layer choice is configurable; the
default is M1 (10/0), which is the densest mask layer foundry reviewers
ask about first. The cell selection is intentionally narrow — Phase 1 of
issue #4 is a smoke-test, not a full library benchmark.
Per DATA-LICENSES.md, this adapter does not redistribute any PDK
bytes. Users must clone the upstream repository themselves and pass the
local path. fetch() is a guarded helper that git-clones the
upstream repo only after the caller passes accept_license=True,
acknowledging the BSD-3-Clause attribution requirement; download()
always rejects and points the caller at fetch().
Asap7Dataset
Bases: DatasetAdapter
Adapter for the ASAP7 predictive PDK standard cells.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| root | str \| Path | Path to a local clone of | required |
| cells | tuple[str, ...] \| list[str] \| None | Cell names to expose, in order. Defaults to | None |
| design_layer | tuple[int, int] | | DEFAULT_DESIGN_LAYER |
| pixel_nm | float | Raster pixel size in nm. Defaults to 1.0 to match the existing OpenLithoHub grid; ASAP7's manufacturing dbu is 0.25 nm so this is a 4× downsample. | 1.0 |
| gds_path | str \| Path \| None | Optional explicit override for the GDS file path. If unset, the adapter globs | None |
| resolve_shorthand | bool | When True (default), attempt to expand function-name shorthand into the canonical ASAP7 cell-name (drive=x1, flavor=R, track=75) before raising KeyError. | True |
The adapter requires klayout (already pinned in pyproject.toml).
Source code in src/openlithohub/data/asap7.py
download(root)
Clone ASAP7 to root. Always rejected — use fetch() instead.
The base DatasetAdapter.download signature has no place for the
license-acknowledgement flag this PDK requires, so this method is a
guard that points the caller at Asap7Dataset.fetch().
Source code in src/openlithohub/data/asap7.py
fetch(root, accept_license=False)
classmethod
Clone the ASAP7 repo with submodules to root.
ASAP7 ships under BSD-3-Clause. The license requires attribution
in any redistribution; accept_license=True is the caller's
explicit acknowledgement that they have read the license at
ASAP7_LICENSE_URL and will comply with the attribution
requirement when sharing derived layouts.
Per DATA-LICENSES.md, OpenLithoHub does not redistribute PDK
bytes — this method only clones from the official upstream
source on the user's own machine.
Source code in src/openlithohub/data/asap7.py
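The download/fetch split can be pictured as a small guard pattern. The class below is a stand-in, not the real adapter: the exception types are assumptions (the documentation does not say which ones are raised), and `fetch` here only returns the target path instead of cloning anything.

```python
from pathlib import Path


class GuardedPdkAdapter:
    """Hypothetical stand-in showing the license-acknowledgement guard."""

    def download(self, root: str) -> None:
        # The base signature has no license flag, so this entry point always refuses.
        raise NotImplementedError(
            "download() is disabled; call fetch(root, accept_license=True)"
        )

    @classmethod
    def fetch(cls, root: str, accept_license: bool = False) -> Path:
        if not accept_license:
            raise PermissionError(
                "pass accept_license=True after reading the upstream license"
            )
        # A real implementation would clone the upstream repository here.
        return Path(root)
```

Making the flag default to `False` forces every caller to opt in explicitly, which is the point of the acknowledgement.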
resolve_cell_name(shorthand, *, drive='x1', flavor='R', track='75')
Expand a function-name shorthand into the ASAP7 canonical cell name.
The ASAP7 stdcell library names every cell as
<FUNC><DRIVE>_ASAP7_<TRACK>t_<FLAVOR>. Issue spec language often
uses the bare function name ("INV", "NAND2", "DFFHQN");
this helper composes the canonical string a downstream reader
actually expects, with sensible defaults for drive / flavor / track.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| shorthand | str | Function name, with or without trailing | required |
| drive | str | Drive-strength suffix ( | 'x1' |
| flavor | str | | 'R' |
| track | str | | '75' |
Returns:
| Type | Description |
|---|---|
| str | The canonical cell-name string, e.g. |
Raises:
| Type | Description |
|---|---|
| ValueError | For unknown flavor/track values. |
Examples:
>>> resolve_cell_name("INV")
'INVx1_ASAP7_75t_R'
>>> resolve_cell_name("NAND2", drive="x2", flavor="L")
'NAND2x2_ASAP7_75t_L'
>>> resolve_cell_name("INVx1_ASAP7_75t_R") # passthrough
'INVx1_ASAP7_75t_R'
Source code in src/openlithohub/data/asap7.py
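The naming rule and the three documented examples can be reproduced with a short sketch. This is one possible reading, not the library's implementation: the accepted flavor and track sets, the passthrough test on `_ASAP7_`, and the rule for detecting an existing drive suffix are all assumptions.

```python
import re

# Assumed value sets; the real function may accept a different list.
KNOWN_FLAVORS = {"R", "L", "SL", "SRAM"}
KNOWN_TRACKS = {"75", "6"}


def resolve_cell_name_sketch(shorthand, *, drive="x1", flavor="R", track="75"):
    """Compose <FUNC><DRIVE>_ASAP7_<TRACK>t_<FLAVOR> from a bare function name."""
    if "_ASAP7_" in shorthand:
        return shorthand  # already canonical: pass through unchanged
    if flavor not in KNOWN_FLAVORS:
        raise ValueError(f"unknown flavor: {flavor!r}")
    if track not in KNOWN_TRACKS:
        raise ValueError(f"unknown track: {track!r}")
    # Keep a drive suffix such as 'x1' or 'xp5' if the caller already wrote one.
    has_drive = re.search(r"x(p?\d+)$", shorthand) is not None
    func_with_drive = shorthand if has_drive else shorthand + drive
    return f"{func_with_drive}_ASAP7_{track}t_{flavor}"
```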
rasterize_cell_layer(layout, cell, layer_spec, pixel_nm)
Rasterize one (layer, datatype) of a klayout cell into a {0,1} array.
Polygons are rasterized through PIL.ImageDraw.polygon after
transforming their hull/holes to pixel coordinates. This is faithful
for arbitrary (Manhattan and non-Manhattan) shapes — earlier code
decomposed into trapezoids and filled their bboxes, which over-filled
angled trapezoids and only happened to be exact because ASAP7 is
Manhattan-only. Sibling PDK adapters reuse this helper, so the fix
has to handle non-axis-aligned geometry too.
Iterates the layer recursively (begin_shapes_rec) so geometry
referenced via cell instances (a stdcell that INSTANCEs a shared
via array, for example) is included; the previous flat
cell.shapes(...).each() silently dropped instanced shapes.
Returns (array, origin_nm) where origin_nm is the cell bbox
lower-left corner in nm. The returned array follows the same
orientation convention as openlithohub.data.io.load_layout:
image (y-down) coordinates with arr[0] at the top of the layout
viewer (largest y_nm). Earlier asap7 code stored y-up and then
flipud-d, contradicting the canonical convention and producing
vertically mirrored masks vs. load_layout-loaded layouts.
Source code in src/openlithohub/data/asap7.py
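The y-down convention is easiest to see as a coordinate mapping from layout space (y pointing up, in nm) to array indices (row 0 at the top). The helper below is a hypothetical illustration; in particular, flooring to the containing pixel is an assumed rounding rule.

```python
import math


def nm_to_pixel(x_nm, y_nm, origin_nm, height_px, pixel_nm=1.0):
    """Map a layout point to (row, col) in an image-oriented array.

    origin_nm is the lower-left corner of the cell bbox in nm;
    row 0 corresponds to the largest y, as in a layout viewer.
    """
    col = math.floor((x_nm - origin_nm[0]) / pixel_nm)
    row_from_bottom = math.floor((y_nm - origin_nm[1]) / pixel_nm)
    row = height_px - 1 - row_from_bottom
    return row, col


# A 4-pixel-tall raster with its lower-left corner at (0, 0) nm.
bottom_left = nm_to_pixel(0.5, 0.5, (0.0, 0.0), height_px=4)
top_left = nm_to_pixel(0.5, 3.5, (0.0, 0.0), height_px=4)
```

A point near the bottom of the cell lands in the last row and a point near the top lands in row 0, which is the orientation that avoids the vertical mirroring described above.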
openlithohub.data.freepdk45
FreePDK45 + NanGate Open Cell Library adapter (single-GDS standard cells).
FreePDK45 is NCSU's 45nm open-source predictive PDK; NanGate's Open Cell
Library provides the standard cells designed against it. The mflowgen
ASIC design kit at https://github.com/mflowgen/freepdk-45nm bundles the
two together as a convenience drop, including a single stdcells.gds
file with all 135 NanGate cells.
This adapter loads a small canonical list of cells (INV_X1, NAND2_X1,
NOR2_X1, DFF_X1) by name and rasterizes one design layer per cell. The
default is metal1 = (11, 0) per the kit's rtk-stream-out.map (note:
this is not the same numbering as ASAP7, where metal1 = (10, 0)).
License caveat
Unlike ASAP7's clean BSD-3-Clause, the FreePDK45 distribution is two licenses stacked:
- FreePDK45 (NCSU): see https://eda.ncsu.edu/freepdk/freepdk45/.
- NanGate Open Cell Library (Si2): see https://si2.org/open-cell-library/.
The mflowgen mirror at github.com/mflowgen/freepdk-45nm does not
ship a top-level LICENSE file, so callers MUST verify the upstream
terms themselves before redistributing any derivative work. As with
ASAP7, fetch() requires explicit accept_license=True to
acknowledge this responsibility, and the adapter never bundles PDK
bytes into the OpenLithoHub repository.
FreePdk45Dataset
Bases: DatasetAdapter
Adapter for FreePDK45 + NanGate standard cells via mflowgen mirror.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| root | str \| Path | Path to a local clone of | required |
| cells | tuple[str, ...] \| list[str] \| None | Cell names to expose, in order. Defaults to | None |
| design_layer | tuple[int, int] | | DEFAULT_DESIGN_LAYER |
| pixel_nm | float | Raster pixel size in nm. Defaults to 1.0; the FreePDK45 dbu is 0.1 nm so this is a 10× downsample. | 1.0 |
| gds_path | str \| Path \| None | Optional explicit override for the GDS file path. If unset, the adapter looks for | None |
The adapter requires klayout (already pinned in pyproject.toml).
Source code in src/openlithohub/data/freepdk45.py
download(root)
Always rejected — use fetch() instead.
The base DatasetAdapter.download signature has no place for
the license-acknowledgement flag this PDK requires.
Source code in src/openlithohub/data/freepdk45.py
fetch(root, accept_license=False)
classmethod
Clone the mflowgen FreePDK45 mirror to root.
FreePDK45 + NanGate ships under a stacked license that the
mflowgen mirror does not declare in a LICENSE file. Callers
must independently verify both upstream terms before
redistributing any derivative work, and the adapter requires
accept_license=True to acknowledge that responsibility.
Per DATA-LICENSES.md, OpenLithoHub does not redistribute PDK
bytes — this method only clones from the mflowgen mirror on the
user's own machine.
Source code in src/openlithohub/data/freepdk45.py
openlithohub.data.freepdk45_sram
FreePDK45 SRAM bitcell adapter — load OpenRAM's bundled GDS as samples.
OpenRAM (BSD-3-Clause, pip install openram) ships a small set of
hand-crafted FreePDK45 standard cells under technology/freepdk45/gds_lib/:
- cell_1rw.gds: 6T 1-port SRAM bitcell
- cell_2rw.gds: 8T dual-port SRAM bitcell
- dff.gds: D-flip-flop
- sense_amp.gds: sense amplifier
- write_driver.gds: write driver
- tri_gate.gds: tri-state gate
- replica_cell_{1,2}rw.gds: timing-replica columns
- dummy_cell_{1,2}rw.gds: row/column edge dummies
These are the exact cells OpenRAM compiles together to build a full SRAM macro on FreePDK45. Each GDS contains a single top cell whose name matches the file stem.
This adapter rasterizes one design layer per cell (default: metal1
(11, 0)) and emits one LithoSample per cell — directly addressing
issue #4 Phase 3's SRAM-bitcell-tile data goal without running OpenRAM's
compiler. The compile path (sram_compiler.py) currently has a
numpy-2 scalar-conversion regression in upstream OpenRAM 1.2.48; the
pre-shipped GDS files are the canonical, citation-worthy artifact and
sidestep that bug entirely.
Layer numbering matches the FreePDK45 stream-out map (layers.map in
the openram package): metal1 = 11/0, identical to the mflowgen NanGate
mirror, so the central registry's LAYERS["freepdk45"].metal1 covers
both.
License
- OpenRAM: BSD-3-Clause (https://github.com/VLSIDA/OpenRAM)
- FreePDK45: academic / non-commercial (NCSU EDA Wiki).
The adapter does not redistribute either set of bytes — it locates the
bundled GDS via importlib.resources.files("openram") at runtime, so
the user's pip-installed openram wheel is the source of truth.
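Locating a directory inside an installed package can be done with `importlib.resources` roughly as follows. `find_gds_lib` is a hypothetical helper; the sub-path is the one quoted earlier in this section, and returning `None` when the package is missing is a choice made for this sketch rather than documented adapter behaviour.

```python
import importlib.util
from importlib import resources


def find_gds_lib(package: str = "openram"):
    """Return a Traversable for <package>/technology/freepdk45/gds_lib, or None."""
    if importlib.util.find_spec(package) is None:
        return None  # package not installed in this environment
    return resources.files(package) / "technology" / "freepdk45" / "gds_lib"


# Deliberately missing package name, to exercise the fallback path.
missing = find_gds_lib("no_such_package_for_this_sketch")
```

The returned object is only a location; callers still need to check that it exists and list its `.gds` files before loading anything.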
FreePdk45SramDataset
Bases: DatasetAdapter
Adapter for OpenRAM's bundled FreePDK45 SRAM-cell GDS files.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| cells | Sequence[str] \| None | Cell names to expose, in order. Defaults to | None |
| design_layer | tuple[int, int] | | DEFAULT_DESIGN_LAYER |
| pixel_nm | float | Raster pixel size in nm. Defaults to 1.0; FreePDK45's dbu is 0.5 nm so this is a 2× downsample. | 1.0 |
| gds_lib_path | str \| Path \| None | Optional explicit path to OpenRAM's | None |
Each LithoSample has mask=None and resist=None — these
are unmasked design-layer rasterizations, suitable as inputs to OPC
/ mask-optimization research, not paired training data.
Source code in src/openlithohub/data/freepdk45_sram.py
download(root)
No-op — the GDS bundle ships in the openram pip wheel.
Install via pip install 'openlithohub[freepdk45-sram]' or
pip install openram; the adapter then locates the bundle
automatically via importlib.resources.
Source code in src/openlithohub/data/freepdk45_sram.py
openlithohub.data.orfs
ORFS artifact adapter — load ASAP7-routed RISC-V layouts as tile samples.
OpenROAD-flow-scripts (ORFS) is the open-source RTL→GDSII flow. Its
flow/designs/asap7/<name>/ configurations produce real ASAP7-routed
layouts — including mock-alu, riscv32i, riscv32i-mock-sram
(the SRAM-instantiated variant covering Phase 3's SRAM-bitcell-tile
goal), ibex, swerv_wrapper, and cva6 — under
flow/results/asap7/<name>/base/<name>.gds.
Phase 3 of issue #4 wires those artifacts into OpenLithoHub. The
adapter is fully generic over design name; the
build-asap7-mock-alu.yml workflow already accepts design as a
workflow_dispatch input, so producing a different design's GDS is
gh workflow run build-asap7-mock-alu.yml -f design=riscv32i-mock-sram
— no adapter or workflow code change needed.
The adapter rasterizes one design layer of the top cell, then cuts the
result into fixed-size tiles (2 µm or 5 µm by default — the windows
AI-OPC inference is benchmarked on). One LithoSample per tile.
Why tiling instead of one sample per block: a routed RISC-V ALU block is hundreds of microns on a side, far too large for the Hopkins forward model to evaluate as a single tensor. The ICCAD/AI-OPC literature evaluates on ~2 µm and ~5 µm windows, and that's what the issue spec (Phase 3) calls for.
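The tiling arithmetic can be sketched independently of any layout library. In this illustration `tile_origins` is a hypothetical helper; treating an omitted stride as equal to the tile size (non-overlapping tiles) and dropping partial tiles at the right and top edges are both assumptions of the sketch.

```python
import math


def tile_origins(extent_nm, tile_nm, stride_nm=None):
    """Lower-left corners of all full tiles inside a (width, height) extent."""
    stride = tile_nm if stride_nm is None else stride_nm

    def starts(length):
        if length < tile_nm:
            return []
        count = math.floor((length - tile_nm) / stride) + 1
        return [index * stride for index in range(count)]

    width_nm, height_nm = extent_nm
    return [(x, y) for y in starts(height_nm) for x in starts(width_nm)]


# A 5 µm × 4 µm block cut into 2 µm tiles, without and with overlap.
plain = tile_origins((5000, 4000), 2000)
overlapping = tile_origins((5000, 4000), 2000, stride_nm=1000)
```

With a 2 µm tile the block yields a 2 × 2 grid; halving the stride gives 4 × 3 overlapping windows over the same area.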
License
ORFS itself is BSD-3-Clause; the asap7 platform underneath is also
BSD-3-Clause (same upstream as openlithohub.data.asap7). The
adapter re-uses the ASAP7 license constants — there is no separate
ORFS data-license gate beyond the ASAP7 acknowledgement already
required when fetching the PDK.
This module never redistributes ORFS or ASAP7 bytes. The fetch()
classmethod points at the build-asap7-mock-alu GitHub Actions
workflow that produces the GDS as a release-style artifact.
OrfsArtifactDataset
¶
Bases: DatasetAdapter
Load an ORFS-produced ASAP7 layout and expose it as N tile samples.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| gds_path | str \| Path | Path to a GDS file produced by | required |
| cell_name | str \| None | Optional explicit top-cell name. Defaults to the GDS file's basename (matches ORFS naming convention). | None |
| design_layer | tuple[int, int] |  | DEFAULT_DESIGN_LAYER |
| pixel_nm | float | Raster pixel size in nm. Default 1.0; ASAP7 dbu is 0.25 nm so the rasterizer downsamples 4×. | 1.0 |
| tile_nm | float \| None | Tile edge length in nm. Default 2000 (2 µm); also commonly 5000 (5 µm). Pass | DEFAULT_TILE_NM |
| stride_nm | float \| None | Tile stride. Defaults to | None |
| drop_empty_tiles | bool | Skip all-zero tiles. Default True. | True |
| design_name | str \| None | Optional human-readable design name for metadata (e.g. "mock-alu", "riscv32i", "riscv32i-mock-sram"). Defaults to | None |
| orfs_revision | str \| None | Optional ORFS git SHA recorded in metadata for reproducibility. Set this to the | None |
Source code in src/openlithohub/data/orfs.py
download(root)
¶
ORFS artifacts are produced by a CI workflow, not downloaded.
See .github/workflows/build-asap7-mock-alu.yml — trigger it
via gh workflow run build-asap7-mock-alu.yml and download
the resulting GDS artifact. There is no remote URL to fetch.
Source code in src/openlithohub/data/orfs.py
tile_design_tensor(design, tile_nm, pixel_nm, stride_nm=None, drop_empty=True)
¶
Cut a rasterized design into fixed-size tiles.
Returns [(tile_array, (x_pixels, y_pixels)), ...] where the
second element is the tile's lower-left corner in pixel
coordinates of the parent design. Tiles smaller than the requested
size at the right/top edges are dropped — keeping ragged tiles
would force the eval harness to handle variable-size inputs.
stride_nm defaults to tile_nm (non-overlapping grid).
drop_empty=True skips all-zero tiles. Routed layouts have huge
empty regions outside the core; emitting thousands of zero tiles
would dominate runtime without producing useful metrics.
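The behaviour described above can be sketched with NumPy as follows. This is an illustrative reimplementation, not the library's source: the name `tile_sketch` is hypothetical, and it assumes array row index maps to y and column index to x.

```python
import numpy as np


def tile_sketch(design, tile_nm, pixel_nm, stride_nm=None, drop_empty=True):
    """Cut a 2D raster into square tiles; return [(tile, (x_px, y_px)), ...]."""
    tile_px = int(round(tile_nm / pixel_nm))
    # Stride defaults to the tile size, giving a non-overlapping grid.
    stride_px = tile_px if stride_nm is None else int(round(stride_nm / pixel_nm))
    height, width = design.shape
    tiles = []
    # Stopping at (extent - tile_px) drops ragged tiles at the far edges.
    for y in range(0, height - tile_px + 1, stride_px):
        for x in range(0, width - tile_px + 1, stride_px):
            tile = design[y:y + tile_px, x:x + tile_px]
            if drop_empty and not tile.any():
                continue  # skip all-zero tiles
            tiles.append((tile, (x, y)))
    return tiles


design = np.zeros((5, 7), dtype=np.float32)
design[0, 0] = 1.0
tiles = tile_sketch(design, tile_nm=2.0, pixel_nm=1.0)
```

In this toy 5 × 7 raster, the last row and last column cannot fill a 2-pixel tile and are discarded, and only the tile at origin (0, 0) contains any geometry, so `tiles` holds a single entry.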
Source code in src/openlithohub/data/orfs.py
openlithohub.data.transforms
¶
Data transforms for resolution alignment and normalization.
align_resolution(tensor, source_pixel_nm, target_pixel_nm, mode='bilinear', *, binarize=False, binarize_threshold=0.5)
¶
Resample a tensor to match target pixel resolution.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| tensor | Tensor | Input tensor (H, W), (C, H, W), or (N, C, H, W). | required |
| source_pixel_nm | float | Current pixel size in nanometers. | required |
| target_pixel_nm | float | Desired pixel size in nanometers. | required |
| mode | str | Interpolation mode ('bilinear', 'nearest', 'bicubic'). | 'bilinear' |
| binarize | bool | If True, threshold the resampled output back to {0, 1} with | False |
| binarize_threshold | float | Cutoff used when | 0.5 |
Returns:
| Type | Description |
|---|---|
| Tensor | Resampled tensor at the target resolution; ndim matches input. |
Notes
Output spatial dimensions are computed as
round(H * source / target) and passed to F.interpolate
via size=. The earlier scale_factor= form left the exact
output size to the framework's rounding policy, which differs
between PyTorch versions and between modes — explicit size
keeps a (1024, 1024) layout aligning to a (2048, 2048) grid at
2× upsample regardless of build.
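The size computation in the note can be sketched in plain Python. The helper name `target_size` is hypothetical, and using Python's built-in round (which rounds exact halves to the nearest even integer) is an assumption about how the library rounds.

```python
def target_size(height: int, width: int,
                source_pixel_nm: float, target_pixel_nm: float) -> tuple[int, int]:
    """Explicit output size, following round(H * source / target) from the note."""
    out_h = round(height * source_pixel_nm / target_pixel_nm)
    out_w = round(width * source_pixel_nm / target_pixel_nm)
    return out_h, out_w


# 2x upsample: 2 nm pixels resampled onto a 1 nm grid.
upsampled = target_size(1024, 1024, source_pixel_nm=2.0, target_pixel_nm=1.0)
# 4x downsample: 1 nm pixels resampled onto a 4 nm grid.
downsampled = target_size(100, 100, source_pixel_nm=1.0, target_pixel_nm=4.0)
```

The resulting pair would then be handed to torch.nn.functional.interpolate through its size argument, so the output shape is fixed before the framework applies any rounding of its own.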
Source code in src/openlithohub/data/transforms.py
normalize_to_binary(tensor, threshold=0.5)
¶
openlithohub.data.dummy
¶
Procedural dummy layout generator for CI, debugging, and onboarding.
These layouts are not representative of real cell libraries — they exist so that you can exercise the OpenLithoHub pipeline end-to-end without downloading LithoBench or LithoSim, and so CI can run hermetically without network or large data fixtures.
The generator only uses numpy and torch. It does not depend on KLayout
or any of the heavy [workflow] extras, which keeps it usable in Colab and
on minimal CI images.
DummyLayoutSpec
dataclass
¶
Parameters controlling the generated layout.
Source code in src/openlithohub/data/dummy.py
generate_dummy_layout(spec=None, *, size=None, seed=None)
¶
Generate a deterministic dummy binary layout that satisfies basic DRC.
The result is a 2D torch.Tensor of shape (size, size) with values in
{0.0, 1.0}. Polygons are placed by random rectangle splatting and then
cleaned with morphological opening/closing so that minimum width and
spacing rules are met for the configured pixel pitch.
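To give a feel for the first stage, here is a minimal NumPy sketch of seeded rectangle splatting. It is not the library's generator: the function name, the rectangle count, and the side-length bounds are all made up for illustration, and the morphological cleanup that enforces width and spacing rules is left out.

```python
import numpy as np


def splat_rectangles(size=64, n_rects=8, min_side=4, max_side=16, seed=0):
    """Paint n_rects random axis-aligned rectangles onto a size x size grid.

    Hypothetical sketch; no DRC cleanup is applied to the result.
    """
    rng = np.random.default_rng(seed)  # a fixed seed gives a repeatable layout
    layout = np.zeros((size, size), dtype=np.float32)
    for _ in range(n_rects):
        h = int(rng.integers(min_side, max_side + 1))
        w = int(rng.integers(min_side, max_side + 1))
        # Choose the corner so the rectangle stays fully inside the grid.
        y = int(rng.integers(0, size - h + 1))
        x = int(rng.integers(0, size - w + 1))
        layout[y:y + h, x:x + w] = 1.0
    return layout


layout = splat_rectangles(seed=3)
```

Calling the function twice with the same seed yields identical arrays, which is the property that makes such layouts usable as hermetic CI fixtures.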
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| spec | DummyLayoutSpec \| None | Full configuration; if omitted, a default 256 px / 40 nm spec is used and overridden by the keyword arguments. | None |
| size | int \| None | Convenience override for | None |
| seed | int \| None | Convenience override for | None |
Returns:
| Type | Description |
|---|---|
| Tensor | Binary mask tensor of shape (size, size). |
Examples:
Source code in src/openlithohub/data/dummy.py
generate_dummy_pair(spec=None, **kwargs)
¶
Generate a (design, mask) pair where the mask is a dilated design.
Useful for sanity-checking OPC pipelines without real ground truth.
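One way to picture the "dilated design" is a binary dilation with a square structuring element, sketched below in NumPy. The function name, the radius, and the square kernel are assumptions; this page does not say which kernel or radius the library uses.

```python
import numpy as np


def dilate_square(design, radius_px=1):
    """Binary dilation with a (2r+1) x (2r+1) square element (assumed kernel)."""
    h, w = design.shape
    # Zero padding lets shapes near the border grow without wrapping around.
    padded = np.pad(design, radius_px, mode="constant")
    out = np.zeros_like(design)
    # Take the maximum over every shift covered by the square window.
    for dy in range(2 * radius_px + 1):
        for dx in range(2 * radius_px + 1):
            out = np.maximum(out, padded[dy:dy + h, dx:dx + w])
    return out


design = np.zeros((5, 5), dtype=np.float32)
design[2, 2] = 1.0
mask = dilate_square(design, radius_px=1)
```

Here the single centre pixel grows into a 3 × 3 block, and the mask covers the design everywhere, which is the relationship a sanity check on an OPC pipeline can rely on.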