Data Adapters
openlithohub.data.base
Abstract base class for dataset adapters.
LithoSample
dataclass
A single lithography sample with unified tensor representation.
Source code in src/openlithohub/data/base.py
DatasetAdapter
Bases: ABC
Abstract adapter for lithography datasets.
Subclasses must implement `__len__` and `__getitem__` so that samples are exposed as PyTorch tensors in a unified form, regardless of the underlying storage format.
Source code in src/openlithohub/data/base.py
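The subclassing contract described above can be illustrated with a minimal sketch. `SketchSample`, `SketchAdapter`, and `InMemoryAdapter` are hypothetical stand-ins written for this example; the actual fields of `LithoSample` and the constructor of `DatasetAdapter` are not shown on this page, and plain nested lists are used here in place of tensors.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, List


@dataclass
class SketchSample:
    """Hypothetical stand-in for LithoSample; the real fields are not shown here."""

    design: Any  # would be a torch.Tensor in a real adapter
    pixel_nm: float


class SketchAdapter(ABC):
    """Hypothetical stand-in for DatasetAdapter with the two required methods."""

    @abstractmethod
    def __len__(self) -> int:
        ...

    @abstractmethod
    def __getitem__(self, index: int) -> SketchSample:
        ...


class InMemoryAdapter(SketchAdapter):
    """Toy subclass that serves samples from a Python list."""

    def __init__(self, designs: List[Any], pixel_nm: float = 1.0) -> None:
        self._designs = designs
        self._pixel_nm = pixel_nm

    def __len__(self) -> int:
        return len(self._designs)

    def __getitem__(self, index: int) -> SketchSample:
        return SketchSample(design=self._designs[index], pixel_nm=self._pixel_nm)


adapter = InMemoryAdapter([[[0, 1], [1, 0]], [[1, 1], [0, 0]]])
first = adapter[0]
```

Because both methods are abstract, instantiating the base class directly raises `TypeError`, so an incomplete subclass fails early rather than at first access.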
openlithohub.data.lithobench
LithoBench dataset adapter (.npy format).
LithoBench (NeurIPS'23) organizes data as paired .npy arrays per sample:

    root/
      design/
        sample_0000.npy    # binary design layout (H, W)
        sample_0001.npy
        ...
      mask/
        sample_0000.npy    # optimized mask (H, W), may not exist for all samples
        ...
      resist/
        sample_0000.npy    # simulated resist contour (H, W), optional
        ...
      metadata.json        # optional: per-sample process parameters

Alternatively, a flat layout is supported:

    root/
      sample_0000_design.npy
      sample_0000_mask.npy
      sample_0000_resist.npy
      ...
LithoBenchDataset
Bases: DatasetAdapter
Adapter for the LithoBench dataset (NeurIPS'23, 45nm baseline).
Supports two directory layouts:

1. Subdirectory layout: root/{design,mask,resist}/sample_XXXX.npy
2. Flat layout: root/sample_XXXX_{design,mask,resist}.npy
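As a rough illustration of how the two layouts could be told apart, the sketch below indexes sample files by sample id. `discover_samples` is a hypothetical helper, not part of `openlithohub`, and detecting the subdirectory layout by the presence of a `design/` folder is an assumption made for this example.

```python
from pathlib import Path
from typing import Dict

KINDS = ("design", "mask", "resist")


def discover_samples(root: Path) -> Dict[str, Dict[str, Path]]:
    """Map sample id -> {kind: path} for either layout (hypothetical helper)."""
    root = Path(root)
    samples: Dict[str, Dict[str, Path]] = {}
    if (root / "design").is_dir():
        # Subdirectory layout: root/{design,mask,resist}/sample_XXXX.npy
        for kind in KINDS:
            directory = root / kind
            if not directory.is_dir():
                continue  # mask/ and resist/ are optional
            for path in sorted(directory.glob("sample_*.npy")):
                samples.setdefault(path.stem, {})[kind] = path
    else:
        # Flat layout: root/sample_XXXX_{design,mask,resist}.npy
        for kind in KINDS:
            suffix = f"_{kind}"
            for path in sorted(root.glob(f"sample_*{suffix}.npy")):
                sample_id = path.stem[: -len(suffix)]
                samples.setdefault(sample_id, {})[kind] = path
    return samples
```

Samples that lack a mask or resist file simply have fewer keys in their inner dictionary, which mirrors the note that those arrays may not exist for every sample.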
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `root` | `str \| Path` | Path to the dataset directory. | required |
| `split` | `str \| None` | Optional split name (e.g. 'train', 'test'). If set, looks for root/split/. | `None` |
| `pixel_nm` | `float` | Pixel resolution in nanometers (default 1.0 for LithoBench 45nm node). | `1.0` |
Source code in src/openlithohub/data/lithobench.py
openlithohub.data.lithosim
LithoSim dataset adapter (HuggingFace Parquet format).
LithoSim is a sub-28nm industrial lithography simulation dataset hosted on HuggingFace Hub. It stores design/mask/resist image pairs as Parquet rows with image columns and process metadata.
Requires: pip install openlithohub[data] (adds datasets and pyarrow)
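One way to check up front whether the optional dependencies are available is sketched below. `missing_modules` is a hypothetical helper written for this example; how `openlithohub` itself handles a missing extra is not documented here.

```python
import importlib.util
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Modules that the [data] extra is documented to add.
missing = missing_modules(["datasets", "pyarrow"])
if missing:
    print(f"Missing {missing}; try: pip install openlithohub[data]")
```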
LithoSimDataset
Bases: DatasetAdapter
Adapter for the LithoSim dataset (sub-28nm industrial benchmark).
Loads data from HuggingFace Hub using the datasets library.
Images are stored as columns in Parquet format and decoded to tensors on access.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `split` | `str` | Dataset split ('train', 'test', or 'all'). | `'test'` |
| `dataset_name` | `str` | HuggingFace dataset identifier. Override for custom forks. | `_HF_DATASET_NAME` |
| `cache_dir` | `str \| None` | Local cache directory for downloaded data. | `None` |
| `pixel_nm` | `float` | Pixel resolution in nanometers. | `0.5` |
| `streaming` | `bool` | If True, use streaming mode (no full download). | `False` |
Source code in src/openlithohub/data/lithosim.py
openlithohub.data.transforms
Data transforms for resolution alignment and normalization.
align_resolution(tensor, source_pixel_nm, target_pixel_nm, mode='bilinear')
Resample a tensor to match target pixel resolution.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `tensor` | `Tensor` | Input tensor (H, W) or (C, H, W). | required |
| `source_pixel_nm` | `float` | Current pixel size in nanometers. | required |
| `target_pixel_nm` | `float` | Desired pixel size in nanometers. | required |
| `mode` | `str` | Interpolation mode ('bilinear', 'nearest', 'bicubic'). | `'bilinear'` |
Returns:
| Type | Description |
|---|---|
| `Tensor` | Resampled tensor at the target resolution. |
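The arithmetic behind resolution alignment can be sketched without any tensor library. The example assumes that the spatial size scales by `source_pixel_nm / target_pixel_nm` and is rounded to the nearest integer; `resampled_shape` is a hypothetical helper, and the exact rounding used by `align_resolution` is not documented on this page.

```python
from typing import Tuple


def resampled_shape(
    shape: Tuple[int, int], source_pixel_nm: float, target_pixel_nm: float
) -> Tuple[int, int]:
    """Estimate the (H, W) of a grid after changing its pixel size (hypothetical helper)."""
    if source_pixel_nm <= 0 or target_pixel_nm <= 0:
        raise ValueError("pixel sizes must be positive")
    # The physical extent stays fixed, so a smaller target pixel means more pixels.
    scale = source_pixel_nm / target_pixel_nm
    height, width = shape
    return max(1, round(height * scale)), max(1, round(width * scale))


# Illustrative tile size: a 1.0 nm grid resampled to a 0.5 nm grid doubles each side.
upsampled = resampled_shape((2048, 2048), source_pixel_nm=1.0, target_pixel_nm=0.5)
# Going the other way halves each side.
downsampled = resampled_shape((100, 100), source_pixel_nm=0.5, target_pixel_nm=1.0)
```

For binary layouts, 'nearest' keeps values in {0, 1}, whereas 'bilinear' and 'bicubic' produce intermediate values along edges, so a threshold step may be needed afterwards if a binary result is required.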