
Benchmark Metrics

Metrics

openlithohub.benchmark.metrics.epe

Edge Placement Error (EPE) computation.

Two flavors live here:

  • compute_epe: mask-level. Compares predicted mask edges directly to target edges. An Identity model (mask passed straight through) scores 0 by construction, which is useful as a sanity baseline but does NOT reflect what would actually print on the wafer.
  • compute_wafer_epe: wafer-level. Pushes the predicted mask through a forward optical/resist simulator and compares the resist contour to the target. This is the physically meaningful quantity for OPC quality: a square mask will round at the corners after diffraction, so an Identity model lands at a nonzero EPE.

Both report the same EPEResult schema; the leaderboard surfaces them under separate keys (epe_* vs epe_wafer_*) so existing dashboards that compare against historical mask-level numbers stay valid.

EPEResult

Bases: TypedDict

Per-sample EPE summary. Numeric fields are always float so callers can do arithmetic on them without first narrowing away bool.

Source code in src/openlithohub/benchmark/metrics/epe.py
class EPEResult(TypedDict):
    """Per-sample EPE summary. Numeric fields are always ``float`` so callers
    can do arithmetic on them without first narrowing away ``bool``."""

    epe_mean_nm: float
    epe_max_nm: float
    epe_std_nm: float
    valid: bool

compute_epe(predicted, target, pixel_size_nm=1.0)

Compute Edge Placement Error between predicted and target contours.

Symmetric edge-distance: for every edge pixel in both sets we compute the minimum distance to the other set, then aggregate over the union. The asymmetric form (predicted→target only) reports zero error for "missing entirely" failure modes — if predicted has no edges where target has a feature, predicted's edge set is empty and the loop has nothing to penalize. The symmetric form catches under-printing.
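The difference between the two forms can be seen on a toy point set. The following is a minimal pure-Python sketch, independent of openlithohub and torch; the helper names (min_distances, asymmetric_mean, symmetric_mean) and the coordinates are illustrative assumptions, not part of the library.

```python
import math


def min_distances(src, dst):
    """For each point in src, Euclidean distance to its nearest point in dst."""
    return [min(math.dist(p, q) for q in dst) for p in src]


def asymmetric_mean(pred_pts, tgt_pts):
    # predicted -> target only: target edges with no prediction nearby are never visited.
    d = min_distances(pred_pts, tgt_pts)
    return sum(d) / len(d)


def symmetric_mean(pred_pts, tgt_pts):
    # Both directions, aggregated over the union of the two edge sets.
    d = min_distances(pred_pts, tgt_pts) + min_distances(tgt_pts, pred_pts)
    return sum(d) / len(d)


# Hypothetical edge pixels: the target has two features, the prediction
# reproduces only the first one (under-printing).
tgt_pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
pred_pts = [(0, 0), (0, 1)]

asym = asymmetric_mean(pred_pts, tgt_pts)  # the missing feature goes unpenalized
sym = symmetric_mean(pred_pts, tgt_pts)    # the missing feature contributes 10 px twice
print(asym, sym)
```

In this toy case the asymmetric mean is exactly zero even though half of the target is missing, while the symmetric mean is pulled up by the two unmatched target edge pixels.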

Parameters:

Name Type Description Default
predicted Tensor

Binary mask of predicted pattern (H, W), values in {0, 1}.

required
target Tensor

Binary mask of target/reference pattern (H, W), values in {0, 1}.

required
pixel_size_nm float

Physical size of each pixel in nanometers.

1.0

Returns:

Type Description
EPEResult

Dictionary with keys epe_mean_nm, epe_max_nm, epe_std_nm, and valid. Empty-edge cases are reported explicitly:

  • both edge sets empty → all zeros, valid=True (degenerate match).
  • exactly one edge set empty → epe_mean_nm and epe_max_nm are inf, epe_std_nm is nan, and valid=False; callers must not treat the result as a "perfect" score.
  • exactly one matched edge pixel → epe_std_nm is nan (std over a single sample is undefined); valid=True.
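Because invalid results carry inf and nan, a naive average over a batch would be poisoned by a single failed sample. One way a caller might aggregate is sketched below; the summarize helper and the sample values are hypothetical and only illustrate the valid / nan conventions described above.

```python
import math


def summarize(results):
    """Average epe_mean_nm over valid samples and count invalid ones separately."""
    valid = [r for r in results if r["valid"]]
    n_invalid = len(results) - len(valid)
    mean_of_means = (
        sum(r["epe_mean_nm"] for r in valid) / len(valid) if valid else math.nan
    )
    # epe_std_nm can be nan even on a valid sample, so filter before using it.
    finite_stds = [r["epe_std_nm"] for r in valid if not math.isnan(r["epe_std_nm"])]
    return {
        "mean_epe_nm": mean_of_means,
        "n_invalid": n_invalid,
        "n_finite_std": len(finite_stds),
    }


# Made-up per-sample results in the EPEResult shape.
results = [
    {"epe_mean_nm": 1.5, "epe_max_nm": 3.0, "epe_std_nm": 0.5, "valid": True},
    {"epe_mean_nm": 0.0, "epe_max_nm": 0.0, "epe_std_nm": 0.0, "valid": True},
    {"epe_mean_nm": math.inf, "epe_max_nm": math.inf, "epe_std_nm": math.nan, "valid": False},
]
summary = summarize(results)
print(summary)
```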
Source code in src/openlithohub/benchmark/metrics/epe.py
def compute_epe(
    predicted: torch.Tensor,
    target: torch.Tensor,
    pixel_size_nm: float = 1.0,
) -> EPEResult:
    """Compute Edge Placement Error between predicted and target contours.

    Symmetric edge-distance: for every edge pixel in *both* sets we compute
    the minimum distance to the *other* set, then aggregate over the
    union. The asymmetric form (predicted→target only) reports zero error
    for "missing entirely" failure modes — if predicted has no edges where
    target has a feature, predicted's edge set is empty and the loop has
    nothing to penalize. The symmetric form catches under-printing.

    Args:
        predicted: Binary mask of predicted pattern (H, W), values in {0, 1}.
        target: Binary mask of target/reference pattern (H, W), values in {0, 1}.
        pixel_size_nm: Physical size of each pixel in nanometers.

    Returns:
        Dictionary with keys ``epe_mean_nm``, ``epe_max_nm``, ``epe_std_nm``,
        and ``valid``. Empty-edge cases are reported explicitly:

        - both edge sets empty → all zeros, ``valid=True`` (degenerate match).
        - exactly one edge set empty → ``epe_mean_nm`` and ``epe_max_nm`` are
            ``inf``, ``epe_std_nm`` is ``nan``, and ``valid=False``; callers
            must not treat the result as a "perfect" score.
        - exactly one matched edge pixel → ``epe_std_nm`` is ``nan`` (std over
            a single sample is undefined); ``valid=True``.
    """
    if predicted.shape != target.shape:
        raise ValueError(f"Shape mismatch: predicted {predicted.shape} vs target {target.shape}")

    pred_edges = _extract_edges(predicted)
    tgt_edges = _extract_edges(target)

    pred_pts = pred_edges.nonzero(as_tuple=False).float()
    tgt_pts = tgt_edges.nonzero(as_tuple=False).float()

    pred_empty = pred_pts.numel() == 0
    tgt_empty = tgt_pts.numel() == 0
    if pred_empty and tgt_empty:
        return {"epe_mean_nm": 0.0, "epe_max_nm": 0.0, "epe_std_nm": 0.0, "valid": True}
    if pred_empty or tgt_empty:
        inf = float("inf")
        # Std is undefined when one edge set is empty — return nan rather
        # than 0.0 so callers can distinguish "no data" from a real zero
        # spread, matching the single-edge-pixel convention below.
        return {"epe_mean_nm": inf, "epe_max_nm": inf, "epe_std_nm": float("nan"), "valid": False}

    pred_to_tgt = _min_pairwise_distances(pred_pts, tgt_pts)
    tgt_to_pred = _min_pairwise_distances(tgt_pts, pred_pts)
    min_distances = torch.cat([pred_to_tgt, tgt_to_pred]) * pixel_size_nm

    return {
        "epe_mean_nm": float(min_distances.mean().item()),
        "epe_max_nm": float(min_distances.max().item()),
        # std over a single edge pixel is undefined, not zero — return nan so
        # downstream filters can distinguish a degenerate single-edge result
        # from a genuine zero-spread multi-edge match.
        "epe_std_nm": (
            float(min_distances.std().item()) if min_distances.numel() > 1 else float("nan")
        ),
        "valid": True,
    }

compute_wafer_epe(predicted_mask, target, pixel_size_nm=1.0, simulator=None)

Compute EPE between the printed wafer contour and the target.

Pushes predicted_mask through a forward optical/resist simulator and compares the resulting binarised resist image to target using the same edge-distance routine as compute_epe. This is the physically meaningful EPE for OPC quality: an Identity model (mask returned unchanged) lands at a nonzero value here because diffraction rounds corners that the original mask had as right angles.

Parameters:

Name Type Description Default
predicted_mask Tensor

Predicted mask (H, W), values in [0, 1]. The simulator will be applied to this tensor.

required
target Tensor

Target wafer/contour pattern (H, W), values in {0, 1}.

required
pixel_size_nm float

Physical pixel size in nanometers.

1.0
simulator BaseSimulator | None

Forward simulator. Defaults to a fresh HopkinsSimulator with default config. Callers that need specific dose / threshold / illumination should pass an explicit instance to keep results comparable across runs.

None

Returns:

Type Description
EPEResult

Same EPEResult schema as compute_epe.
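The wafer image is taken from the simulator's binarised resist when available, and otherwise obtained by thresholding the aerial image at threshold × dose. A standalone sketch of that fallback follows; FakeSimResult and wafer_from are hypothetical stand-ins for illustration, and the defaults threshold=0.225 and dose=1.0 are borrowed from the Neural-ILT contract quoted elsewhere on this page rather than from the simulator's actual default config.

```python
from dataclasses import dataclass
from typing import List, Optional

Image = List[List[float]]


@dataclass
class FakeSimResult:
    """Hypothetical stand-in for a simulator result: aerial always set, resist optional."""

    aerial: Image
    resist: Optional[Image] = None


def wafer_from(sim_result, threshold=0.225, dose=1.0):
    # Prefer the resist contour the backend already produced.
    if sim_result.resist is not None:
        return sim_result.resist
    # Otherwise binarise the aerial intensity at threshold * dose.
    cut = threshold * dose
    return [[1.0 if v >= cut else 0.0 for v in row] for row in sim_result.aerial]


aerial_only = FakeSimResult(aerial=[[0.10, 0.30], [0.225, 0.05]])
wafer = wafer_from(aerial_only)
print(wafer)
```

Note that the comparison is inclusive, so a pixel sitting exactly on the cut counts as printed.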

Source code in src/openlithohub/benchmark/metrics/epe.py
def compute_wafer_epe(
    predicted_mask: torch.Tensor,
    target: torch.Tensor,
    pixel_size_nm: float = 1.0,
    simulator: BaseSimulator | None = None,
) -> EPEResult:
    """Compute EPE between the *printed wafer contour* and the target.

    Pushes ``predicted_mask`` through a forward optical/resist simulator
    and compares the resulting binarised resist image to ``target`` using
    the same edge-distance routine as :func:`compute_epe`. This is the
    physically meaningful EPE for OPC quality — an Identity model (mask
    returned unchanged) lands at a nonzero value here because diffraction
    rounds corners that the original mask had as right angles.

    Args:
        predicted_mask: Predicted mask (H, W), values in [0, 1]. The
            simulator will be applied to this tensor.
        target: Target wafer/contour pattern (H, W), values in {0, 1}.
        pixel_size_nm: Physical pixel size in nanometers.
        simulator: Forward simulator. Defaults to a fresh
            :class:`HopkinsSimulator` with default config — callers that
            need specific dose / threshold / illumination should pass an
            explicit instance to keep results comparable across runs.

    Returns:
        Same ``EPEResult`` schema as :func:`compute_epe`.
    """
    if simulator is None:
        # Local import: simulators package pulls in heavy SOCS kernel state,
        # so we don't want benchmark.metrics.epe to drag it in at import time.
        from openlithohub.simulators.hopkins_sim import HopkinsSimulator

        simulator = HopkinsSimulator()

    sim_result = simulator.simulate(predicted_mask)
    # Prefer the binarised resist contour the simulator already produced.
    # Fall back to thresholding the aerial image at the configured threshold
    # for backends that only return aerial intensity.
    if sim_result.resist is not None:
        wafer = sim_result.resist
    else:
        threshold = simulator.config.threshold * simulator.config.dose
        wafer = (sim_result.aerial >= threshold).to(sim_result.aerial.dtype)

    return compute_epe(wafer, target, pixel_size_nm=pixel_size_nm)

openlithohub.benchmark.metrics.l2_error

L2 wafer error — Neural-ILT canonical mask-printability metric.

The standard academic OPC scoring contract, as established by Neural-ILT (ICCAD'20) and used by GAN-OPC / MOSAIC, is:

wafer = lithosim(mask, dose=1.0, threshold=0.225)
score = (wafer - target).abs().sum()       # L1 / SAD pixel count

i.e. forward-simulate the predicted mask through SOCS optics and the resist threshold, then count the pixel-wise differences against the target layout (not against the input mask). The result is in pixel units; multiply by pixel_size_nm**2 for an area in nm² if needed.

Naming note: the published Neural-ILT paper calls this scalar "L2 error". On the binary {0, 1} wafer/target images the formula emits, the squared-L2 norm (w - t).square().sum() and the L1 norm (w - t).abs().sum() produce the same integer (since x ∈ {-1, 0, 1} ⇒ x² = |x|), so reference implementations canonically use the L1 form for speed. They are not equal in general — if you ever supply a non-binary wafer (e.g. soft resist contours), the two diverge — but for the canonical contract they agree. The l2_error_pixels field name is preserved for cross-paper comparability; do not "fix" it to L1 without coordinating against the upstream tables.
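The agreement on binary images, and the divergence on soft ones, is easy to check numerically. The sketch below uses plain Python lists as flattened images; the pixel values are made up for illustration.

```python
def l1(wafer, target):
    """Sum of absolute differences (the form reference implementations use)."""
    return sum(abs(w - t) for w, t in zip(wafer, target))


def l2_squared(wafer, target):
    """Squared-L2 norm of the difference."""
    return sum((w - t) ** 2 for w, t in zip(wafer, target))


target = [1, 1, 0, 1, 0]
binary_wafer = [1, 0, 1, 1, 0]            # differences lie in {-1, 0, 1}
soft_wafer = [0.9, 0.4, 0.6, 1.0, 0.0]    # hypothetical soft resist values

print(l1(binary_wafer, target), l2_squared(binary_wafer, target))  # equal
print(l1(soft_wafer, target), l2_squared(soft_wafer, target))      # differ
```

For the soft wafer every fractional difference shrinks when squared, so the squared-L2 value is strictly smaller than the L1 value.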

Like openlithohub.benchmark.metrics.epe.compute_wafer_epe, this metric requires the forward simulator in the loop. The mask-level compute_epe metric scores 0 for an Identity model; compute_l2_error does not, because diffraction reshapes the printed contour even when the mask is unchanged.

L2ErrorResult

Bases: TypedDict

Per-sample L2 wafer-error summary.

Attributes:

Name Type Description
l2_error_pixels float

(wafer - target).abs().sum() in pixel units — the literal Neural-ILT contract value.

l2_error_nm2 float

Same quantity expressed as a physical area in nm² (l2_error_pixels * pixel_size_nm**2). Useful when comparing results computed at different pitches.

wafer_pixels int

Number of foreground pixels in the simulated wafer image. Reported alongside the error so a normalised ratio can be derived downstream without re-running the simulator.

target_pixels int

Foreground pixel count of the target layout.
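As a small illustration of how these fields relate, the sketch below recomputes the nm² area from the pixel count and derives one possible normalised ratio. The numbers are invented, and dividing by target_pixels is only one assumed choice of normalisation; the library does not prescribe it.

```python
def area_nm2(l2_error_pixels, pixel_size_nm):
    """Convert a pixel-unit error into a physical area in nm²."""
    return l2_error_pixels * pixel_size_nm * pixel_size_nm


def normalised_error(result):
    """Hypothetical ratio: mismatched pixels per target foreground pixel."""
    if result["target_pixels"] == 0:
        return float("nan")
    return result["l2_error_pixels"] / result["target_pixels"]


pixel_size_nm = 2.0
# Illustrative values in the L2ErrorResult shape, not measured results.
result = {
    "l2_error_pixels": 1200.0,
    "l2_error_nm2": area_nm2(1200.0, pixel_size_nm),
    "wafer_pixels": 9800,
    "target_pixels": 10000,
}
ratio = normalised_error(result)
print(result["l2_error_nm2"], ratio)
```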

Source code in src/openlithohub/benchmark/metrics/l2_error.py
class L2ErrorResult(TypedDict):
    """Per-sample L2 wafer-error summary.

    Attributes:
        l2_error_pixels: ``(wafer - target).abs().sum()`` in pixel units —
            the literal Neural-ILT contract value.
        l2_error_nm2: Same quantity expressed as a physical area in nm²
            (``l2_error_pixels * pixel_size_nm**2``). Useful when comparing
            results computed at different pitches.
        wafer_pixels: Number of foreground pixels in the simulated wafer
            image. Reported alongside the error so a normalised ratio can
            be derived downstream without re-running the simulator.
        target_pixels: Foreground pixel count of the target layout.
    """

    l2_error_pixels: float
    l2_error_nm2: float
    wafer_pixels: int
    target_pixels: int

compute_l2_error(predicted_mask, target, pixel_size_nm=1.0, simulator=None)

Compute L2 wafer error per the Neural-ILT eval contract.

Parameters:

Name Type Description Default
predicted_mask Tensor

Predicted mask (H, W), values in [0, 1].

required
target Tensor

Target layout (H, W), values in {0, 1}. Compared against the simulated wafer, not against the predicted mask.

required
pixel_size_nm float

Physical pixel size, used only to convert the pixel-unit error to an nm² area. Does not affect simulator sampling — pass a configured simulator for that.

1.0
simulator BaseSimulator | None

Forward simulator. Defaults to a fresh HopkinsSimulator. Pass an explicit instance to keep dose / threshold / illumination consistent across a run.

None

Returns:

Type Description
L2ErrorResult

L2ErrorResult with the raw Neural-ILT scalar plus its nm² conversion and the supporting pixel counts.

Source code in src/openlithohub/benchmark/metrics/l2_error.py
def compute_l2_error(
    predicted_mask: torch.Tensor,
    target: torch.Tensor,
    pixel_size_nm: float = 1.0,
    simulator: BaseSimulator | None = None,
) -> L2ErrorResult:
    """Compute L2 wafer error per the Neural-ILT eval contract.

    Args:
        predicted_mask: Predicted mask (H, W), values in [0, 1].
        target: Target layout (H, W), values in {0, 1}. Compared against
            the *simulated wafer*, not against the predicted mask.
        pixel_size_nm: Physical pixel size, used only to convert the
            pixel-unit error to an nm² area. Does not affect simulator
            sampling — pass a configured ``simulator`` for that.
        simulator: Forward simulator. Defaults to a fresh
            :class:`HopkinsSimulator`. Pass an explicit instance to keep
            dose / threshold / illumination consistent across a run.

    Returns:
        :class:`L2ErrorResult` with the raw Neural-ILT scalar plus its
        nm² conversion and the supporting pixel counts.
    """
    if predicted_mask.shape != target.shape:
        raise ValueError(
            f"Shape mismatch: predicted {predicted_mask.shape} vs target {target.shape}"
        )

    if simulator is None:
        # Local import: simulators package builds SOCS kernels on init,
        # which we don't want to pay at metric module import time.
        from openlithohub.simulators.hopkins_sim import HopkinsSimulator

        simulator = HopkinsSimulator()

    sim_result = simulator.simulate(predicted_mask)
    if sim_result.resist is not None:
        wafer = sim_result.resist
    else:
        threshold = simulator.config.threshold * simulator.config.dose
        wafer = (sim_result.aerial >= threshold).to(sim_result.aerial.dtype)

    target_f = target.to(wafer.dtype)
    l2_pixels = float((wafer - target_f).abs().sum().item())

    return {
        "l2_error_pixels": l2_pixels,
        "l2_error_nm2": l2_pixels * pixel_size_nm * pixel_size_nm,
        "wafer_pixels": int(wafer.sum().item()),
        "target_pixels": int(target_f.sum().item()),
    }

openlithohub.benchmark.metrics.pvband

Process Variation Band (PV Band) computation.

Two forward-model paths are available:

  1. Default: fast Gaussian-PSF aerial-image approximation at four dose/focus corners. Cheap diagnostic that runs in inner loops and on every commit; this is what the baseline tables in baselines/results.md and the README report. The Gaussian model is calibrated so the absolute PV Band number tracks the SOCS result at the published Neural-ILT corners. Both are stable signals of process-window robustness, but they are not interchangeable numerically.

  2. SOCS-faithful: simulator= keyword (added 2026-05-23). When a BaseSimulator instance is passed, this metric drives the simulator at each (dose, defocus) corner via BaseSimulator.with_config, takes the binarised resist contour at each corner, and reports outer-vs-inner band thickness from the same kernels compute_l2_error uses. This closes the "Gaussian PVB ≠ SOCS PVB" reproducibility footgun for paper authors comparing OPC numbers across implementations: when you need PVB derived from the same SOCS kernels as L2/EPE, pass the same configured simulator instance.

This path is opt-in to keep existing baseline numbers stable — passing simulator= will change the absolute number reported.

compute_pvband(mask, nominal_dose=1.0, dose_variation=0.05, defocus_range_nm=20.0, pixel_size_nm=1.0, simulator=None, resist_diffusion_nm=0.0, quencher=0.0)

Compute Process Variation Band width for a given mask.

PV Band measures the perpendicular distance between the resist contours at process window extremes.

With simulator=None (default) the cheap Gaussian-PSF approximation is used — see module docstring path (1).

With simulator=<BaseSimulator instance> the simulator is driven at four (dose × defocus) corners via with_config, and the band is computed from the same kernels — see module docstring path (2). The simulator's existing defocus_nm is used as the nominal centre; ±defocus_range_nm/2 is applied at the corners.

The factor of two converts "distance to the nearest contour" (half-width at the band's centerline) into the full perpendicular contour-to-contour distance that the literature publishes.
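The corner enumeration and the outer-minus-inner band can be sketched on a 1-D cross-section. This is a standalone illustration with several assumptions that the page does not confirm: dose variation is treated as multiplicative, the outer envelope is taken as "printed at any corner", and the inner envelope as "printed at every corner". The names process_corners and band are hypothetical.

```python
from itertools import product


def process_corners(nominal_dose=1.0, dose_variation=0.05,
                    nominal_defocus_nm=0.0, defocus_range_nm=20.0):
    """Four (dose, defocus) corners around the nominal operating point."""
    doses = (nominal_dose * (1 - dose_variation), nominal_dose * (1 + dose_variation))
    half = defocus_range_nm / 2  # +/- defocus_range_nm/2 around the nominal centre
    defocus = (nominal_defocus_nm - half, nominal_defocus_nm + half)
    return list(product(doses, defocus))


def band(contours):
    """Outer envelope minus inner envelope, pixel by pixel."""
    outer = [max(px) for px in zip(*contours)]  # printed at any corner
    inner = [min(px) for px in zip(*contours)]  # printed at every corner
    return [o - i for o, i in zip(outer, inner)]


corners = process_corners()

# Hypothetical binarised resist cross-sections, one per corner.
contours = [
    [0, 0, 1, 1, 1, 1, 0, 0],
    [0, 1, 1, 1, 1, 1, 1, 0],
    [0, 0, 1, 1, 1, 1, 0, 0],
    [0, 0, 0, 1, 1, 0, 0, 0],
]
pv = band(contours)
print(corners)
print(pv)
```

The band pixels are those that print at some corners but not at others; the metric then measures the thickness of that region.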

Source code in src/openlithohub/benchmark/metrics/pvband.py
def compute_pvband(
    mask: torch.Tensor,
    nominal_dose: float = 1.0,
    dose_variation: float = 0.05,
    defocus_range_nm: float = 20.0,
    pixel_size_nm: float = 1.0,
    simulator: BaseSimulator | None = None,
    resist_diffusion_nm: float = 0.0,
    quencher: float = 0.0,
) -> dict[str, float]:
    """Compute Process Variation Band width for a given mask.

    PV Band measures the perpendicular distance between the resist
    contours at process window extremes.

    With ``simulator=None`` (default) the cheap Gaussian-PSF approximation
    is used — see module docstring path (1).

    With ``simulator=<BaseSimulator instance>`` the simulator is driven
    at four ``(dose × defocus)`` corners via ``with_config``, and the
    band is computed from the same kernels — see module docstring path
    (2). The simulator's existing ``defocus_nm`` is used as the
    nominal centre; ``±defocus_range_nm/2`` is applied at the corners.

    The factor of two converts "distance to the nearest contour"
    (half-width at the band's centerline) into the full perpendicular
    contour-to-contour distance that the literature publishes.
    """
    m = ensure_2d(mask)
    binary = (m > 0.5).float()

    if simulator is None:
        outer_envelope, inner_envelope = _gaussian_pw_envelopes(
            binary,
            nominal_dose,
            dose_variation,
            defocus_range_nm,
            pixel_size_nm,
            resist_diffusion_nm=resist_diffusion_nm,
            quencher=quencher,
        )
    else:
        outer_envelope, inner_envelope = _simulator_pw_envelopes(
            binary,
            simulator,
            nominal_dose,
            dose_variation,
            defocus_range_nm,
            resist_diffusion_nm=resist_diffusion_nm,
            quencher=quencher,
        )

    band = (outer_envelope - inner_envelope).clamp(min=0.0)
    band_pixels = band.sum().item()
    if band_pixels < 1.0:
        return {"pvband_mean_nm": 0.0, "pvband_max_nm": 0.0}

    band_binary = (band > 0.5).float()
    dist_map = distance_transform(band_binary)

    band_mask = band_binary > 0.5
    if not band_mask.any():
        return {"pvband_mean_nm": 0.0, "pvband_max_nm": 0.0}

    distances = dist_map[band_mask] * pixel_size_nm
    pvband_mean = float(distances.mean().item()) * 2.0
    pvband_max = float(distances.max().item()) * 2.0
    return {"pvband_mean_nm": pvband_mean, "pvband_max_nm": pvband_max}

openlithohub.benchmark.metrics.shot_count

Shot count estimation for mask manufacturing cost.

estimate_shot_count(mask, writer_type='mbmw', min_shot_size_nm=5.0, pixel_size_nm=1.0)

Estimate the number of shots needed to write a mask.

Shot count is a direct proxy for mask writing time and manufacturing cost.

For multi-beam mask writers (MBMW), each foreground pixel corresponds to one beam exposure position. Shot count equals the number of foreground pixels scaled by the ratio of pixel area to beam grid area.

For variable shaped beam (VSB) writers, shots are rectangular exposures. The estimate uses the mask complexity (perimeter/area ratio) to approximate the number of rectangles needed.
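The MBMW scaling described above amounts to a simple area ratio. The sketch below is not the library's _estimate_mbmw; it assumes the beam grid pitch equals min_shot_size_nm and rounds up to whole shots, both of which are illustrative choices.

```python
import math


def mbmw_shots(foreground_pixels, pixel_size_nm=1.0, beam_grid_nm=5.0):
    """Foreground pixel count scaled by pixel area / beam grid area (assumed rounding up)."""
    scaled = foreground_pixels * pixel_size_nm ** 2 / beam_grid_nm ** 2
    return math.ceil(scaled)


# 1000 foreground pixels at 1 nm per pixel on an assumed 5 nm beam grid.
fine_pixels = mbmw_shots(1000, pixel_size_nm=1.0, beam_grid_nm=5.0)
# Same pixel count when each pixel already matches the beam grid.
matched = mbmw_shots(1000, pixel_size_nm=5.0, beam_grid_nm=5.0)
print(fine_pixels, matched)
```

Under these assumptions, 25 one-nanometre pixels collapse into a single 5 nm beam position, so the shot count drops by that factor.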

Parameters:

Name Type Description Default
mask Tensor

Binary mask tensor (H, W).

required
writer_type str

'vsb' (variable shaped beam) or 'mbmw' (multi-beam).

'mbmw'
min_shot_size_nm float

Minimum addressable shot dimension.

5.0
pixel_size_nm float

Physical pixel size in nanometers.

1.0

Returns:

Type Description
dict[str, int | float]

Dictionary with 'shot_count' and 'estimated_write_time_s'.

Raises:

Type Description
ValueError

If writer_type is not 'mbmw' or 'vsb'.

Source code in src/openlithohub/benchmark/metrics/shot_count.py
def estimate_shot_count(
    mask: torch.Tensor,
    writer_type: str = "mbmw",
    min_shot_size_nm: float = 5.0,
    pixel_size_nm: float = 1.0,
) -> dict[str, int | float]:
    """Estimate the number of shots needed to write a mask.

    Shot count is a direct proxy for mask writing time and manufacturing cost.

    For multi-beam mask writers (MBMW), each foreground pixel corresponds to
    one beam exposure position. Shot count equals the number of foreground pixels
    scaled by the ratio of pixel area to beam grid area.

    For variable shaped beam (VSB) writers, shots are rectangular exposures.
    The estimate uses the mask complexity (perimeter/area ratio) to approximate
    the number of rectangles needed.

    Args:
        mask: Binary mask tensor (H, W).
        writer_type: 'vsb' (variable shaped beam) or 'mbmw' (multi-beam).
        min_shot_size_nm: Minimum addressable shot dimension.
        pixel_size_nm: Physical pixel size in nanometers.

    Returns:
        Dictionary with 'shot_count' and 'estimated_write_time_s'.

    Raises:
        ValueError: If writer_type is not 'mbmw' or 'vsb'.
    """
    if writer_type not in ("mbmw", "vsb"):
        raise ValueError(f"writer_type must be 'mbmw' or 'vsb', got '{writer_type}'")

    m = ensure_2d(mask)
    binary = (m > 0.5).float()

    foreground_pixels = int(binary.sum().item())

    if foreground_pixels == 0:
        return {"shot_count": 0, "estimated_write_time_s": 0.0}

    if writer_type == "mbmw":
        return _estimate_mbmw(binary, foreground_pixels, min_shot_size_nm, pixel_size_nm)
    return _estimate_vsb(binary, foreground_pixels, min_shot_size_nm, pixel_size_nm)

openlithohub.benchmark.metrics.stochastic

EUV stochastic robustness evaluation.

StochasticDefectRates dataclass

Per-class stochastic defect rates in failures per cm^2.

The four classes follow the imec EUV stochastic-defectivity convention (microbridge / broken line / missing contact / merged contact). Per-cm^2 rates are the industry reporting unit and let users compare against published defectivity floors regardless of mask tile size.
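To make the per-cm² unit concrete, the following sketch converts a raw failure count on a small tile into a rate. It assumes the count is normalised by both the number of trials and the tile area, which the field names num_trials and image_area_cm2 suggest but this page does not spell out; the helper names and numbers are hypothetical.

```python
NM_PER_CM = 1e7  # 1 cm = 10^7 nm


def image_area_cm2(height_px, width_px, pixel_size_nm):
    """Physical tile area in cm² from its pixel dimensions."""
    height_cm = height_px * pixel_size_nm / NM_PER_CM
    width_cm = width_px * pixel_size_nm / NM_PER_CM
    return height_cm * width_cm


def rate_per_cm2(failures, num_trials, area_cm2):
    """Failures per cm², averaged over trials (assumed normalisation)."""
    return failures / (num_trials * area_cm2)


area = image_area_cm2(1000, 1000, pixel_size_nm=1.0)  # a 1 µm x 1 µm tile
rate = rate_per_cm2(failures=3, num_trials=100, area_cm2=area)
print(area, rate)
```

A 1 µm² tile is only 1e-8 cm², so even a handful of failures over 100 trials corresponds to millions of failures per cm².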

Source code in src/openlithohub/benchmark/metrics/stochastic.py
@dataclass(frozen=True)
class StochasticDefectRates:
    """Per-class stochastic defect rates in failures per cm^2.

    The four classes follow the imec EUV stochastic-defectivity convention
    (microbridge / broken line / missing contact / merged contact). Per-cm^2
    rates are the industry reporting unit and let users compare against
    published defectivity floors regardless of mask tile size.
    """

    microbridge_per_cm2: float
    broken_line_per_cm2: float
    missing_contact_per_cm2: float
    merged_contact_per_cm2: float
    total_per_cm2: float
    num_trials: int
    image_area_cm2: float

compute_stochastic_robustness(mask, num_trials=100, dose_photons_per_nm2=30.0, pixel_size_nm=1.0, seed=0, resist_threshold=THRESHOLD_ICCAD16, resist_diffusion_nm=0.0, quencher=0.0)

Evaluate mask robustness against EUV photon shot noise.

Simulates stochastic resist exposure via Poisson photon noise to quantify probability of micro-bridging and line breaks.

seed defaults to 0 so leaderboard runs are reproducible. Pass seed=None to draw from system entropy (intentional non-determinism, e.g. ensemble runs).

resist_threshold defaults to 0.225 to match the LithoBench/Yang2023 calibration the leaderboard L2/PVB metrics use; pass 0.5 for the legacy mid-grey cut. Issue #19: previously hard-coded to 0.5, which gave a different resist contour than the metrics it is reported alongside.

Source code in src/openlithohub/benchmark/metrics/stochastic.py
def compute_stochastic_robustness(
    mask: torch.Tensor,
    num_trials: int = 100,
    dose_photons_per_nm2: float = 30.0,
    pixel_size_nm: float = 1.0,
    seed: int | None = 0,
    resist_threshold: float = THRESHOLD_ICCAD16,
    resist_diffusion_nm: float = 0.0,
    quencher: float = 0.0,
) -> dict[str, float]:
    """Evaluate mask robustness against EUV photon shot noise.

    Simulates stochastic resist exposure via Poisson photon noise to quantify
    probability of micro-bridging and line breaks.

    ``seed`` defaults to ``0`` so leaderboard runs are reproducible. Pass
    ``seed=None`` to draw from system entropy (intentional non-determinism,
    e.g. ensemble runs).

    ``resist_threshold`` defaults to 0.225 to match the LithoBench/Yang2023
    calibration the leaderboard L2/PVB metrics use; pass 0.5 for the legacy
    mid-grey cut. Issue #19: previously hard-coded to 0.5, which gave a
    different resist contour than the metrics it is reported alongside.
    """
    state = _nominal_state(
        mask,
        dose_photons_per_nm2,
        pixel_size_nm,
        resist_threshold=resist_threshold,
        resist_diffusion_nm=resist_diffusion_nm,
        quencher=quencher,
    )
    resist_nominal = state.resist_nominal
    fg_labels = state.fg_labels

    nominal_fg_label_set: set[int] = {
        int(v) for v in torch.unique(fg_labels[fg_labels >= 0]).tolist()
    }

    generator = torch.Generator(device=mask.device)
    if seed is not None:
        generator.manual_seed(seed)

    bridge_count = 0
    break_count = 0
    edge_flip_values: list[float] = []

    nominal_edge_dist = distance_transform(resist_nominal)
    nominal_edges = (nominal_edge_dist > 0) & (nominal_edge_dist <= 1.5)

    for _ in range(num_trials):
        photons = torch.poisson(state.lambda_map, generator=generator)
        noisy_intensity = photons / max(state.dose_scale, 1e-12)
        noisy_resist = apply_resist_threshold(
            noisy_intensity,
            threshold=resist_threshold,
            resist_diffusion_nm=resist_diffusion_nm,
            pixel_size_nm=pixel_size_nm,
            quencher=quencher,
        )

        # Per-component matching: a trial may simultaneously merge some
        # nominal lines and break others. The previous net-component-count
        # heuristic made these events cancel; tracking them independently
        # lets each trial contribute to both bridge and break probability.
        noisy_fg_labels, _ = connected_components(noisy_resist, connectivity=8)

        bridge_in_trial = False
        if nominal_fg_label_set:
            unique_noisy = torch.unique(noisy_fg_labels[noisy_fg_labels >= 0]).tolist()
            for noisy_lbl in unique_noisy:
                noisy_component = noisy_fg_labels == int(noisy_lbl)
                overlapping_nominal = torch.unique(fg_labels[noisy_component])
                overlapping_nominal = overlapping_nominal[overlapping_nominal >= 0]
                if int(overlapping_nominal.numel()) >= 2:
                    bridge_in_trial = True
                    break

        break_in_trial = False
        for nominal_lbl in nominal_fg_label_set:
            component_mask = fg_labels == nominal_lbl
            sub = (noisy_resist > resist_threshold) & component_mask
            if not bool(sub.any()):
                continue
            _, n_pieces = connected_components(sub.float(), connectivity=8)
            if n_pieces >= 2:
                break_in_trial = True
                break

        if bridge_in_trial:
            bridge_count += 1
        if break_in_trial:
            break_count += 1

        # Fraction of nominal edge-band pixels whose binary state flipped under
        # photon noise. This is dimensionless (0–1), NOT a line-edge roughness
        # in nm — true LER would require sub-pixel chord-displacement along
        # each contour normal. Reported separately so users don't conflate it
        # with published EUV LER numbers (~1–5 nm).
        diff = (noisy_resist - resist_nominal).abs()
        if nominal_edges.any():
            edge_flip_values.append(diff[nominal_edges].mean().item())

    bridge_probability = bridge_count / max(num_trials, 1)
    break_probability = break_count / max(num_trials, 1)
    edge_flip_rate = sum(edge_flip_values) / len(edge_flip_values) if edge_flip_values else 0.0
    robustness_score = max(0.0, 1.0 - (bridge_probability + break_probability) / 2.0)

    return {
        "bridge_probability": bridge_probability,
        "break_probability": break_probability,
        "edge_flip_rate": edge_flip_rate,
        "robustness_score": robustness_score,
    }
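The final scoring step in the function above can be reproduced in isolation. The sketch mirrors the formula in the source (the mean of the bridge and break probabilities, subtracted from 1 and floored at 0); the trial counts are invented.

```python
def robustness_score(bridge_count, break_count, num_trials):
    """1 minus the mean of bridge and break probabilities, floored at 0."""
    trials = max(num_trials, 1)  # guard against division by zero
    bridge_probability = bridge_count / trials
    break_probability = break_count / trials
    return max(0.0, 1.0 - (bridge_probability + break_probability) / 2.0)


mixed = robustness_score(bridge_count=10, break_count=30, num_trials=100)
always_bridges = robustness_score(bridge_count=100, break_count=0, num_trials=100)
always_both = robustness_score(bridge_count=100, break_count=100, num_trials=100)
print(mixed, always_bridges, always_both)
```

Because a single trial can count toward both probabilities, the score reaches 0 only when every trial both bridges and breaks; a mask that bridges in every trial but never breaks still scores 0.5.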

compute_stochastic_defect_classes(mask, num_trials=100, dose_photons_per_nm2=30.0, pixel_size_nm=1.0, seed=0, contact_aspect_max=1.5, contact_area_max=64, resist_threshold=THRESHOLD_ICCAD16, resist_diffusion_nm=0.0, quencher=0.0)

Per-class EUV stochastic defect rates in failures/cm^2.

Extends compute_stochastic_robustness (which returns aggregate bridge/break probabilities) with the four imec-style defect classes reported by the EUV stochastic-defectivity literature: microbridges, broken lines, missing contacts, and merged contacts. Output is normalised to failures per cm^2 so results are comparable across different mask tile sizes.

Parameters:

Name Type Description Default
mask Tensor

Real-valued mask tensor (H, W) or 4D, values in [0, 1].

required
num_trials int

Number of Poisson trials. More trials → tighter rate estimates; 100 is a reasonable benchmarking default.

100
dose_photons_per_nm2 float

Exposure dose in photons / nm^2 at the wafer. Scales the Poisson rate map.

30.0
pixel_size_nm float

Mask pixel size in nm; used both for the Poisson rate scaling and for converting failure counts to per-cm^2.

1.0
seed int | None

Optional RNG seed.

0
contact_aspect_max float

Maximum bounding-box long/short ratio for a component to count as contact-like. Lines are everything else.

1.5
contact_area_max int

Maximum pixel area for a component to count as contact-like. Tune for the contact size on your process node.

64

Returns:

Type Description
StochasticDefectRates

StochasticDefectRates with per-class and total failure rates.
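The contact-versus-line split driven by contact_aspect_max and contact_area_max can be sketched as a bounding-box test. This is not the library's _classify_components; the is_contact_like helper, the inclusive comparisons, and the use of pixel count as the area are assumptions made for illustration.

```python
def is_contact_like(ys, xs, aspect_max=1.5, area_max=64):
    """Classify one connected component from its pixel coordinates."""
    height = max(ys) - min(ys) + 1
    width = max(xs) - min(xs) + 1
    aspect = max(height, width) / min(height, width)  # bounding-box long/short ratio
    area = len(ys)                                    # pixel count of the component
    return aspect <= aspect_max and area <= area_max


def rectangle(height, width):
    """Pixel coordinates of a filled height x width rectangle."""
    ys = [y for y in range(height) for _ in range(width)]
    xs = [x for _ in range(height) for x in range(width)]
    return ys, xs


small_square = is_contact_like(*rectangle(6, 6))     # compact and small
long_line = is_contact_like(*rectangle(2, 40))       # aspect ratio far above 1.5
large_square = is_contact_like(*rectangle(10, 10))   # square, but area above 64
print(small_square, long_line, large_square)
```

Anything that fails either test is treated as a line, which is why contact_area_max should be tuned to the contact size of the process node.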

Source code in src/openlithohub/benchmark/metrics/stochastic.py
def compute_stochastic_defect_classes(
    mask: torch.Tensor,
    num_trials: int = 100,
    dose_photons_per_nm2: float = 30.0,
    pixel_size_nm: float = 1.0,
    seed: int | None = 0,
    contact_aspect_max: float = 1.5,
    contact_area_max: int = 64,
    resist_threshold: float = THRESHOLD_ICCAD16,
    resist_diffusion_nm: float = 0.0,
    quencher: float = 0.0,
) -> StochasticDefectRates:
    """Per-class EUV stochastic defect rates in failures/cm^2.

    Extends :func:`compute_stochastic_robustness` (which returns aggregate
    bridge/break probabilities) with the four imec-style defect classes
    reported by the EUV stochastic-defectivity literature: microbridges,
    broken lines, missing contacts, and merged contacts. Output is
    normalised to failures per cm^2 so results are comparable across
    different mask tile sizes.

    Args:
        mask: Real-valued mask tensor (H, W) or 4D, values in [0, 1].
        num_trials: Number of Poisson trials. More trials → tighter rate
            estimates; 100 is a reasonable benchmarking default.
        dose_photons_per_nm2: Exposure dose in photons / nm^2 at the wafer.
            Scales the Poisson rate map.
        pixel_size_nm: Mask pixel size in nm; used both for the Poisson
            rate scaling and for converting failure counts to per-cm^2.
        seed: Optional RNG seed.
        contact_aspect_max: Maximum bounding-box long/short ratio for a
            component to count as contact-like. Lines are everything else.
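        contact_area_max: Maximum pixel area for a component to count as
            contact-like. Tune for the contact size on your process node.
        resist_threshold: Resist threshold used for the nominal state and
            for every noisy trial; defaults to ``THRESHOLD_ICCAD16``.
        resist_diffusion_nm: Resist diffusion length in nm, forwarded to
            ``apply_resist_threshold``.
        quencher: Quencher setting, forwarded to ``apply_resist_threshold``.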

    Returns:
        StochasticDefectRates with per-class and total failure rates.
    """
    state = _nominal_state(
        mask,
        dose_photons_per_nm2,
        pixel_size_nm,
        resist_threshold=resist_threshold,
        resist_diffusion_nm=resist_diffusion_nm,
        quencher=quencher,
    )
    resist_nominal = state.resist_nominal
    nominal_fg_labels = state.fg_labels
    nominal_bg_labels = state.bg_labels

    nominal_contacts, nominal_lines = _classify_components(
        nominal_fg_labels,
        num_components=int(torch.unique(nominal_fg_labels[nominal_fg_labels >= 0]).numel()),
        contact_aspect_max=contact_aspect_max,
        contact_area_max=contact_area_max,
    )
    bg_hole_labels = torch.unique(nominal_bg_labels[nominal_bg_labels >= 0]).tolist()
    h, w = resist_nominal.shape
    nominal_bg_holes: set[int] = set()
    for lbl in bg_hole_labels:
        ys, xs = torch.where(nominal_bg_labels == lbl)
        if ys.numel() == 0:
            continue
        touches_border = bool(
            (ys == 0).any() or (ys == h - 1).any() or (xs == 0).any() or (xs == w - 1).any()
        )
        if touches_border:
            continue
        nominal_bg_holes.add(int(lbl))

    generator = torch.Generator(device=mask.device)
    if seed is not None:
        generator.manual_seed(seed)

    microbridges = 0
    broken_lines = 0
    missing_contacts = 0
    merged_contacts = 0

    for _ in range(num_trials):
        photons = torch.poisson(state.lambda_map, generator=generator)
        noisy_intensity = photons / max(state.dose_scale, 1e-12)
        noisy_resist = apply_resist_threshold(
            noisy_intensity,
            threshold=resist_threshold,
            resist_diffusion_nm=resist_diffusion_nm,
            pixel_size_nm=pixel_size_nm,
            quencher=quencher,
        )

        mb, bl, mc, mr = _trial_defect_classes(
            nominal_resist=resist_nominal,
            noisy_resist=noisy_resist,
            nominal_fg_labels=nominal_fg_labels,
            nominal_bg_labels=nominal_bg_labels,
            nominal_contacts=nominal_contacts,
            nominal_lines=nominal_lines,
            nominal_bg_holes=nominal_bg_holes,
            resist_threshold=resist_threshold,
        )
        microbridges += mb
        broken_lines += bl
        missing_contacts += mc
        merged_contacts += mr

    image_area_nm2 = float(h * w * state.pixel_area_nm2)
    image_area_cm2 = image_area_nm2 * 1e-14
    norm_per_cm2 = 1.0 / max(num_trials * image_area_cm2, 1e-30)

    rates = StochasticDefectRates(
        microbridge_per_cm2=microbridges * norm_per_cm2,
        broken_line_per_cm2=broken_lines * norm_per_cm2,
        missing_contact_per_cm2=missing_contacts * norm_per_cm2,
        merged_contact_per_cm2=merged_contacts * norm_per_cm2,
        total_per_cm2=(microbridges + broken_lines + missing_contacts + merged_contacts)
        * norm_per_cm2,
        num_trials=num_trials,
        image_area_cm2=image_area_cm2,
    )
    return rates
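The per-cm^2 normalisation at the end of the function can be checked by hand. The sketch below reproduces only that arithmetic; the helper name `defects_per_cm2` is hypothetical, and it assumes the pixel area is simply pixel_size_nm squared.

```python
import math

NM2_PER_CM2 = 1e14  # 1 cm = 1e7 nm, so 1 cm^2 = 1e14 nm^2


def defects_per_cm2(count, num_trials, height_px, width_px, pixel_size_nm):
    """Hypothetical helper: convert a raw defect count to failures per cm^2."""
    # Assumption: pixel area is pixel_size_nm squared.
    image_area_nm2 = height_px * width_px * pixel_size_nm**2
    image_area_cm2 = image_area_nm2 / NM2_PER_CM2
    return count / (num_trials * image_area_cm2)


# 3 defects over 100 trials on a 512x512 tile at 1 nm/pixel.
small = defects_per_cm2(3, 100, 512, 512, 1.0)
# A tile with 4x the area and 4x the defects gives the same rate.
large = defects_per_cm2(12, 100, 1024, 1024, 1.0)
```

One consequence of the normalisation is a resolution floor: a single defect in 100 trials on a 512x512 tile at 1 nm per pixel already corresponds to roughly 3.8e6 failures per cm^2, so lower rates need more trials or larger tiles.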

openlithohub.benchmark.metrics.monte_carlo

Monte Carlo stochastic-failure evaluation against a simulator backend.

Complements :func:compute_stochastic_robustness (which uses a fast Gaussian-PSF model and Poisson photon noise) by letting callers run the same Monte Carlo loop against any :class:openlithohub.simulators.BaseSimulator — including the bundled Hopkins/SOCS model or, with the appropriate adapter, a commercial simulator.

This is the "give me a stochastic-failure number against my preferred forward model" entry point that the v0.1 roadmap calls for.

MonteCarloFailureResult dataclass

Result of a Monte Carlo stochastic-failure run.

Source code in src/openlithohub/benchmark/metrics/monte_carlo.py
@dataclass
class MonteCarloFailureResult:
    """Result of a Monte Carlo stochastic-failure run."""

    bridge_probability: float
    break_probability: float
    failure_probability: float
    num_trials: int

    def _repr_html_(self) -> str:
        from openlithohub.jupyter._html import kv_table, panel, pass_fail_badge

        passed = self.failure_probability < 0.01
        rows = [
            ("Failure probability", f"{self.failure_probability:.4%}"),
            ("Bridge probability", f"{self.bridge_probability:.4%}"),
            ("Break probability", f"{self.break_probability:.4%}"),
            ("Trials", str(self.num_trials)),
        ]
        return panel(
            title="Monte Carlo failure",
            header_html=pass_fail_badge(passed),
            body_html=kv_table(rows),
        )

monte_carlo_failure_probability(mask, simulator, num_trials=50, dose_jitter_sigma=0.02, threshold_jitter_sigma=0.01, seed=0, perturb=None)

Estimate stochastic-failure probability against a simulator backend.

Runs num_trials independent simulations with small per-trial perturbations to dose and resist threshold (and, if provided, a user-supplied perturb operator on the mask itself). Counts how often the resulting resist contour acquires extra connected components ("breaks") or merges existing ones ("bridges") relative to the nominal run.

A trial that simultaneously bridges one component pair and breaks a different component is counted as a failure on both axes — the earlier net component count heuristic would have masked the pair as a no-op (issue #55).

Dose jitter is applied as a post-hoc aerial scaling rather than via config.dose: the bundled HopkinsSimulator's threshold scales with dose (threshold = cfg.threshold * cfg.dose), so pushing jitter into cfg.dose cancels at the threshold and the perturbation becomes a no-op (issue #54, downstream of #52).
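The cancellation described above is easy to reproduce with a toy resist model. The sketch below is not the HopkinsSimulator; the function `prints` and its `threshold_scales_with_dose` flag are hypothetical stand-ins for the two dose conventions.

```python
def prints(aerial, dose, base_threshold, threshold_scales_with_dose):
    """Toy resist model: True where the dosed aerial clears the threshold."""
    threshold = base_threshold * dose if threshold_scales_with_dose else base_threshold
    return [a * dose >= threshold for a in aerial]


aerial = [0.20, 0.28, 0.31, 0.45]
nominal = prints(aerial, 1.0, 0.30, threshold_scales_with_dose=True)

# Jitter routed through a dose that also scales the threshold: nothing changes.
coupled = prints(aerial, 1.10, 0.30, threshold_scales_with_dose=True)

# Jitter applied as a post-hoc aerial scaling against a fixed threshold.
post_hoc = prints(aerial, 1.10, 0.30, threshold_scales_with_dose=False)
```

With the coupled convention a 10% dose increase leaves the printed pattern identical to the nominal one, while the post-hoc scaling pushes the 0.28 pixel over the threshold.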

Parameters:

Name Type Description Default
mask Tensor

(H, W) real-valued mask in [0, 1].

required
simulator BaseSimulator

Simulator backend. Must produce a resist field; backends that don't will be wrapped to threshold the aerial.

required
num_trials int

Number of perturbed simulations.

50
dose_jitter_sigma float

Std-dev of multiplicative dose jitter.

0.02
threshold_jitter_sigma float

Std-dev of additive resist-threshold jitter.

0.01
seed int | None

PRNG seed; defaults to 0 so leaderboard runs are reproducible. Pass None for a fresh entropy-seeded generator.

0
perturb Callable[[Tensor, Generator], Tensor] | None

Optional (mask, generator) -> mask callable for domain-specific perturbations (e.g. mask-write spot jitter).

None

Returns:

Type Description
MonteCarloFailureResult

:class:MonteCarloFailureResult.

Source code in src/openlithohub/benchmark/metrics/monte_carlo.py
def monte_carlo_failure_probability(
    mask: torch.Tensor,
    simulator: BaseSimulator,
    num_trials: int = 50,
    dose_jitter_sigma: float = 0.02,
    threshold_jitter_sigma: float = 0.01,
    seed: int | None = 0,
    perturb: Callable[[torch.Tensor, torch.Generator], torch.Tensor] | None = None,
) -> MonteCarloFailureResult:
    """Estimate stochastic-failure probability against a simulator backend.

    Runs ``num_trials`` independent simulations with small per-trial
    perturbations to dose and resist threshold (and, if provided, a
    user-supplied ``perturb`` operator on the mask itself). Counts how
    often the resulting resist contour acquires extra connected
    components ("breaks") or merges existing ones ("bridges") relative
    to the nominal run.

    A trial that simultaneously bridges one component pair *and* breaks
    a different component is counted as a failure on both axes — the
    earlier ``net component count`` heuristic would have masked the
    pair as a no-op (issue #55).

    Dose jitter is applied as a post-hoc aerial scaling rather than via
    ``config.dose``: the bundled HopkinsSimulator's threshold scales
    with dose (``threshold = cfg.threshold * cfg.dose``), so pushing
    jitter into ``cfg.dose`` cancels at the threshold and the perturbation
    becomes a no-op (issue #54, downstream of #52).

    Args:
        mask: ``(H, W)`` real-valued mask in ``[0, 1]``.
        simulator: Simulator backend. Must produce a ``resist`` field;
            backends that don't will be wrapped to threshold the aerial.
        num_trials: Number of perturbed simulations.
        dose_jitter_sigma: Std-dev of multiplicative dose jitter.
        threshold_jitter_sigma: Std-dev of additive resist-threshold
            jitter.
        seed: PRNG seed; defaults to ``0`` so leaderboard runs are
            reproducible. Pass ``None`` for a fresh entropy-seeded generator.
        perturb: Optional ``(mask, generator) -> mask`` callable for
            domain-specific perturbations (e.g. mask-write spot jitter).

    Returns:
        :class:`MonteCarloFailureResult`.
    """

    m = ensure_2d(mask).detach()
    nominal = simulator.simulate(m)
    nominal_threshold = simulator.config.threshold
    nominal_resist = (
        nominal.resist
        if nominal.resist is not None
        else (nominal.aerial >= nominal_threshold).to(nominal.aerial.dtype)
    )

    generator = torch.Generator(device=m.device)
    if seed is not None:
        generator.manual_seed(seed)

    bridge_count = 0
    break_count = 0

    for _trial in range(num_trials):
        # Per-trial multiplicative dose jitter and additive threshold
        # jitter. We apply both *outside* the simulator and scale the
        # aerial intensity directly. After issue #52 was fixed (the
        # threshold no longer scales with dose), passing dose through
        # `config.dose` would also work, but doing it here keeps the
        # MC path independent of any future simulator-side dose
        # convention drift and lets us apply both jitters in one
        # pass without rebuilding the simulator config.
        dose_factor = 1.0 + dose_jitter_sigma * torch.randn(1, generator=generator).item()
        dose_factor = max(dose_factor, 1e-6)
        threshold_offset = threshold_jitter_sigma * torch.randn(1, generator=generator).item()
        trial_threshold = max(nominal_threshold + threshold_offset, 1e-6)

        trial_mask = perturb(m, generator) if perturb is not None else m
        result = simulator.simulate(trial_mask)
        # Apply jitter to the aerial intensity directly: a higher dose
        # multiplies aerial photons, a lower threshold lowers the resist
        # cutoff. Both feed into the same binarisation.
        scaled_aerial = result.aerial * dose_factor
        resist = (scaled_aerial >= trial_threshold).to(result.aerial.dtype)

        has_bridge, has_break = _bridge_and_break_versus(nominal_resist, resist)
        if has_bridge:
            bridge_count += 1
        if has_break:
            break_count += 1

    bridge_p = bridge_count / max(num_trials, 1)
    break_p = break_count / max(num_trials, 1)
    # A single trial can be both a bridge and a break, so failure_p is
    # *not* the sum (which would over-count those trials). Use the
    # union: failure = trials with bridge OR break = bridge + break -
    # both. We don't track ``both`` explicitly, so use the inclusion-
    # exclusion upper bound min(bridge + break, 1.0) as a conservative
    # estimate that never exceeds 1.
    failure_p = min(bridge_p + break_p, 1.0)
    return MonteCarloFailureResult(
        bridge_probability=bridge_p,
        break_probability=break_p,
        failure_probability=failure_p,
        num_trials=num_trials,
    )
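The difference between the reported upper bound and the exact union can be seen on a handful of trials. This is a minimal sketch with a hypothetical `summarize` helper that takes per-trial flags directly; the library function only keeps the two counters.

```python
def summarize(trials):
    """trials: list of (has_bridge, has_break) booleans, one per trial."""
    n = max(len(trials), 1)
    bridge_p = sum(1 for bridge, _ in trials if bridge) / n
    break_p = sum(1 for _, brk in trials if brk) / n
    # Upper bound used when only the two counters are available.
    upper_bound = min(bridge_p + break_p, 1.0)
    # Exact union; needs the per-trial flags.
    exact_union = sum(1 for bridge, brk in trials if bridge or brk) / n
    return bridge_p, break_p, upper_bound, exact_union


# One trial both bridges and breaks, one only bridges, two are clean.
trials = [(True, True), (True, False), (False, False), (False, False)]
bridge_p, break_p, upper_bound, exact_union = summarize(trials)
```

Here the bound reports 0.75 while only half of the trials actually failed; the two agree whenever no trial has both a bridge and a break.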

openlithohub.benchmark.metrics.euv_3d

EUV 3D-mask shadow-effect proxy metric.

Real EUV mask 3D simulation (rigorous Maxwell) is expensive and lives in commercial tools like HyperLith / EM-Suite. This module ships a cheap proxy that captures the dominant first-order effect: shadowing-induced bias that depends on feature orientation relative to the chief-ray direction.

What we model

For an EUV reflective mask at a non-zero chief-ray angle of incidence (typically 6° in NXE:3400-class scanners), the absorber casts a geometric shadow whose magnitude depends on:

  • absorber thickness (≈70 nm Ta-based, ≈30 nm low-n attenuated PSM);
  • angle of incidence;
  • feature orientation (horizontal vs vertical lines respond differently — the well-known H–V CD bias).

We compute a per-pixel shadow displacement field and convolve the binary mask with an anisotropic shadow kernel, then compare the resulting "3D-corrected" aerial against a thin-mask aerial. The L2 residual between the two is a reasonable proxy for "how much rigorous 3D simulation would disagree with the Hopkins thin-mask result on this layout".

This is a proxy, not a substitute, for rigorous 3D-mask EMF simulation. Its purpose is to flag layouts that are at risk of large 3D errors at evaluation time without paying the cost of a Maxwell solver. For papers that require ground-truth 3D, hook a real simulator via :class:openlithohub.simulators.BaseSimulator.
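For a sense of scale, the geometric shadow length for the quoted absorber thicknesses can be estimated with basic trigonometry. This is a sketch under an assumption: it takes the shadow length to be thickness × tan(angle of incidence), which is the plain geometric picture and not necessarily the exact form used by the module's private `_shadow_kernel`. Both helper names are hypothetical.

```python
import math


def shadow_length_nm(absorber_thickness_nm, chief_ray_angle_deg):
    """Geometric shadow length on the mask (assumed h * tan(theta))."""
    return absorber_thickness_nm * math.tan(math.radians(chief_ray_angle_deg))


def shadow_components_nm(length_nm, azimuth_deg):
    """Split the shadow into x/y parts, with azimuth 0 degrees along +x."""
    az = math.radians(azimuth_deg)
    return length_nm * math.cos(az), length_nm * math.sin(az)


ta_based = shadow_length_nm(70.0, 6.0)   # about 7.4 nm on the mask
low_n = shadow_length_nm(30.0, 6.0)      # about 3.2 nm on the mask
dx, dy = shadow_components_nm(ta_based, 0.0)
```

With the azimuth at 0° the whole displacement lies along x, so edges perpendicular to x are shifted while edges parallel to x are not, which is the origin of the H–V bias.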

Mask3DParams dataclass

Parameters for the EUV 3D-mask shadow proxy.

Attributes:

Name Type Description
absorber_thickness_nm float

Absorber stack height. 70 nm = Ta-based, 30 nm = low-n attenuated PSM.

chief_ray_angle_deg float

Chief-ray angle of incidence at the mask. 6° for NXE:3400-class scanners.

chief_ray_azimuth_deg float

Azimuth of the chief ray (0° = +x). Sets the shadow direction.

pixel_size_nm float

Mask-side pixel pitch.

Source code in src/openlithohub/benchmark/metrics/euv_3d.py
@dataclass(frozen=True)
class Mask3DParams:
    """Parameters for the EUV 3D-mask shadow proxy.

    Attributes:
        absorber_thickness_nm: Absorber stack height. 70 nm = Ta-based,
            30 nm = low-n attenuated PSM.
        chief_ray_angle_deg: Chief-ray angle of incidence at the mask.
            6° for NXE:3400-class scanners.
        chief_ray_azimuth_deg: Azimuth of the chief ray (0° = +x). Sets
            the shadow direction.
        pixel_size_nm: Mask-side pixel pitch.
    """

    absorber_thickness_nm: float = ABSORBER_THICKNESS_NM_DEFAULT
    chief_ray_angle_deg: float = CHIEF_RAY_ANGLE_DEG_DEFAULT
    chief_ray_azimuth_deg: float = CHIEF_RAY_AZIMUTH_DEG_DEFAULT
    pixel_size_nm: float = PIXEL_SIZE_NM_DEFAULT

apply_3d_shadow(mask, params=None)

Apply the 3D-shadow proxy operator to a binary mask.

Parameters:

Name Type Description Default
mask Tensor

(H, W) real-valued mask in [0, 1].

required
params Mask3DParams | None

Shadow parameters; defaults to NXE:3400-like.

None

Returns:

Type Description
Tensor

Same-shape mask with the shadow operator applied. The result is no longer strictly binary; it represents the effective attenuation seen by the optical model.

Source code in src/openlithohub/benchmark/metrics/euv_3d.py
def apply_3d_shadow(
    mask: torch.Tensor,
    params: Mask3DParams | None = None,
) -> torch.Tensor:
    """Apply the 3D-shadow proxy operator to a binary mask.

    Args:
        mask: ``(H, W)`` real-valued mask in ``[0, 1]``.
        params: Shadow parameters; defaults to NXE:3400-like.

    Returns:
        Same-shape mask with the shadow operator applied. The result is
        no longer strictly binary — it represents the effective
        attenuation seen by the optical model.
    """

    p = params or Mask3DParams()
    m = ensure_2d(mask)
    kernel = _shadow_kernel(p, m.device)
    inp = m.unsqueeze(0).unsqueeze(0)
    radius = kernel.shape[-1] // 2
    padded = functional.pad(inp, [radius] * 4, mode="circular")
    shadowed = functional.conv2d(padded, kernel)
    return shadowed.squeeze(0).squeeze(0).clamp(0.0, 1.0)

compute_3d_mask_residual(mask, params=None, sim_config=None)

Quantify expected disagreement between thin-mask and 3D-mask aerials.

Runs the bundled Hopkins/SOCS simulator twice — once on the input mask (thin-mask assumption) and once on the shadow-corrected mask — and reports the L2 and L_inf residuals plus the H–V CD-bias proxy.

Parameters:

Name Type Description Default
mask Tensor

(H, W) real mask.

required
params Mask3DParams | None

Shadow parameters.

None
sim_config SimulatorConfig | None

Optional simulator config; defaults to EUV-ish (13.5 nm, NA 0.33).

None

Returns:

Type Description
dict[str, float]

Dict with keys residual_l2, residual_linf, and hv_bias_nm (positive when horizontal lines print wider than vertical lines after 3D-shadow correction). The H–V bias is derived from thresholded shadowed-mask area, normalised by the original mask's contour length, so it carries genuine units of length; see :func:_hv_cd_bias_nm.

Source code in src/openlithohub/benchmark/metrics/euv_3d.py
def compute_3d_mask_residual(
    mask: torch.Tensor,
    params: Mask3DParams | None = None,
    sim_config: SimulatorConfig | None = None,
) -> dict[str, float]:
    """Quantify expected disagreement between thin-mask and 3D-mask aerials.

    Runs the bundled Hopkins/SOCS simulator twice — once on the input
    mask (thin-mask assumption) and once on the shadow-corrected mask —
    and reports the L2 and L_inf residuals plus the H–V CD-bias proxy.

    Args:
        mask: ``(H, W)`` real mask.
        params: Shadow parameters.
        sim_config: Optional simulator config; defaults to EUV-ish
            (13.5 nm, NA 0.33).

    Returns:
        Dict with keys ``residual_l2``, ``residual_linf``, and
        ``hv_bias_nm`` (positive when horizontal lines print wider than
        vertical lines after 3D-shadow correction). The H–V bias is
        derived from thresholded shadowed-mask area, normalised by the
        original mask's contour length, so it carries genuine units of
        length — see :func:`_hv_cd_bias_nm`.
    """

    p = params or Mask3DParams()
    cfg = sim_config or SimulatorConfig(
        wavelength_nm=WAVELENGTH_EUV_NM,
        na=NA_EUV_STANDARD,
        sigma=SIGMA_OUTER_DEFAULT,
        pixel_size_nm=p.pixel_size_nm,
    )
    sim = HopkinsSimulator(cfg)

    m = ensure_2d(mask).to(torch.float32)
    aerial_thin = sim.simulate(m).aerial
    aerial_3d = sim.simulate(apply_3d_shadow(m, p)).aerial

    diff = (aerial_3d - aerial_thin).detach()
    residual_l2 = float(diff.pow(2).mean().sqrt().item())
    residual_linf = float(diff.abs().max().item())

    horizontal_p = Mask3DParams(
        absorber_thickness_nm=p.absorber_thickness_nm,
        chief_ray_angle_deg=p.chief_ray_angle_deg,
        chief_ray_azimuth_deg=0.0,
        pixel_size_nm=p.pixel_size_nm,
    )
    vertical_p = Mask3DParams(
        absorber_thickness_nm=p.absorber_thickness_nm,
        chief_ray_angle_deg=p.chief_ray_angle_deg,
        chief_ray_azimuth_deg=90.0,
        pixel_size_nm=p.pixel_size_nm,
    )
    shadowed_h = apply_3d_shadow(m, horizontal_p)
    shadowed_v = apply_3d_shadow(m, vertical_p)
    hv_bias_nm = _hv_cd_bias_nm(shadowed_h, shadowed_v, m, cfg.pixel_size_nm)

    return {
        "residual_l2": residual_l2,
        "residual_linf": residual_linf,
        "hv_bias_nm": hv_bias_nm,
    }
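Note that residual_l2 is computed as the square root of the mean squared difference, that is, a root-mean-square value rather than an unnormalised L2 norm. The sketch below repeats the two reductions on nested lists; the `residuals` helper and the sample values are made up for illustration.

```python
import math


def residuals(aerial_3d, aerial_thin):
    """Return (rms, linf) of the element-wise difference of two 2D grids."""
    diffs = [
        a - b
        for row_3d, row_thin in zip(aerial_3d, aerial_thin)
        for a, b in zip(row_3d, row_thin)
    ]
    rms = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    linf = max(abs(d) for d in diffs)
    return rms, linf


thin = [[0.0, 0.5], [1.0, 0.5]]
shadowed = [[0.0, 0.4], [0.8, 0.5]]
rms, linf = residuals(shadowed, thin)
```

Because of the mean, the value does not grow with tile size, which keeps residual_l2 comparable between layouts of different dimensions.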

openlithohub.benchmark.metrics.hotspot

Hotspot detection metric — recall / precision / F1 with distance-tolerant matching against a ground-truth point list.

This is the canonical evaluation used by ICCAD'16 Problem C and the hotspot-detection literature (e.g. Yang et al., TCAD 2020): a predicted point counts as a true positive if a ground-truth point lies within a configurable radius (match_radius_nm). Each GT point may be matched at most once, so duplicate predictions inside the same tolerance disk become false positives. GT points with no prediction inside the disk are false negatives.

The matching is point-based, not pixel-based. If your predictor outputs a binary heatmap, run connected-components and feed the centroids (in nm) as predicted_points. openlithohub._utils.morphology has the primitives — there is no need to reinvent them.

Coordinates are in nanometers throughout to match the rest of the benchmark stack (LithoSample.metadata exposes nm units consistently).

compute_hotspot_detection(predicted_points, ground_truth_points, match_radius_nm=1.0)

Score a hotspot predictor against a ground-truth point list.

A predicted point is a true positive iff it can be paired with a GT point within match_radius_nm, under a maximum-cardinality minimum-cost assignment (Hungarian algorithm) — independent of the order predicted_points arrives in.

Parameters:

Name Type Description Default
predicted_points Tensor

(N, 2) tensor of predicted hotspot centers in nm. Pass an empty (0, 2) tensor for an empty prediction.

required
ground_truth_points Tensor

(M, 2) tensor of ground-truth hotspot centers in nm. Pass an empty (0, 2) tensor when no hotspots exist for the case.

required
match_radius_nm float

Maximum nm distance at which a predicted point is considered to have located a GT hotspot. ICCAD'16 literature commonly uses 1 nm (exact-pixel match) or a few nm to allow for centroid jitter.

1.0

Returns:

Type Description
dict[str, float]

Dict with num_tp, num_fp, num_fn, recall, precision, f1. Counts are returned as floats so the result merges cleanly with other dict[str, float] metrics.

Edge cases:

  • No GT and no predictions → recall/precision/F1 = 1.0 (vacuous perfect score). This matches scikit-learn's convention when zero_division=1 is set; scikit-learn's default reports 0 with a warning instead.
  • GT present but no predictions → recall=0, precision=1.0 (vacuously: nothing predicted, so nothing is wrong), F1=0.
  • Predictions present but no GT → recall=1.0, precision=0, F1=0.

Source code in src/openlithohub/benchmark/metrics/hotspot.py
def compute_hotspot_detection(
    predicted_points: torch.Tensor,
    ground_truth_points: torch.Tensor,
    match_radius_nm: float = 1.0,
) -> dict[str, float]:
    """Score a hotspot predictor against a ground-truth point list.

    A predicted point is a true positive iff it can be paired with a GT
    point within ``match_radius_nm``, under a *maximum-cardinality
    minimum-cost* assignment (Hungarian algorithm) — independent of the
    order ``predicted_points`` arrives in.

    Args:
        predicted_points: ``(N, 2)`` tensor of predicted hotspot
            centers in nm. Pass an empty ``(0, 2)`` tensor for an empty
            prediction.
        ground_truth_points: ``(M, 2)`` tensor of ground-truth hotspot
            centers in nm. Pass an empty ``(0, 2)`` tensor when no
            hotspots exist for the case.
        match_radius_nm: Maximum nm distance at which a predicted point
            is considered to have located a GT hotspot. ICCAD'16
            literature commonly uses 1 nm (exact-pixel match) or a few
            nm to allow for centroid jitter.

    Returns:
        Dict with ``num_tp``, ``num_fp``, ``num_fn``, ``recall``,
        ``precision``, ``f1``. Counts are returned as floats so the
        result merges cleanly with other ``dict[str, float]`` metrics.

        Edge cases:

        - No GT and no predictions → recall/precision/F1 = 1.0 (vacuous
            perfect score). This matches scikit-learn's convention when
            ``zero_division=1`` is set; its default reports 0 with a warning.
        - GT present but no predictions → recall=0, precision=1.0
            (vacuously: nothing predicted, so nothing is wrong), F1=0.
        - Predictions present but no GT → recall=1.0, precision=0, F1=0.
    """
    if predicted_points.ndim != 2 or predicted_points.shape[-1] != 2:
        raise ValueError(
            f"predicted_points must have shape (N, 2), got {tuple(predicted_points.shape)}"
        )
    if ground_truth_points.ndim != 2 or ground_truth_points.shape[-1] != 2:
        raise ValueError(
            f"ground_truth_points must have shape (M, 2), got {tuple(ground_truth_points.shape)}"
        )
    if match_radius_nm < 0:
        raise ValueError(f"match_radius_nm must be >= 0, got {match_radius_nm}")

    n_pred = predicted_points.shape[0]
    n_gt = ground_truth_points.shape[0]

    if n_pred == 0 and n_gt == 0:
        return {
            "num_tp": 0.0,
            "num_fp": 0.0,
            "num_fn": 0.0,
            "recall": 1.0,
            "precision": 1.0,
            "f1": 1.0,
        }
    if n_pred == 0:
        return {
            "num_tp": 0.0,
            "num_fp": 0.0,
            "num_fn": float(n_gt),
            "recall": 0.0,
            "precision": 1.0,
            "f1": 0.0,
        }
    if n_gt == 0:
        return {
            "num_tp": 0.0,
            "num_fp": float(n_pred),
            "num_fn": 0.0,
            "recall": 1.0,
            "precision": 0.0,
            "f1": 0.0,
        }

    pred = predicted_points.float()
    gt = ground_truth_points.float()
    dists = torch.cdist(pred, gt)  # (N, M)

    radius = float(match_radius_nm)
    # Lazy scipy import: scipy lives in the [workflow] extra, but
    # importing this module unconditionally pulls scipy into every test
    # shard via benchmark/metrics/__init__.py. Defer to call time so
    # only callers that actually score hotspots pay the dependency.
    from scipy.optimize import linear_sum_assignment

    # linear_sum_assignment accepts rectangular cost matrices. Out-of-disk
    # pairs get a large finite cost so a full assignment always exists; we
    # post-filter those infeasible pairings. The sentinel must exceed
    # radius * min(n_pred, n_gt): with a smaller value the solver can give
    # up an in-disk pair to lower the total distance and return fewer
    # matches than the maximum-cardinality assignment the docstring promises.
    cost = dists.detach().cpu().numpy().astype(np.float64)
    big = radius * min(n_pred, n_gt) + 1.0
    cost_for_solver = np.where(cost <= radius, cost, big)
    pred_idx, gt_idx = linear_sum_assignment(cost_for_solver)
    # A pair counts only if the *original* distance was inside the disk.
    valid = cost[pred_idx, gt_idx] <= radius
    num_tp = int(valid.sum())

    num_fp = n_pred - num_tp
    num_fn = n_gt - num_tp
    recall = num_tp / n_gt
    precision = num_tp / n_pred
    f1 = 2 * recall * precision / (recall + precision) if (recall + precision) > 0 else 0.0

    return {
        "num_tp": float(num_tp),
        "num_fp": float(num_fp),
        "num_fn": float(num_fn),
        "recall": float(recall),
        "precision": float(precision),
        "f1": float(f1),
    }
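The one-to-one matching rule can be illustrated without scipy. The sketch below swaps the Hungarian solver for a plain augmenting-path bipartite matching (Kuhn's algorithm), which finds a maximum-cardinality matching but does not minimise total distance; that is enough to reproduce the TP/FP/FN counts. The function name `score_hotspots` is hypothetical, and counts are left as ints for readability.

```python
import math


def score_hotspots(pred, gt, radius_nm):
    """Count TP/FP/FN with one-to-one matching inside radius_nm."""
    # For each prediction, the GT indices that lie inside its tolerance disk.
    feasible = [
        [j for j, g in enumerate(gt) if math.dist(p, g) <= radius_nm]
        for p in pred
    ]
    owner = [-1] * len(gt)  # owner[j] = prediction currently matched to GT j

    def try_match(i, seen):
        # Try to give prediction i a GT point, re-routing earlier matches if needed.
        for j in feasible[i]:
            if j in seen:
                continue
            seen.add(j)
            if owner[j] == -1 or try_match(owner[j], seen):
                owner[j] = i
                return True
        return False

    tp = sum(try_match(i, set()) for i in range(len(pred)))
    fp, fn = len(pred) - tp, len(gt) - tp
    recall = tp / len(gt) if gt else 1.0
    precision = tp / len(pred) if pred else 1.0
    total = recall + precision
    f1 = 2 * recall * precision / total if total > 0 else 0.0
    return {"num_tp": tp, "num_fp": fp, "num_fn": fn,
            "recall": recall, "precision": precision, "f1": f1}


# Two predictions land on the same GT point; the second is a false positive.
dup = score_hotspots([(0, 0), (0.5, 0), (100, 100)], [(0, 0), (50, 50)], 1.0)

# Nearest-first greedy pairing would match (0, 0) to (0, 0) and strand (-9, 0);
# the maximum-cardinality matching pairs both predictions.
both = score_hotspots([(0, 0), (-9, 0)], [(9, 0), (0, 0)], 10.0)
```

The second case shows why the assignment has to be solved globally: taking the closest pair first would report one true positive where two are achievable.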

openlithohub.benchmark.metrics.sraf

SRAF non-printing penalty.

Sub-Resolution Assist Features should bias the diffraction pattern around main features without ever clearing the resist threshold themselves. A printed SRAF shows up on the wafer as a stray defect, which is a yield killer.

This module provides a differentiable penalty that callers add to their ILT or OPC training loss. It is complementary to (not a substitute for) the curvilinear MRC loss requested in issue #8 — that one polices mask geometry, this one polices the aerial-image response inside SRAF regions.

sraf_print_penalty(aerial_image, sraf_mask, *, print_threshold=0.3, margin=0.05)

Differentiable penalty for SRAFs whose aerial intensity risks printing.

For every pixel inside sraf_mask, penalise the amount by which the aerial intensity exceeds print_threshold - margin. Squared-ReLU keeps the gradient growing as the violation deepens, which empirically converges faster than plain L1 inside ILT inner loops.

Parameters:

Name Type Description Default
aerial_image Tensor

Simulated aerial image. Either (H, W) or (B, 1, H, W) — the rank is mirrored from simulate_aerial_image's contract.

required
sraf_mask Tensor

Binary tensor of the same shape as aerial_image. 1 indicates a pixel that belongs to an SRAF region (caller-provided — SRAF placement / detection is handled upstream, see issue #6).

required
print_threshold float

Resist-clearing threshold. Defaults to 0.30, comfortably below the nominal 0.50 so SRAFs are punished before they become bright enough to actually develop.

0.3
margin float

Safety headroom subtracted from print_threshold before the comparison — encourages the optimiser to hold SRAFs below the danger zone with a buffer.

0.05

Returns:

Type Description
Tensor

Scalar torch.Tensor (autograd-connected). 0 when no SRAF pixel exceeds the budget; positive otherwise.

Source code in src/openlithohub/benchmark/metrics/sraf.py
def sraf_print_penalty(
    aerial_image: torch.Tensor,
    sraf_mask: torch.Tensor,
    *,
    print_threshold: float = 0.30,
    margin: float = 0.05,
) -> torch.Tensor:
    """Differentiable penalty for SRAFs whose aerial intensity risks printing.

    For every pixel inside ``sraf_mask``, penalise the amount by which the
    aerial intensity exceeds ``print_threshold - margin``. Squared-ReLU keeps
    the gradient growing as the violation deepens, which empirically converges
    faster than plain L1 inside ILT inner loops.

    Args:
        aerial_image: Simulated aerial image. Either ``(H, W)`` or
            ``(B, 1, H, W)`` — the rank is mirrored from
            ``simulate_aerial_image``'s contract.
        sraf_mask: Binary tensor of the same shape as ``aerial_image``. ``1``
            indicates a pixel that belongs to an SRAF region (caller-provided —
            SRAF placement / detection is handled upstream, see issue #6).
        print_threshold: Resist-clearing threshold. Defaults to ``0.30``,
            comfortably below the nominal ``0.50`` so SRAFs are punished
            *before* they become bright enough to actually develop.
        margin: Safety headroom subtracted from ``print_threshold`` before the
            comparison — encourages the optimiser to hold SRAFs below the
            danger zone with a buffer.

    Returns:
        Scalar ``torch.Tensor`` (autograd-connected). ``0`` when no SRAF pixel
        exceeds the budget; positive otherwise.
    """
    if aerial_image.shape != sraf_mask.shape:
        raise ValueError(
            f"aerial_image and sraf_mask must share shape; got "
            f"{tuple(aerial_image.shape)} vs {tuple(sraf_mask.shape)}"
        )

    sraf_float = sraf_mask.to(aerial_image.dtype)
    sraf_pixel_count = sraf_float.sum()
    if float(sraf_pixel_count.detach()) < 1.0:
        return aerial_image.new_zeros(())

    budget = print_threshold - margin
    excess = functional.relu(aerial_image - budget)
    weighted = (excess.pow(2) * sraf_float).sum()
    return weighted / sraf_pixel_count.clamp(min=1.0)
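The penalty is small enough to evaluate by hand. The sketch below repeats the same arithmetic on flat Python lists instead of tensors; the name `sraf_penalty` and the sample intensities are made up for illustration, and no autograd is involved.

```python
def sraf_penalty(aerial, sraf, print_threshold=0.30, margin=0.05):
    """Mean squared excess over (print_threshold - margin) across SRAF pixels."""
    budget = print_threshold - margin
    n_sraf = sum(sraf)
    if n_sraf < 1:
        return 0.0  # no SRAF pixels, nothing to penalise
    total = sum(max(a - budget, 0.0) ** 2 for a, s in zip(aerial, sraf) if s)
    return total / n_sraf


aerial = [0.10, 0.30, 0.50, 0.40]
sraf = [1, 1, 0, 1]  # the bright 0.50 pixel is a main feature and is ignored
penalty = sraf_penalty(aerial, sraf)
```

With a budget of 0.25 the three SRAF pixels exceed it by 0, 0.05 and 0.15, giving (0.0025 + 0.0225) / 3 ≈ 0.0083. The per-pixel gradient is proportional to the excess, so the 0.40 pixel is pushed three times harder than the 0.30 pixel.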

openlithohub.benchmark.metrics.mrc_loss

Differentiable Mask Rule Check (MRC) loss for curvilinear masks.

Companion to benchmark.compliance.mrc.check_curvilinear_mrc (post-hoc binary verdict) and sraf.sraf_print_penalty (aerial-image-side penalty). This module gives optimisers a smooth, differentiable signal so curvilinear ILT / level-set / Neural-ILT models can learn to respect MRC during training instead of being scored on it afterwards.

Drop into a training loop:

loss = epe_loss + alpha * curvilinear_mrc_loss(mask, pdk="asap7")

See issue #8 for motivation.

curvilinear_mrc_loss(mask, pdk=None, *, min_width_nm=None, min_spacing_nm=None, min_curvature_radius_nm=20.0, pixel_size_nm=None, weight_min_cd=1.0, weight_min_spacing=1.0, weight_min_curvature=1.0)

Differentiable MRC penalty for curvilinear masks.

Three additive terms, each non-negative and zero on a clean mask:

  • Min-CD — soft morphological opening with structuring radius r = floor(min_width_nm / (2 * pixel_size_nm)). Pixels the mask claims that the opening drops contribute relu(mask - opening), summed and normalised by area. This mirrors the binary check in compliance.mrc.check_mrc so the loss and the verdict agree on what a violation is.
  • Min-spacing — same opening applied to 1 - mask; gaps too narrow to host the structuring element get penalised.
  • Min-curvature — boundary-band integral of the squared image gradient. ‖∇mask‖² peaks at sharp transitions, so any region where the local gradient magnitude exceeds the curvature budget 1 / min_curvature_radius_nm (in per-nm units) is penalised by relu(‖∇mask‖² - (1 / min_curvature_radius_nm)²). The "boundary band" is the symmetric difference dilation(mask, 1) - erosion(mask, 1), restricting the cost to pixels actually on a contour and keeping the loss well-defined for large flat interior regions.
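
The Min-CD term described above can be illustrated in one dimension. This is a rough sketch under stated assumptions: it uses a hard min/max opening on a Python list, whereas the library applies a soft opening to 2-D tensors through a private helper that is not shown on this page. The names `erode`, `dilate` and `opening_residual` are illustrative only.

```python
def erode(signal, radius):
    """Sliding minimum over a window of 2 * radius + 1 samples (truncated at the ends)."""
    n = len(signal)
    return [min(signal[max(0, i - radius):min(n, i + radius + 1)]) for i in range(n)]


def dilate(signal, radius):
    """Sliding maximum over the same window."""
    n = len(signal)
    return [max(signal[max(0, i - radius):min(n, i + radius + 1)]) for i in range(n)]


def opening_residual(signal, radius):
    """relu(mask - opening): what the mask claims but the opening drops."""
    opened = dilate(erode(signal, radius), radius)
    return [max(0.0, s - o) for s, o in zip(signal, opened)]


# A 2-sample feature (too narrow) followed by a 6-sample feature (wide enough).
mask = [0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0,
        1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0]
residual = opening_residual(mask, radius=2)
cd_term = sum(residual) / len(mask)  # normalised by the number of samples
print(residual, cd_term)
```

The wide feature survives the opening and leaves no residual, while the narrow one is removed entirely and contributes its full area. Applying the same function to `1 - mask` gives the corresponding min-spacing residual.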

Parameters:

Name Type Description Default
mask Tensor

Continuous mask in [0, 1]. Either (H, W) or (B, 1, H, W).

required
pdk PdkRules | str | None

PDK rules to source defaults from. May be a PdkRules instance or a preset name (e.g. "asap7", "freepdk45"). If None, min_width_nm, min_spacing_nm, and pixel_size_nm must all be supplied explicitly.

None
min_width_nm float | None

Override for pdk.min_width_nm.

None
min_spacing_nm float | None

Override for pdk.min_spacing_nm.

None
min_curvature_radius_nm float

Minimum allowed local radius of curvature. Defaults to 20 nm — looser than typical e-beam writer specs so the term doesn't dominate early in training.

20.0
pixel_size_nm float | None

Override for pdk.pixel_size_nm.

None
weight_min_cd float

Weight for the min-CD term.

1.0
weight_min_spacing float

Weight for the min-spacing term.

1.0
weight_min_curvature float

Weight for the min-curvature term.

1.0

Returns:

Type Description
Tensor

Scalar torch.Tensor (autograd-connected). Zero on a fully rule-respecting mask, positive otherwise.

Source code in src/openlithohub/benchmark/metrics/mrc_loss.py
def curvilinear_mrc_loss(
    mask: torch.Tensor,
    pdk: PdkRules | str | None = None,
    *,
    min_width_nm: float | None = None,
    min_spacing_nm: float | None = None,
    min_curvature_radius_nm: float = 20.0,
    pixel_size_nm: float | None = None,
    weight_min_cd: float = 1.0,
    weight_min_spacing: float = 1.0,
    weight_min_curvature: float = 1.0,
) -> torch.Tensor:
    """Differentiable MRC penalty for curvilinear masks.

    Three additive terms, each non-negative and zero on a clean mask:

    * **Min-CD** — soft morphological opening with structuring radius
      ``r = floor(min_width_nm / (2 * pixel_size_nm))``. Pixels the mask
      claims that the opening drops contribute ``relu(mask - opening)``,
      summed and normalised by area. This mirrors the binary check in
      ``compliance.mrc.check_mrc`` so the loss and the verdict agree on
      what a violation is.
    * **Min-spacing** — same opening applied to ``1 - mask``; gaps too
      narrow to host the structuring element get penalised.
    * **Min-curvature** — boundary-band integral of the squared image
      gradient. ``‖∇mask‖²`` peaks at sharp transitions, so any region
      where the local gradient magnitude exceeds the curvature budget
      ``1 / min_curvature_radius_nm`` (in per-nm units) is penalised by
      ``relu(‖∇mask‖² - (1 / min_curvature_radius_nm)²)``. The "boundary
      band" is the symmetric difference
      ``dilation(mask, 1) - erosion(mask, 1)``, restricting the cost to
      pixels actually on a contour and keeping the loss well-defined for
      large flat interior regions.

    Args:
        mask: Continuous mask in ``[0, 1]``. Either ``(H, W)`` or
            ``(B, 1, H, W)``.
        pdk: PDK rules to source defaults from. May be a ``PdkRules`` instance
            or a preset name (e.g. ``"asap7"``, ``"freepdk45"``). If ``None``,
            ``min_width_nm``, ``min_spacing_nm``, and ``pixel_size_nm`` must
            all be supplied explicitly.
        min_width_nm: Override for ``pdk.min_width_nm``.
        min_spacing_nm: Override for ``pdk.min_spacing_nm``.
        min_curvature_radius_nm: Minimum allowed local radius of curvature.
            Defaults to ``20 nm`` — looser than typical e-beam writer specs
            so the term doesn't dominate early in training.
        pixel_size_nm: Override for ``pdk.pixel_size_nm``.
        weight_min_cd: Weight for the min-CD term.
        weight_min_spacing: Weight for the min-spacing term.
        weight_min_curvature: Weight for the min-curvature term.

    Returns:
        Scalar ``torch.Tensor`` (autograd-connected). Zero on a fully
        rule-respecting mask, positive otherwise.
    """
    if mask.dtype not in (torch.float16, torch.float32, torch.float64):
        raise TypeError(f"curvilinear_mrc_loss requires a floating-point mask, got {mask.dtype}.")

    width_nm, spacing_nm, curvature_nm, pixel_nm = _resolve_rules(
        pdk, min_width_nm, min_spacing_nm, min_curvature_radius_nm, pixel_size_nm
    )
    if pixel_nm <= 0:
        raise ValueError(f"pixel_size_nm must be positive, got {pixel_nm}.")

    m, _ = _ensure_b1hw(mask)
    m = m.clamp(0.0, 1.0)

    radius_width = max(0, int(width_nm // (2.0 * pixel_nm)))
    radius_spacing = max(0, int(spacing_nm // (2.0 * pixel_nm)))

    # Loss is averaged per pixel per sample, then averaged over the batch,
    # which keeps the magnitude comparable across resolutions and batch sizes.
    spatial = float(m.shape[-1] * m.shape[-2])

    cd_term = m.new_zeros(())
    if weight_min_cd != 0.0 and radius_width >= 1:
        cd_residual = _opening_residual(m, radius_width)
        cd_term = cd_residual.flatten(1).sum(dim=1).mean() / spatial

    spacing_term = m.new_zeros(())
    if weight_min_spacing != 0.0 and radius_spacing >= 1:
        spacing_residual = _opening_residual(1.0 - m, radius_spacing)
        spacing_term = spacing_residual.flatten(1).sum(dim=1).mean() / spatial

    curvature_term = m.new_zeros(())
    if weight_min_curvature != 0.0 and curvature_nm > 0.0:
        # Central differences in nm⁻¹: divide the finite differences by
        # ``2 * pixel_nm`` so the gradient magnitude is in physical units and
        # the threshold ``1 / curvature_nm`` is directly meaningful.
        gy = (m[..., 2:, :] - m[..., :-2, :]) / (2.0 * pixel_nm)
        gx = (m[..., :, 2:] - m[..., :, :-2]) / (2.0 * pixel_nm)
        # Crop to the common interior so gx and gy align spatially.
        gy = gy[..., :, 1:-1]
        gx = gx[..., 1:-1, :]
        grad_sq = gy * gy + gx * gx

        # Boundary band: symmetric difference of 1-px dilation and erosion.
        # Detached so it acts as a soft mask, not a target — the gradient
        # signal flows through ``grad_sq``, not through where the band lives.
        with torch.no_grad():
            band = (_soft_dilation(m, 1) - _soft_erosion(m, 1))[..., 1:-1, 1:-1]
            band = band.clamp(0.0, 1.0)

        threshold = (1.0 / curvature_nm) ** 2
        excess = functional.relu(grad_sq - threshold)
        curvature_term = (excess * band).flatten(1).sum(dim=1).mean() / spatial

    return (
        weight_min_cd * cd_term
        + weight_min_spacing * spacing_term
        + weight_min_curvature * curvature_term
    )
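
To get a feel for the curvature term in the listing above, the per-pixel arithmetic can be reproduced without tensors. This is a small sketch, assuming a single pixel with known neighbours. The helper name `curvature_excess` and the neighbour values are hypothetical.

```python
def curvature_excess(left, right, up, down, pixel_nm, min_radius_nm):
    """relu(|grad|^2 - (1 / R)^2) at one pixel, using central differences."""
    gx = (right - left) / (2.0 * pixel_nm)
    gy = (down - up) / (2.0 * pixel_nm)
    grad_sq = gx * gx + gy * gy
    threshold = (1.0 / min_radius_nm) ** 2
    return max(0.0, grad_sq - threshold)


# A hard 0 -> 1 step across the pixel at 1 nm/pixel with a 20 nm budget.
hard_edge = curvature_excess(0.0, 1.0, 0.5, 0.5, pixel_nm=1.0, min_radius_nm=20.0)
# A gentle ramp whose slope (0.04 per nm) stays under the budget (0.05 per nm).
soft_edge = curvature_excess(0.46, 0.54, 0.5, 0.5, pixel_nm=1.0, min_radius_nm=20.0)
print(hard_edge, soft_edge)
```

With these numbers the hard step has a squared gradient of 0.25 against a squared budget of 0.0025, so it is penalised, while the ramp is not. In the full loss this excess is further multiplied by the detached boundary band before averaging.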

Compliance

openlithohub.benchmark.compliance.mrc

Mask Rule Check (MRC) — minimum width/spacing for mask manufacturing.

MRCResult dataclass

Result of a Mask Rule Check.

Note: violation_count is the count of violating pixels, i.e. the sum of the per-pixel boolean masks for the width and spacing rules. It is unclipped and scales with the feature area of the layout: a 4096² mask with 1% violation density reports a bigger number than a 256² one with the same fractional rate. Use violation_rate (count / total pixels) for area-independent comparison.

MRC violation_count is not directly comparable to DRC violation_count: DRC counts connected components and is clipped at the rule's max_reports cap, while MRC counts pixels. Comparing the passed flags of the two checks is well-defined; comparing magnitudes is not.

violations is a per-violation sample list (capped at max_reports, evenly spaced) used for visualisation and debugging. Do not derive counts from it; use violation_count directly.
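
The scaling described in the note is easy to check numerically. The sketch below reuses the count / total pixels definition. The helper name and the two violation counts are made-up values standing in for roughly 1% violation density.

```python
def violation_rate(violation_count, height, width):
    """Fraction of pixels that violate, guarded against empty masks."""
    total_pixels = height * width
    return violation_count / total_pixels if total_pixels > 0 else 0.0


large_count = 167_772  # about 1% of a 4096 x 4096 mask
small_count = 655      # about 1% of a 256 x 256 mask

large_rate = violation_rate(large_count, 4096, 4096)
small_rate = violation_rate(small_count, 256, 256)
print(large_count / small_count, large_rate, small_rate)
```

The raw counts differ by a factor of about 256, while the two rates are nearly identical, which is why the rate is the quantity to compare across mask sizes.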
Source code in src/openlithohub/benchmark/compliance/mrc.py
@dataclass
class MRCResult:
    """Result of a Mask Rule Check.

    .. note::
        ``violation_count`` is the count of **violating pixels**, the
        sum of per-pixel boolean masks for ``width`` and ``spacing``
        rules. It is unclipped and scales with the feature area of the
        layout — a 4096² mask with 1% violation density reports a
        bigger number than a 256² one with the same fractional rate.
        Use ``violation_rate`` (count / total pixels) for area-
        independent comparison.

        MRC ``violation_count`` is **not directly comparable** to DRC
        ``violation_count`` — DRC counts connected components and is
        clipped at the rule's ``max_reports`` cap, while MRC counts
        pixels. ``passed`` / ``passed`` comparisons are well-defined;
        magnitude comparisons are not.

        ``violations`` is a per-violation sample list (capped at
        ``max_reports``, evenly spaced) used for visualisation and
        debug; do not derive counts from it — use ``violation_count``
        directly.
    """

    passed: bool
    violation_count: int
    violation_rate: float
    violations: list[dict[str, float]]
    width_violation_count: int = 0
    spacing_violation_count: int = 0

    def _repr_html_(self) -> str:
        from openlithohub.jupyter._html import (
            kv_table,
            panel,
            pass_fail_badge,
            violation_table,
        )

        rows = [
            ("Total violations", str(self.violation_count)),
            ("Violation rate", f"{self.violation_rate:.4%}"),
            ("Width violations", str(self.width_violation_count)),
            ("Spacing violations", str(self.spacing_violation_count)),
        ]
        body = kv_table(rows) + violation_table(self.violations)
        return panel(title="MRC", header_html=pass_fail_badge(self.passed), body_html=body)

CurvilinearMRCResult dataclass

Result of a curvilinear-specific Mask Rule Check.

Curvilinear masks (post-ILT, EUV) cannot be validated with Manhattan-only rules. This adds two checks aimed at MBMW (multi-beam mask writer) writability:

  • Minimum curvature radius (sharp cusps cannot be written).
  • Minimum feature area (sub-resolution dots cannot be reliably exposed).

Source code in src/openlithohub/benchmark/compliance/mrc.py
@dataclass
class CurvilinearMRCResult:
    """Result of a curvilinear-specific Mask Rule Check.

    Curvilinear masks (post-ILT, EUV) cannot be validated with Manhattan-only
    rules. This adds two checks aimed at MBMW writability:
    - Minimum curvature radius (sharp cusps cannot be written).
    - Minimum feature area (sub-resolution dots cannot be reliably exposed).
    """

    passed: bool
    violation_count: int
    curvature_violations: list[dict[str, float]] = field(default_factory=list)
    area_violations: list[dict[str, float]] = field(default_factory=list)
    min_radius_observed_nm: float | None = None
    min_area_observed_nm2: float | None = None

    def _repr_html_(self) -> str:
        from openlithohub.jupyter._html import (
            kv_table,
            panel,
            pass_fail_badge,
            violation_table,
        )

        def _fmt(v: float | None, suffix: str) -> str:
            return f"{v:.3g} {suffix}" if v is not None else "—"

        rows = [
            ("Total violations", str(self.violation_count)),
            ("Curvature violations", str(len(self.curvature_violations))),
            ("Area violations", str(len(self.area_violations))),
            ("Min radius observed", _fmt(self.min_radius_observed_nm, "nm")),
            ("Min area observed", _fmt(self.min_area_observed_nm2, "nm²")),
        ]
        body = kv_table(rows)
        if self.curvature_violations:
            body += '<div style="margin-top:6px;font-size:90%;color:#555;">Curvature</div>'
            body += violation_table(self.curvature_violations)
        if self.area_violations:
            body += '<div style="margin-top:6px;font-size:90%;color:#555;">Area</div>'
            body += violation_table(self.area_violations)
        return panel(
            title="Curvilinear MRC",
            header_html=pass_fail_badge(self.passed),
            body_html=body,
        )

check_mrc(mask, min_width_nm=40.0, min_spacing_nm=40.0, pixel_size_nm=1.0)

Check mask against minimum width and spacing rules.

MRC violations are a hard-fail metric — a mask that violates these rules cannot be manufactured regardless of optical performance.

Width check: morphological opening with structuring element of size kernel = floor(min_width_nm / pixel_size_nm) (i.e. the largest disk that physically fits inside a feature of exactly min_width_nm). The kernel half-width passed to binary_erosion is therefore (kernel - 1) // 2. Features that disappear under this opening are width violations. A feature exactly min_width_nm wide passes.

Spacing check: same logic on the inverted mask — gaps that disappear under opening are too narrow.
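
The kernel and radius arithmetic above is worth working through for a few pixel sizes. The sketch below copies the two formulas into a standalone helper. The name `opening_radius` is illustrative, and the pixel sizes are arbitrary examples.

```python
import math


def opening_radius(min_dim_nm, pixel_size_nm):
    """Half-width used for the opening: (floor(min_dim / pixel) - 1) // 2, floored at 0."""
    kernel = int(math.floor(min_dim_nm / pixel_size_nm))
    return max(0, (kernel - 1) // 2)


radius_default = opening_radius(40.0, 1.0)      # kernel 40
radius_coarse = opening_radius(40.0, 8.0)       # kernel 5
radius_too_coarse = opening_radius(40.0, 16.0)  # kernel 2

# For comparison, the radius formula used by curvilinear_mrc_loss.
loss_radius = int(40.0 // (2.0 * 1.0))
print(radius_default, radius_coarse, radius_too_coarse, loss_radius)
```

In the source listing the width and spacing checks only run when the radius is at least 1, so a grid as coarse as the last case leaves the rule unchecked and does not report a failure.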

Parameters:

Name Type Description Default
mask Tensor

Binary mask tensor (H, W) or (B, C, H, W).

required
min_width_nm float

Minimum allowed feature width.

40.0
min_spacing_nm float

Minimum allowed spacing between features.

40.0
pixel_size_nm float

Physical pixel size for unit conversion.

1.0

Returns:

Type Description
MRCResult

MRCResult with pass/fail status and violation details.

Source code in src/openlithohub/benchmark/compliance/mrc.py
def check_mrc(
    mask: torch.Tensor,
    min_width_nm: float = 40.0,
    min_spacing_nm: float = 40.0,
    pixel_size_nm: float = 1.0,
) -> MRCResult:
    """Check mask against minimum width and spacing rules.

    MRC violations are a hard-fail metric — a mask that violates these rules
    cannot be manufactured regardless of optical performance.

    Width check: morphological opening with structuring element of size
    ``kernel = floor(min_width_nm / pixel_size_nm)`` (i.e. the largest disk
    that physically fits inside a feature of exactly ``min_width_nm``). The
    kernel half-width passed to ``binary_erosion`` is therefore
    ``(kernel - 1) // 2``. Features that disappear under this opening are
    width violations. A feature exactly ``min_width_nm`` wide passes.

    Spacing check: same logic on the inverted mask — gaps that disappear
    under opening are too narrow.

    Args:
        mask: Binary mask tensor (H, W) or (B, C, H, W).
        min_width_nm: Minimum allowed feature width.
        min_spacing_nm: Minimum allowed spacing between features.
        pixel_size_nm: Physical pixel size for unit conversion.

    Returns:
        MRCResult with pass/fail status and violation details.
    """
    m = ensure_2d(mask)
    binary = (m > 0.5).float()

    h, w = binary.shape
    total_pixels = h * w
    has_foreground = binary.sum() > 0
    has_background = (1.0 - binary).sum() > 0

    violations: list[dict[str, float]] = []

    radius_width = max(0, (int(math.floor(min_width_nm / pixel_size_nm)) - 1) // 2)
    radius_spacing = max(0, (int(math.floor(min_spacing_nm / pixel_size_nm)) - 1) // 2)

    width_violation_count = 0
    spacing_violation_count = 0

    if has_foreground and radius_width >= 1:
        opened = binary_dilation(binary_erosion(binary, radius=radius_width), radius=radius_width)
        width_violation_mask = (binary > 0.5) & (opened < 0.5)
        width_violation_count = int(width_violation_mask.sum().item())

        if width_violation_count > 0:
            fg_dist = distance_transform(binary)
            ys, xs = torch.where(width_violation_mask)
            _add_violations(violations, "width", ys, xs, fg_dist, pixel_size_nm, min_width_nm)

    if has_foreground and has_background and radius_spacing >= 1:
        bg = (binary < 0.5).float()
        eroded_bg = binary_erosion(bg, radius=radius_spacing)
        opened_bg = binary_dilation(eroded_bg, radius=radius_spacing)
        spacing_violation_mask = (bg > 0.5) & (opened_bg < 0.5)
        spacing_violation_count = int(spacing_violation_mask.sum().item())

        if spacing_violation_count > 0:
            bg_dist = distance_transform(bg)
            ys, xs = torch.where(spacing_violation_mask)
            _add_violations(violations, "spacing", ys, xs, bg_dist, pixel_size_nm, min_spacing_nm)

    violation_count = width_violation_count + spacing_violation_count
    violation_rate = violation_count / total_pixels if total_pixels > 0 else 0.0

    return MRCResult(
        passed=violation_count == 0,
        violation_count=violation_count,
        violation_rate=violation_rate,
        violations=violations,
        width_violation_count=width_violation_count,
        spacing_violation_count=spacing_violation_count,
    )

check_curvilinear_mrc(mask, min_curvature_radius_nm=20.0, min_feature_area_nm2=1600.0, pixel_size_nm=1.0, smoothing_window=5, max_reports=100)

Check curvilinear-specific manufacturing rules on a binary mask.

Two rules, both targeting MBMW writability of post-ILT curvilinear shapes:

  1. Minimum curvature radius. The contour is traced, smoothed with a periodic moving average to suppress rasterization aliasing, then discrete curvature is computed at each point via the Menger (three-point circumscribed circle) formula. A point violates if its radius (1/|kappa|) falls below min_curvature_radius_nm. The three points are taken skip contour samples apart, where skip = max(stencil, smoothing_window // 2) and stencil = max(2, round(min_curvature_radius_nm / pixel_size_nm / 3)); this offset is intended to keep the sharp 90-degree corners typical of Manhattan input from producing false failures.
  2. Minimum feature area. 4-connected components below min_feature_area_nm2 are flagged as sub-resolution dots.
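
The Menger formula in rule 1 is the reciprocal of the radius of the circle through three points. A minimal scalar sketch follows (the library version is vectorised with NumPy). The function name and the degenerate-case tolerance are assumptions for illustration.

```python
import math


def menger_curvature(p0, p1, p2):
    """Curvature of the circle through three 2-D points: 2 * |cross| / (a * b * c)."""
    a = math.dist(p0, p1)
    b = math.dist(p1, p2)
    c = math.dist(p0, p2)
    cross = (p1[0] - p0[0]) * (p2[1] - p0[1]) - (p1[1] - p0[1]) * (p2[0] - p0[0])
    denom = a * b * c
    if denom < 1e-12 or abs(cross) < 1e-12:
        return 0.0  # coincident or collinear points: treat as a straight segment
    return 2.0 * abs(cross) / denom


# Three points sampled on a circle of radius 20 nm.
radius_nm = 20.0
points = [(radius_nm * math.cos(t), radius_nm * math.sin(t)) for t in (0.0, 0.3, 0.6)]
kappa_circle = menger_curvature(*points)
kappa_line = menger_curvature((0.0, 0.0), (1.0, 1.0), (2.0, 2.0))
print(1.0 / kappa_circle, kappa_line)
```

Points on a circle recover its radius regardless of their spacing, and collinear points give zero curvature, so they can never fall below the minimum radius.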

Parameters:

Name Type Description Default
mask Tensor

Binary mask tensor (H, W) or (B, C, H, W).

required
min_curvature_radius_nm float

Minimum allowed local radius of curvature.

20.0
min_feature_area_nm2 float

Minimum allowed area for a connected feature.

1600.0
pixel_size_nm float

Physical pixel size for unit conversion.

1.0
smoothing_window int

Window size for the periodic moving-average smoother. Set to 1 to disable smoothing. Larger values relax the curvature check; the default suits 1 nm/pixel ILT outputs.

5
max_reports int

Cap on per-category violation reports.

100

Returns:

Type Description
CurvilinearMRCResult

CurvilinearMRCResult with pass/fail status and violation details.

Source code in src/openlithohub/benchmark/compliance/mrc.py
def check_curvilinear_mrc(
    mask: torch.Tensor,
    min_curvature_radius_nm: float = 20.0,
    min_feature_area_nm2: float = 1600.0,
    pixel_size_nm: float = 1.0,
    smoothing_window: int = 5,
    max_reports: int = 100,
) -> CurvilinearMRCResult:
    """Check curvilinear-specific manufacturing rules on a binary mask.

    Two rules, both targeting MBMW writability of post-ILT curvilinear shapes:

    1. Minimum curvature radius. The contour is traced, smoothed with a periodic
       moving average to suppress rasterization aliasing, then discrete curvature
       is computed at each point via the Menger (three-point circumscribed
       circle) formula. A point violates if its radius (1/|kappa|) falls below
       ``min_curvature_radius_nm``. The three points are taken ``skip`` contour
       samples apart, where ``skip = max(stencil, smoothing_window // 2)`` and
       ``stencil = max(2, round(min_curvature_radius_nm / pixel_size_nm / 3))``;
       this offset is intended to keep the sharp 90-degree corners typical of
       Manhattan input from producing false failures.
    2. Minimum feature area. 4-connected components below
       ``min_feature_area_nm2`` are flagged as sub-resolution dots.

    Args:
        mask: Binary mask tensor (H, W) or (B, C, H, W).
        min_curvature_radius_nm: Minimum allowed local radius of curvature.
        min_feature_area_nm2: Minimum allowed area for a connected feature.
        pixel_size_nm: Physical pixel size for unit conversion.
        smoothing_window: Window size for the periodic moving-average smoother.
            Set to 1 to disable smoothing. Larger values relax the curvature
            check; the default suits 1 nm/pixel ILT outputs.
        max_reports: Cap on per-category violation reports.

    Returns:
        CurvilinearMRCResult with pass/fail status and violation details.
    """
    m = ensure_2d(mask)
    binary_torch = (m > 0.5).float()
    binary_np = binary_torch.detach().cpu().numpy().astype(np.int8)

    curvature_violations: list[dict[str, float]] = []
    area_violations: list[dict[str, float]] = []
    min_radius_observed: float | None = None
    min_area_observed: float | None = None

    pixel_area_nm2 = pixel_size_nm * pixel_size_nm
    components = _connected_component_areas(binary_torch)
    for area_px, cy, cx in components:
        area_nm2 = area_px * pixel_area_nm2
        if min_area_observed is None or area_nm2 < min_area_observed:
            min_area_observed = area_nm2
        if area_nm2 < min_feature_area_nm2 and len(area_violations) < max_reports:
            area_violations.append(
                {
                    "x_nm": cx * pixel_size_nm,
                    "y_nm": cy * pixel_size_nm,
                    "actual_nm2": area_nm2,
                    "required_nm2": min_feature_area_nm2,
                }
            )

    if min_curvature_radius_nm > 0.0:
        loops = trace_contour(binary_np)
        threshold_kappa = 1.0 / max(min_curvature_radius_nm, 1e-9)
        # Curvature stencil span. A 3-point Menger estimate is only reliable
        # when the sampled arc length is comparable to the radius being
        # measured. Span ~ R / 3 (close to R / pi) keeps the chord/sagitta
        # ratio sane and prevents rasterization aliasing on smooth curves
        # from flagging spurious tight radii.
        stencil = max(2, int(round(min_curvature_radius_nm / max(pixel_size_nm, 1e-9) / 3.0)))
        skip = max(stencil, smoothing_window // 2)
        for loop in loops:
            if len(loop) < max(2 * skip + 3, 5):
                continue
            loop_nm = _smooth_loop(loop, smoothing_window) * pixel_size_nm
            n = len(loop_nm)
            # Vectorized Menger curvature over the whole closed loop.
            idx = np.arange(n)
            p0 = loop_nm[(idx - skip) % n]
            p1 = loop_nm[idx]
            p2 = loop_nm[(idx + skip) % n]
            a = np.linalg.norm(p1 - p0, axis=1)
            b = np.linalg.norm(p2 - p1, axis=1)
            c = np.linalg.norm(p2 - p0, axis=1)
            cross = (p1[:, 0] - p0[:, 0]) * (p2[:, 1] - p0[:, 1]) - (p1[:, 1] - p0[:, 1]) * (
                p2[:, 0] - p0[:, 0]
            )
            denom = a * b * c
            valid = (a >= 1e-9) & (b >= 1e-9) & (c >= 1e-9) & (np.abs(cross) >= 1e-12)
            kappa = np.zeros(n, dtype=np.float64)
            np.divide(2.0 * np.abs(cross), denom, out=kappa, where=valid)

            valid_kappa = kappa[valid]
            if valid_kappa.size > 0:
                loop_min_radius = 1.0 / float(valid_kappa.max())
                if min_radius_observed is None or loop_min_radius < min_radius_observed:
                    min_radius_observed = loop_min_radius

            (violator_indices,) = np.nonzero(kappa > threshold_kappa)
            for vi in violator_indices:
                if len(curvature_violations) >= max_reports:
                    break
                radius_nm = 1.0 / float(kappa[vi])
                curvature_violations.append(
                    {
                        "x_nm": float(p1[vi, 1]),
                        "y_nm": float(p1[vi, 0]),
                        "actual_radius_nm": radius_nm,
                        "required_radius_nm": min_curvature_radius_nm,
                    }
                )

    violation_count = len(curvature_violations) + len(area_violations)
    return CurvilinearMRCResult(
        passed=violation_count == 0,
        violation_count=violation_count,
        curvature_violations=curvature_violations,
        area_violations=area_violations,
        min_radius_observed_nm=min_radius_observed,
        min_area_observed_nm2=min_area_observed,
    )
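
The stencil and skip values in the listing above determine which contours are evaluated at all. As a quick sanity check, the following sketch reproduces just that arithmetic. The helper name `stencil_skip` is hypothetical.

```python
def stencil_skip(min_curvature_radius_nm, pixel_size_nm, smoothing_window):
    """Return (stencil, skip, minimum loop length) as computed in the listing."""
    stencil = max(2, int(round(min_curvature_radius_nm / max(pixel_size_nm, 1e-9) / 3.0)))
    skip = max(stencil, smoothing_window // 2)
    min_loop_len = max(2 * skip + 3, 5)
    return stencil, skip, min_loop_len


defaults = stencil_skip(20.0, 1.0, 5)  # default arguments of check_curvilinear_mrc
coarse = stencil_skip(20.0, 4.0, 5)    # same rule on a 4 nm/pixel grid
print(defaults, coarse)
```

With the default arguments, contours shorter than 17 points are skipped by the curvature rule, so very small shapes are only caught by the minimum-area rule.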

openlithohub.benchmark.compliance.drc

Design Rule Check (DRC) — layout-level geometric constraint validation.

DRCRuleDeck dataclass

Configuration for DRC rules.

Source code in src/openlithohub/benchmark/compliance/drc.py
@dataclass
class DRCRuleDeck:
    """Configuration for DRC rules."""

    min_width_nm: float = 40.0
    min_spacing_nm: float = 40.0
    min_area_nm2: float = 100.0
    min_notch_nm: float = 30.0

DRCResult dataclass

Result of a Design Rule Check.

Note: violation_count is the number of reported violations, i.e. len(violations). Each rule check caps how many per-component reports it adds (typically max_reports = 50, evenly sampled), so on a heavily violating layout this number is clipped and is not the true total. Use rule_summary for the per-rule reported counts and treat passed (false as soon as any violation is reported) as the only sound binary signal.

DRC violation_count is not directly comparable to MRC violation_count: the latter counts violating pixels, an unclipped scalar that scales with feature area, while DRC counts (clipped) connected components. Comparing the passed flags of the two checks is well-defined; comparing magnitudes is not.
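
One way an evenly sampled report cap could work is sketched below. The private rule checks are not shown on this page, so this is an assumption about the general technique and not the library's code. The name `evenly_spaced_sample` is hypothetical.

```python
def evenly_spaced_sample(items, max_reports):
    """Keep at most max_reports items, spread evenly from the first to the last."""
    n = len(items)
    if max_reports <= 0:
        return []
    if n <= max_reports:
        return list(items)
    if max_reports == 1:
        return [items[0]]
    # Integer arithmetic avoids rounding surprises; indices are strictly increasing.
    return [items[i * (n - 1) // (max_reports - 1)] for i in range(max_reports)]


components = list(range(1000))  # stand-in for 1000 violating components
reported = evenly_spaced_sample(components, 50)
print(len(reported), reported[0], reported[-1])
```

The report contains 50 entries whether the layout has 51 violating components or 1000, which is why the reported count cannot be read as a magnitude.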
Source code in src/openlithohub/benchmark/compliance/drc.py
@dataclass
class DRCResult:
    """Result of a Design Rule Check.

    .. note::
        ``violation_count`` is the **number of reported violations**,
        i.e. ``len(violations)``. Each rule check caps how many
        per-component reports it adds (typically ``max_reports`` = 50,
        evenly sampled), so on a heavily-violating layout this number
        is clipped, not the true total. Use ``rule_summary`` for the
        per-rule reported counts and treat ``passed`` (any violation
        at all) as the only sound binary signal.

        DRC ``violation_count`` is **not directly comparable** to MRC
        ``violation_count`` — the latter counts violating *pixels*, an
        unclipped scalar that scales with feature area, while DRC
        counts (clipped) connected components. ``passed`` / ``passed``
        comparisons are well-defined; magnitude comparisons are not.
    """

    passed: bool
    violation_count: int
    violations: list[dict[str, float]]
    rule_summary: dict[str, int] = field(default_factory=dict)

    def _repr_html_(self) -> str:
        from openlithohub.jupyter._html import (
            kv_table,
            panel,
            pass_fail_badge,
            violation_table,
        )

        rows: list[tuple[str, str]] = [("Total violations", str(self.violation_count))]
        rows.extend((rule, str(count)) for rule, count in sorted(self.rule_summary.items()))
        body = kv_table(rows) + violation_table(self.violations)
        return panel(title="DRC", header_html=pass_fail_badge(self.passed), body_html=body)

check_drc(mask, rule_deck='default', pixel_size_nm=1.0)

Run Design Rule Check on a mask layout.

Checks: minimum width, minimum spacing, minimum area, notch detection.

Notch semantics. min_notch_nm flags only fully-enclosed background concavities (small background pockets surrounded on all sides by foreground) that a closing of the foreground at radius min_notch_nm / 2 would fill in. Through-channels (narrow background gaps that touch the image border) and the open exterior background are intentionally excluded; those are spacing violations and are reported by min_spacing_nm instead. This split avoids double-counting the same physical defect under two rules.
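
The distinction between an enclosed pocket and a through-channel can be sketched with a flood fill from the image border. This illustrates only the enclosure test, not the closing-radius criterion. The 4-connectivity and the function name are assumptions, since the private notch check is not shown here.

```python
from collections import deque


def enclosed_background(grid):
    """Return the background cells (value 0) that are not connected to the border."""
    h, w = len(grid), len(grid[0])
    reachable = set()
    queue = deque()
    # Seed the fill with every background cell on the image border.
    for r in range(h):
        for c in range(w):
            on_border = r in (0, h - 1) or c in (0, w - 1)
            if on_border and grid[r][c] == 0:
                reachable.add((r, c))
                queue.append((r, c))
    # Breadth-first fill through 4-connected background neighbours.
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < h and 0 <= nc < w and grid[nr][nc] == 0 and (nr, nc) not in reachable:
                reachable.add((nr, nc))
                queue.append((nr, nc))
    return {(r, c) for r in range(h) for c in range(w)
            if grid[r][c] == 0 and (r, c) not in reachable}


# A ring with a one-pixel hole, next to a bar separated by a narrow channel.
grid = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 1, 0],
    [0, 1, 0, 1, 0, 1, 0],
    [0, 1, 1, 1, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
pockets = enclosed_background(grid)
print(pockets)
```

Only the hole inside the ring is returned. The channel in column 4 connects to the exterior background, so under the semantics above it would be a spacing matter and not a notch.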

Source code in src/openlithohub/benchmark/compliance/drc.py
def check_drc(
    mask: torch.Tensor,
    rule_deck: str | DRCRuleDeck = "default",
    pixel_size_nm: float = 1.0,
) -> DRCResult:
    """Run Design Rule Check on a mask layout.

    Checks: minimum width, minimum spacing, minimum area, notch detection.

    Notch semantics. ``min_notch_nm`` flags only **fully-enclosed** background
    concavities — small bg pockets surrounded on all sides by foreground —
    that a closing of the foreground at radius ``min_notch_nm / 2`` would fill
    in. Through-channels (narrow bg gaps that touch the image border) and the
    open exterior background are intentionally excluded; those are spacing
    violations and are reported by ``min_spacing_nm`` instead. This split
    avoids double-counting the same physical defect under two rules.
    """
    m = ensure_2d(mask)
    binary = (m > 0.5).float()

    if isinstance(rule_deck, str):
        if rule_deck not in _RULE_DECKS:
            raise ValueError(f"Unknown rule deck {rule_deck!r}. Available: {sorted(_RULE_DECKS)}")
        rules = _RULE_DECKS[rule_deck]
    else:
        rules = rule_deck

    violations: list[dict[str, float]] = []
    rule_summary: dict[str, int] = {}

    width_violations = _check_width(binary, rules.min_width_nm, pixel_size_nm)
    rule_summary["min_width"] = len(width_violations)
    violations.extend(width_violations)

    spacing_violations = _check_spacing(binary, rules.min_spacing_nm, pixel_size_nm)
    rule_summary["min_spacing"] = len(spacing_violations)
    violations.extend(spacing_violations)

    area_violations = _check_min_area(binary, rules.min_area_nm2, pixel_size_nm)
    rule_summary["min_area"] = len(area_violations)
    violations.extend(area_violations)

    notch_violations = _check_notch(binary, rules.min_notch_nm, pixel_size_nm)
    rule_summary["notch"] = len(notch_violations)
    violations.extend(notch_violations)

    violation_count = len(violations)
    return DRCResult(
        passed=violation_count == 0,
        violation_count=violation_count,
        violations=violations,
        rule_summary=rule_summary,
    )

Report

openlithohub.benchmark.report

Evaluation report generation.

generate_report(metrics, output_format='table')

Generate a formatted evaluation report from computed metrics.

Parameters:

Name Type Description Default
metrics dict[str, Any]

Dictionary of metric names to values.

required
output_format str

'table' (rich terminal), 'json', or 'markdown'.

'table'

Returns:

Type Description
str

Formatted report string.

Source code in src/openlithohub/benchmark/report.py
def generate_report(
    metrics: dict[str, Any],
    output_format: str = "table",
) -> str:
    """Generate a formatted evaluation report from computed metrics.

    Args:
        metrics: Dictionary of metric names to values.
        output_format: 'table' (rich terminal), 'json', or 'markdown'.

    Returns:
        Formatted report string.
    """
    if output_format == "json":
        return json.dumps(metrics, indent=2)

    if output_format == "markdown":
        return _format_markdown(metrics)

    return _format_table(metrics)
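
The private formatters are not shown on this page. As a rough idea of what a markdown formatter for a flat metrics dictionary might look like, here is a hypothetical sketch. The name `format_markdown_sketch`, the table layout and the `mrc_passed` key are all assumptions.

```python
import json


def format_markdown_sketch(metrics):
    """Render a flat metrics dict as a two-column markdown table."""
    lines = ["| Metric | Value |", "| --- | --- |"]
    for name, value in metrics.items():
        lines.append(f"| {name} | {value} |")
    return "\n".join(lines)


metrics = {"epe_mean_nm": 1.5, "mrc_passed": True}
markdown_report = format_markdown_sketch(metrics)
json_report = json.dumps(metrics, indent=2)  # same call as the 'json' branch above
print(markdown_report)
print(json_report)
```

Because the JSON branch passes the dictionary straight to json.dumps, the values need to be JSON-serialisable; a tensor-valued metric would have to be converted to a plain number first.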