RFC 0006 — MCP (Model Context Protocol) Bridge
| Status | Draft |
| Author | OpenLithoHub maintainers |
| Created | 2026-05-21 |
| Targets | v0.4 |
| Related | openlithohub.cli.eval_cmd, openlithohub.cli.serve_cmd, openlithohub.benchmark.report, openlithohub.benchmark.compliance.{drc,mrc}, openlithohub.workflow.process_window, openlithohub.workflow.tiling, RFC 0003 (MRC Rule Deck Schema), RFC 0005 (Halo Sizing) |
Summary
Expose OpenLithoHub's evaluation and optimization surface to LLM agents (Claude Desktop, Cursor, custom MCP clients) through a thin Model Context Protocol bridge. The bridge is a JSON-pass-through: every structural decision lives in the CLI / engine, every framing decision lives in the MCP transport, and nothing in between gets to invent schema.
The wedge is conversational lithography verification — a fab engineer
asks "are there any DRC violations in the lower-left corner?" and the
agent calls evaluate(report_level=detailed), gets back a structured
violations[] array, and narrates the answer against the layout. The
follow-up optimize(violation_id=...) call closes the loop without the
agent re-deriving context from a free-text report.
This RFC defines three gates the bridge must clear before merge, the schema decisions the engine must implement first, and the control-flow contract that ties them together.
Background — what the bridge buys
Today, an agent that wants to use OpenLithoHub has two options:
- Shell out to `openlithohub eval`, parse text output. Brittle: any CLI cosmetic change breaks the agent.
- Call the FastAPI engine in `serve_cmd.py`, parse a JSON response. Better, but the JSON shape is `{metric_name: scalar}` (a leaderboard row), not a violation list, so agents cannot ground a "where is the violation" question in coordinates.
MCP gives us a third option: a typed tool surface (evaluate,
optimize, simulate) where every call has a JSON-Schema'd request
and response, the agent sees the schema at tools/list time, and
streaming/cancellation are part of the protocol rather than ad-hoc.
The catch is that MCP-over-stdio is fragile in ways that scientific
Python tooling routinely violates (stdout contamination, no abort
plumbing on long ops, opaque progressToken semantics). The bridge
must close those failure modes structurally, not by hoping callers
behave.
Current state (factual, verified 2026-05-21)
- `openlithohub eval --format=json` exists in `src/openlithohub/cli/eval_cmd.py:31` and `src/openlithohub/benchmark/report.py:22`. It emits `json.dumps(aggregated_metrics, indent=2)`: a flat `{metric_name: scalar}` dict. No `violations[]`, no rule IDs, no per-violation coordinates.
- DRC/MRC violation records exist internally. `src/openlithohub/benchmark/compliance/mrc.py:27` defines `violations: list[dict[str, float]]` on `MRCResult`; `drc.py` is parallel. The aggregator throws them away before serialization: the data is there, the contract isn't.
- Tile loop in `src/openlithohub/workflow/process_window.py:89` iterates `for corner in corners:` over process-window samples; `src/openlithohub/workflow/tiling.py:142` iterates `for tile, result in tiles:`. Neither loop checks a cancellation token, so once started, an `optimize` call runs to completion.
- No MCP server exists yet. `serve_cmd.py` only boots a FastAPI engine; there is no stdio MCP transport in tree.
- No JSON-Schema versioning. The `eval --format=json` output has no `schema_version` field, no JSON-Schema document, no consumer-side validator.
Design
1 · Three frozen gates (G1, G2, G3)
The bridge does not merge until all three pass. Each gate has a falsifiable acceptance test.
G1 — stdio hygiene
MCP-over-stdio frames JSON-RPC on the server's stdout. Any stray
print(), tqdm progress bar, deprecation warning routed to stdout,
or rich-console output from the wrapped engine breaks the protocol —
the client sees malformed frames and disconnects.
Hard rules:
- The in-process path (preferred: lower latency, no subprocess overhead) must redirect `openlithohub`'s logger handlers off `sys.stdout` before installing the MCP stdio transport. Logging goes to `stderr` or to a structured artifact path. The MCP transport owns `stdout` exclusively.
- The subprocess path (fallback for engines that genuinely need process isolation) must capture both `stdout` and `stderr` via `subprocess.run(..., stdout=PIPE, stderr=PIPE)`, never `inherit`. The wrapper parses child stdout into `violations[]` and forwards child stderr to the MCP server's own stderr or to artifacts.
- Acceptance test (regression gate): run `openlithohub eval` against a fixture and pipe its stdout to a line-by-line JSON parser (the command is listed under "G1 acceptance" in Verification). Every line must parse as JSON. One failure = bridge rejected.
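The in-process rule above could look roughly like the following. This is a minimal sketch, assuming the engine logs through a standard-library logger named `openlithohub`; the helper name `redirect_stdout_handlers` is hypothetical. It covers `logging` handlers only, so stray `print()` calls or progress bars would still need separate handling.

```python
import logging
import sys


def redirect_stdout_handlers(logger_name: str = "openlithohub") -> int:
    """Re-point StreamHandlers that write to stdout at stderr; return how many moved."""
    logger = logging.getLogger(logger_name)
    moved = 0
    for handler in logger.handlers:
        # Only handlers whose stream *is* sys.stdout are touched; file handlers
        # and handlers already on stderr are left alone.
        if isinstance(handler, logging.StreamHandler) and handler.stream is sys.stdout:
            handler.setStream(sys.stderr)
            moved += 1
    return moved


# Usage: call this before the MCP stdio transport takes ownership of stdout.
demo_logger = logging.getLogger("openlithohub")
demo_logger.addHandler(logging.StreamHandler(sys.stdout))  # simulate an offending handler
moved_count = redirect_stdout_handlers()
```

Calling the helper before the transport is installed matters: any log record emitted between the two steps would otherwise land inside the JSON-RPC stream.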
G2 — long-run control + cancellation
OPC tile loops run for tens of seconds to minutes. An LLM may fire
evaluate(), watch the partial stream, decide to pivot, and the engine
must actually stop — not just be ignored. Most MCP servers ship
streaming-out without a real abort path; the engine keeps burning GPU
on a request nobody reads.
Hard rules:
- Two long-op contracts, both supported:
  - Path A, `progressToken`: when the client supplies one in the request, the engine emits incremental progress notifications. Fast path for clients that implement it (Claude Desktop does; not all do).
  - Path B, `job_id` + `poll`: the documented contract regardless of client. `evaluate(...)` returns a `job_id` immediately; `poll(job_id)` returns `{status, progress, result?}`. Clients that don't speak `progressToken` use this path; clients that do can still use it (defensive plumbing). This sidesteps client variability entirely.
- Cancel-on-disconnect is a P0 acceptance criterion. A cancellation token threads through the tile loops in `workflow/process_window.py` and `workflow/tiling.py`; each iteration checks the token, and the forward-model kernels (Hopkins SOCS) check it at a granularity that frees the GPU within ~1 s of the disconnect signal.
- Acceptance test: start a long `optimize`, kill the MCP client mid-stream, assert the engine process's GPU memory drops within ~2 s and the job's terminal state is `cancelled` (not `completed`, not `running`). Readback proof, not a log line.
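Path B can be illustrated with a small in-memory job registry. This is a sketch under stated assumptions, not the bridge's implementation: `Job`, `JobRegistry`, and the `wait` helper are hypothetical names, job IDs are opaque UUIDs, and the work function is expected to watch `job.cancel_event` itself.

```python
import threading
import uuid
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Optional


@dataclass
class Job:
    status: str = "running"  # running | completed | cancelled
    progress: float = 0.0
    result: Optional[Any] = None
    cancel_event: threading.Event = field(default_factory=threading.Event)
    thread: Optional[threading.Thread] = None


class JobRegistry:
    """Hypothetical in-memory store behind evaluate() -> job_id and poll(job_id)."""

    def __init__(self) -> None:
        self._jobs: Dict[str, Job] = {}
        self._lock = threading.Lock()

    def submit(self, work: Callable[[Job], Any]) -> str:
        job = Job()
        job_id = uuid.uuid4().hex

        def runner() -> None:
            outcome = work(job)  # work is expected to watch job.cancel_event
            with self._lock:
                if job.cancel_event.is_set():
                    job.status = "cancelled"
                else:
                    job.status, job.progress, job.result = "completed", 1.0, outcome

        job.thread = threading.Thread(target=runner, daemon=True)
        with self._lock:
            self._jobs[job_id] = job
        job.thread.start()
        return job_id  # returned immediately; work continues in the background

    def poll(self, job_id: str) -> Dict[str, Any]:
        with self._lock:
            job = self._jobs[job_id]
            reply: Dict[str, Any] = {"status": job.status, "progress": job.progress}
            if job.status == "completed":
                reply["result"] = job.result
            return reply

    def cancel(self, job_id: str) -> None:
        self._jobs[job_id].cancel_event.set()

    def wait(self, job_id: str, timeout: float) -> None:
        self._jobs[job_id].thread.join(timeout)


registry = JobRegistry()

# A job that finishes on its own.
done_id = registry.submit(lambda job: {"metrics": {"drc_pass_rate": 1.0}})
registry.wait(done_id, timeout=5.0)
done_reply = registry.poll(done_id)


# A job that only stops when cancelled (stands in for a long tile loop).
def spin_until_cancelled(job: Job) -> None:
    while not job.cancel_event.wait(0.01):
        pass


cancel_id = registry.submit(spin_until_cancelled)
registry.cancel(cancel_id)
registry.wait(cancel_id, timeout=5.0)
cancel_reply = registry.poll(cancel_id)
```

The `result` key appears only once the job has completed, which matches the `result?` in the `poll` reply shape.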
G3 — structured-output contract at the CLI layer
The bridge does not parse anything. If the agent needs structured violations, that's a CLI/engine responsibility, not bridge middleware. This is the architectural constraint that decides whether the bridge stays a thin shim (good) or becomes a domain-translator with its own versioning problem (bad).
Hard rules:
- `openlithohub eval --report-level=detailed` (new flag) extends the JSON contract to include a `violations[]` array alongside the existing aggregate metrics. The leaderboard path (`--report-level=aggregate`, default) stays bit-identical to today's output: no leaderboard regression.
- The bridge is a JSON-pass-through with a schema validator. The MCP `evaluate` tool's contract is "return whatever the engine emitted, after schema validation, unmodified." Any structural transformation is a CLI/engine responsibility.
- Acceptance test: the same fixture round-trips identical bytes through three paths (direct CLI, MCP in-process bridge, MCP subprocess bridge), modulo only the MCP envelope. No string substitution, no re-keying, no float reformatting.
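To make the "validate, then forward unmodified" contract concrete, here is a minimal sketch. The function name `forward_engine_payload` and the required-key check are assumptions for illustration; a real bridge would validate against the published JSON-Schema document. The second half shows why re-serializing is not allowed: a parse-and-dump round trip can change float formatting even when the data is equal.

```python
import json

# Assumed minimal check; stands in for full JSON-Schema validation.
REQUIRED_KEYS = ("schema_version", "metrics")


def forward_engine_payload(raw: bytes) -> bytes:
    """Validate the engine's JSON, then hand back the original bytes untouched."""
    document = json.loads(raw)
    missing = [key for key in REQUIRED_KEYS if key not in document]
    if missing:
        raise ValueError(f"engine payload missing keys: {missing}")
    return raw  # never json.dumps(document): that would re-format the payload


engine_bytes = b'{"schema_version": "1.0.0", "metrics": {"epe_mean_nm": 1.420}}'
forwarded = forward_engine_payload(engine_bytes)

# Counter-example: parsing and re-dumping drops the trailing zero in 1.420.
reserialized = json.dumps(json.loads(engine_bytes)).encode()
```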
2 · Violation schema
The shape eval --report-level=detailed emits:
```json
{
  "schema_version": "1.0.0",
  "metrics": {
    "epe_mean_nm": 1.42,
    "epe_max_nm": 4.91,
    "mrc_pass_rate": 0.998,
    "drc_pass_rate": 1.0
  },
  "violations": [
    {
      "violation_id": "v1a2b3c4d5e6f7g8h",
      "violation_group_id": "9z8y7x6w5v4u3t2s",
      "rule_id": "M1.W.MIN",
      "rule_deck": "asap7-v1.2",
      "severity": "error",
      "coordinate_frame": "post_transform_canvas",
      "location_nm": [12450.0, 8300.0],
      "transforms": [{"op": "tile_stitch", "params": {"origin": [0, 0]}}],
      "context": {"tile_id": "T_2_3", "tile_local_nm": [450.0, 300.0]},
      "dedup_strategy": "canonical",
      "dedup_provenance": {
        "rule_deck_hash": "sha256:abc...",
        "radius_nm": 18.0,
        "rule_id": "M1.W.MIN"
      }
    }
  ]
}
```
2a · coordinate_frame enum
Free-form strings invite silent shifts of the fix to the wrong location. Pinned values:
| Value | Meaning |
|---|---|
| `input_layout_native` | User-supplied layout's reference frame, pre-rasterization, pre-tile, pre-rotation |
| `post_transform_canvas` | Engine working canvas, after internal rotation/mirror/tile-stitching |
| `tile_local` | Inside a specific tile (requires sibling `tile_origin_nm` in `post_transform_canvas` frame) |
| `wafer_global` | Reserved for future EUV pipelines that resolve to wafer coordinates |
The schema requires both the enum value and, where applicable, a
transforms: [{op, params}] chain so a downstream visualizer can
round-trip back to the user's original frame without guessing. Repair
agents that don't understand the transform chain are required to
reject the violation rather than silently shift the fix.
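A consumer-side round trip through the transform chain might be sketched as below. The semantics assigned to `tile_stitch` (a translation by `origin`) are an assumption, as are the helper names and the `mirror_x` op used to trigger the rejection path; the RFC does not define per-op parameters beyond the example record.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def _undo_tile_stitch(point: Point, params: dict) -> Point:
    # Assumption: tile_stitch translated the point by +origin, so undo with -origin.
    origin_x, origin_y = params["origin"]
    return (point[0] - origin_x, point[1] - origin_y)


# Only ops this consumer understands; anything else forces a rejection.
KNOWN_INVERSES: Dict[str, Callable[[Point, dict], Point]] = {
    "tile_stitch": _undo_tile_stitch,
}


def undo_transforms(location_nm: Sequence[float], transforms: List[dict]) -> Optional[Point]:
    """Walk the chain backwards; return None if any op is unknown (caller must reject)."""
    point: Point = (float(location_nm[0]), float(location_nm[1]))
    for step in reversed(transforms):
        inverse = KNOWN_INVERSES.get(step["op"])
        if inverse is None:
            return None
        point = inverse(point, step.get("params", {}))
    return point


stitched = [{"op": "tile_stitch", "params": {"origin": [12000, 8000]}}]
recovered = undo_transforms([12450.0, 8300.0], stitched)
# "mirror_x" is a made-up op name that this consumer does not implement.
rejected = undo_transforms([12450.0, 8300.0], stitched + [{"op": "mirror_x", "params": {}}])
```

Returning `None` rather than a best-effort point is the design choice that matters here: the caller has no way to apply a shifted fix by accident.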
2b · violation_group_id derivation
Tile-boundary violations: when a defect straddles two tiles, each tile
emits its own record (the rule check runs per-tile; the rule deck
doesn't know the partitioning). Records that fall within
dedup_radius_nm of a sibling get the same violation_group_id.
Repair agents iterate violations[], group by violation_group_id,
and act only on the canonical record per group — the rest are reported
for traceability. This prevents the "second fix looks like a
regression introduced by the first" failure mode.
Derivation is deterministic from inputs, not opaque UUID:
```python
violation_group_id = blake2b(
    rule_deck_hash
    + dedup_radius_nm.to_bytes(...)
    + canonical_location_nm[0].to_bytes(...)
    + canonical_location_nm[1].to_bytes(...)
    + rule_id.encode()
).hexdigest()[:16]
```
Stable across runs given identical inputs. Stable under tile
re-partitioning (dedup runs after tiles are stitched in canvas
frame). Re-derivable by any consumer that has the rule deck pinned —
critical for repair agents resuming a workflow days later. The
dedup_provenance sibling on every record (not just canonical ones)
lets a stale agent audit-replay the partition without re-running eval.
Cross-vendor authority falls out: vendor-A and vendor-B rule decks
produce different group IDs because their rule_deck_hash differs,
even on byte-identical violation coordinates.
dedup_radius_nm is a per-rule property in the rule deck (DRC width
violations: ~min_feature_nm; MRC: 0 since they're per-edge not
per-region). The aggregation step runs before serialization — the
bridge sees pre-deduped records. No bridge-side dedup logic.
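The derivation in this section leaves the byte encoding of the numeric inputs open. One runnable reading is sketched below, assuming big-endian IEEE-754 doubles for the radius and coordinates and UTF-8 for the strings; both choices, and the function name `derive_violation_group_id`, are assumptions rather than part of the RFC.

```python
import hashlib
import struct
from typing import Sequence


def derive_violation_group_id(
    rule_deck_hash: str,
    dedup_radius_nm: float,
    canonical_location_nm: Sequence[float],
    rule_id: str,
) -> str:
    """Hash the pinned inputs in a fixed order and keep the first 16 hex characters."""
    payload = (
        rule_deck_hash.encode("utf-8")
        + struct.pack(">d", dedup_radius_nm)  # assumed encoding: big-endian float64
        + struct.pack(">d", canonical_location_nm[0])
        + struct.pack(">d", canonical_location_nm[1])
        + rule_id.encode("utf-8")
    )
    return hashlib.blake2b(payload).hexdigest()[:16]


gid_a = derive_violation_group_id("sha256:deck-a", 18.0, (12450.0, 8300.0), "M1.W.MIN")
gid_a_again = derive_violation_group_id("sha256:deck-a", 18.0, (12450.0, 8300.0), "M1.W.MIN")
gid_b = derive_violation_group_id("sha256:deck-b", 18.0, (12450.0, 8300.0), "M1.W.MIN")
```

Whatever encoding is finally chosen has to be written into the schema document, since a consumer that re-derives IDs with a different byte layout would silently compute different groups.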
2c · dedup_strategy: canonical | shadow
Each record is one of two:
- `canonical`: first record in its group; `optimize` resolves to this record's geometry and fixes it.
- `shadow`: additional record(s) in the same group; reported for traceability so an agent can reason about "this defect was also observed from tile T2's perspective." Shadow records are not independent repair targets.
Default optimize(violation_id=...) resolution: regardless of whether
the supplied ID points to a canonical or shadow record, the engine
resolves to the group's canonical record and fixes that. The response
lists all group members as "resolved by group fix" so the agent's
traceability ledger stays consistent. Opt-out:
optimize(violation_id=v37, scope="record_only") for the rare case an
agent genuinely wants per-record geometry.
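The default resolution rule and the `record_only` opt-out can be sketched as a pure function over the `violations[]` array. The function name and the reply keys (`target_id`, `resolved_by_group_fix`) are illustrative assumptions; only the field names on the records and the two `scope` values come from this RFC.

```python
from typing import Dict, List


def resolve_optimize_target(
    violations: List[dict], violation_id: str, scope: str = "group"
) -> Dict[str, object]:
    """Map a supplied violation_id to the record optimize should actually fix."""
    by_id = {v["violation_id"]: v for v in violations}
    record = by_id[violation_id]
    if scope == "record_only":
        # Opt-out: act on exactly the record the agent named.
        return {"target_id": record["violation_id"], "resolved_by_group_fix": []}
    members = [v for v in violations if v["violation_group_id"] == record["violation_group_id"]]
    canonical = next(v for v in members if v["dedup_strategy"] == "canonical")
    return {
        "target_id": canonical["violation_id"],
        "resolved_by_group_fix": [v["violation_id"] for v in members],
    }


violations = [
    {"violation_id": "v1", "violation_group_id": "g1", "dedup_strategy": "canonical"},
    {"violation_id": "v2", "violation_group_id": "g1", "dedup_strategy": "shadow"},
    {"violation_id": "v3", "violation_group_id": "g2", "dedup_strategy": "canonical"},
]
group_fix = resolve_optimize_target(violations, "v2")
record_fix = resolve_optimize_target(violations, "v2", scope="record_only")
```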
3 · Capabilities discovery
Agent planners cannot parse markdown at runtime. Per-engine-path
detail-level support is exposed structurally on the MCP tool descriptor
returned by tools/list:
```json
{
  "report_levels": ["aggregate", "detailed"],
  "engine_paths": {
    "drc.standard": {"detailed": "single_pass"},
    "mrc.standard": {"detailed": "single_pass"},
    "pvband.4corner": {"detailed": "rerun_required"},
    "epe.gauge": {"detailed": "unsupported"},
    "euv.stochastic": {
      "detailed": "conditional",
      "precondition": {
        "layout_area_nm2_max": 5e10,
        "violation_count_max": 50000,
        "on_exceeded": "degrade_to_aggregate",
        "signaled_via": "degraded_to_aggregate: true on response"
      }
    }
  },
  "schema_version": "1.0.0",
  "max_violations_per_response": 10000
}
```
The four values:
| Value | Meaning |
|---|---|
| `single_pass` | Detailed output is free: DRC/MRC already build the list internally; the aggregator just doesn't throw it away |
| `rerun_required` | Detailed output requires a separate engine pass, e.g. PVB pipelines that throw away geometry between rule check and aggregator |
| `conditional` | Detailed output works below an explicit precondition (layout size, violation count); above the cutoff the engine signals `degraded_to_aggregate: true` and falls back. Not silent. |
| `unsupported` | The path cannot produce per-violation output without a structural change, so the agent's planner must not ask |
Error-code-and-retry was rejected: it doubles the round-trip cost on
the expected path. report_level_unsupported still exists as a
fallback error for engine/descriptor drift (e.g. engine downgrade in
production), but it is defensive plumbing, not the documented
contract.
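A planner consuming the descriptor could decide the report level up front along these lines. This is an illustrative sketch: `choose_report_level` and its `allow_rerun` policy knob are hypothetical, and only the layout-area precondition is checked because the violation count is not known before the run (the engine signals that case with `degraded_to_aggregate: true`).

```python
from typing import Optional

DESCRIPTOR = {
    "report_levels": ["aggregate", "detailed"],
    "engine_paths": {
        "drc.standard": {"detailed": "single_pass"},
        "pvband.4corner": {"detailed": "rerun_required"},
        "epe.gauge": {"detailed": "unsupported"},
        "euv.stochastic": {
            "detailed": "conditional",
            "precondition": {"layout_area_nm2_max": 5e10, "violation_count_max": 50000},
        },
    },
}


def choose_report_level(
    descriptor: dict,
    engine_path: str,
    layout_area_nm2: Optional[float] = None,
    allow_rerun: bool = False,  # hypothetical planner policy: accept the extra pass?
) -> str:
    entry = descriptor["engine_paths"].get(engine_path, {})
    support = entry.get("detailed", "unsupported")
    if support == "single_pass":
        return "detailed"
    if support == "rerun_required":
        return "detailed" if allow_rerun else "aggregate"
    if support == "conditional":
        limit = entry.get("precondition", {}).get("layout_area_nm2_max")
        if layout_area_nm2 is not None and limit is not None and layout_area_nm2 <= limit:
            return "detailed"
        return "aggregate"
    # "unsupported", unknown engine paths, and any value this planner does not know.
    return "aggregate"


small_euv = choose_report_level(DESCRIPTOR, "euv.stochastic", layout_area_nm2=1e10)
large_euv = choose_report_level(DESCRIPTOR, "euv.stochastic", layout_area_nm2=1e11)
```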
4 · schema_version policy
Asymmetric `additionalProperties`: open on responses, closed on requests. This is a common pattern for evolving a JSON-Schema contract without breaking pinned consumers.
- Response schemas (engine → bridge → agent): `additionalProperties: true`. Adding a new violation field is a minor bump; pinned consumers ignore unknown keys; new consumers light up the new field.
- Request schemas (agent → bridge → engine): `additionalProperties: false`. A typo in a field name is a hard error, not silently ignored. The failure mode of "I called `evaluate(report_levle=detailed)` and got default behavior" is exactly what this prevents.
- Major bumps reserved for: removing/renaming fields, changing field semantics (e.g. switching `location_nm` from canvas to native frame without an enum), tightening enum values. Any of these breaks pinned consumers and deserves the major-bump signal.
- Enum widening is `additionalProperties`-style too. Consumers are required (in the schema doc, hard requirement) to handle an `unknown` fallback case: `switch`/`match` must have a default branch. Adding `coordinate_frame: wafer_global` in a future minor bump must not crash pinned consumers.
- `schema_version` rides on every response payload, not just the tool descriptor, so even cached responses round-trip safely.
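The asymmetry can be shown with two small helpers, one for each direction. This is a hand-rolled sketch rather than a JSON-Schema validator: the allowed request fields and the set of frames a pinned consumer knows are assumptions chosen to mirror the examples in this section.

```python
from typing import Dict, List

# Assumed request fields for the evaluate tool; the real set comes from the schema document.
EVALUATE_REQUEST_FIELDS = {"layout", "report_level"}
# Frames a consumer pinned to an earlier minor version might know about.
PINNED_FRAMES = {"input_layout_native", "post_transform_canvas", "tile_local"}


def request_errors(params: Dict[str, object]) -> List[str]:
    """Closed request schema: every unknown key is reported, none are ignored."""
    unknown = sorted(set(params) - EVALUATE_REQUEST_FIELDS)
    return [f"unknown field: {name}" for name in unknown]


def frame_of(violation: Dict[str, object]) -> str:
    """Open response schema: extra keys are ignored, new enum values map to 'unknown'."""
    frame = violation.get("coordinate_frame")
    return frame if frame in PINNED_FRAMES else "unknown"


typo_errors = request_errors({"layout": "a.oas", "report_levle": "detailed"})
ok_errors = request_errors({"layout": "a.oas", "report_level": "detailed"})
future_frame = frame_of({"coordinate_frame": "wafer_global", "some_new_field": 1})
known_frame = frame_of({"coordinate_frame": "tile_local"})
```

A consumer that receives `"unknown"` should treat the record as unactionable rather than guess a frame.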
5 · Session semantics & violation_id resolution
When optimize consumes a violation_id returned from a prior
evaluate, two paths:
- Fast path (server-side cache). Session = MCP connection lifetime, from `initialize` to disconnect. Cache keyed by `(session_id, violation_id) → (coordinate_frame, transforms[], canonical_location_nm, rule_deck_hash)`. ~megabytes per session, evicted on disconnect.
- Cache-miss fallback (self-describing). `violation_id` encodes `schema_version || rule_deck_hash || violation_group_id || record_index` (base32, ~40 chars). On `cache_evicted`, the engine re-resolves from the ID alone provided the layout artifact is still pinned (the agent hints `layout_artifact_id` on the `optimize` call). If the layout isn't pinned, the error is `layout_not_pinned` rather than `cache_evicted`: the two have different recovery paths.
Cross-session resumption works whenever
(rule_deck_hash, layout_artifact_id, violation_id) is reproducible —
which is the deterministic-derivation property from §2b. The session
cache is a latency optimization, not a correctness requirement.
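One possible reading of the two resolution paths is sketched below. The function signature, the reply shapes, and the choice to surface `cache_evicted` when no `layout_artifact_id` hint is supplied are assumptions; the RFC only fixes the two error names and the rule that an unpinned layout yields `layout_not_pinned`.

```python
from typing import Callable, Dict, Optional, Set, Tuple

CacheKey = Tuple[str, str]  # (session_id, violation_id)


def resolve_violation(
    cache: Dict[CacheKey, dict],
    session_id: str,
    violation_id: str,
    pinned_layouts: Set[str],
    rederive: Callable[[str, str], dict],  # hypothetical engine hook
    layout_artifact_id: Optional[str] = None,
) -> dict:
    cached = cache.get((session_id, violation_id))
    if cached is not None:
        return {"source": "cache", "record": cached}  # fast path
    if layout_artifact_id is None:
        # No hint to re-resolve against: surface the eviction to the agent.
        return {"error": "cache_evicted"}
    if layout_artifact_id not in pinned_layouts:
        return {"error": "layout_not_pinned"}
    # Self-describing fallback: rebuild the record from the ID and the pinned layout.
    return {"source": "rederived", "record": rederive(violation_id, layout_artifact_id)}


def fake_rederive(violation_id: str, layout_artifact_id: str) -> dict:
    return {"violation_id": violation_id, "layout": layout_artifact_id}


cache = {("s1", "v1"): {"violation_id": "v1"}}
hit = resolve_violation(cache, "s1", "v1", {"L1"}, fake_rederive)
miss_no_hint = resolve_violation(cache, "s2", "v1", {"L1"}, fake_rederive)
miss_unpinned = resolve_violation(cache, "s2", "v1", {"L1"}, fake_rederive, "L9")
miss_rederived = resolve_violation(cache, "s2", "v1", {"L1"}, fake_rederive, "L1")
```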
6 · Control flow
┌──────────────────────────────────────────┐
│ MCP client (Claude Desktop / Cursor) │
└───────────────┬──────────────────────────┘
│ JSON-RPC over stdio
▼
┌───────────────────────────────────┐
│ MCP server (openlithohub-mcp) │ ── owns stdout
│ ─ stdio transport │
│ ─ schema validator (JSON-Schema) │
│ ─ session cache │
│ ─ progressToken / job_id router │
└───────────────┬───────────────────┘
│ validated request
▼
┌───────────────────────────────────┐
│ CLI / in-process engine adapter │ ── logger → stderr
│ ─ openlithohub eval / optimize │
│ ─ --report-level={agg, detailed} │
│ ─ cancellation token injection │
└───────────────┬───────────────────┘
│
▼
┌────────────────────────────────────────────────────────┐
│ Engine │
│ ┌────────────────────┐ ┌─────────────────────────┐ │
│ │ workflow/tiling │──▶│ Hopkins / SOCS forward │ │
│ │ for tile in ...: │ │ (cancel-token check) │ │
│ │ check_token() │ └─────────────────────────┘ │
│ └─────────┬──────────┘ │
│ ▼ │
│ ┌────────────────────────────────────────────────┐ │
│ │ benchmark/compliance/{drc,mrc} │ │
│ │ per-tile violations: list[dict[...]] │ │
│ └─────────┬──────────────────────────────────────┘ │
│ ▼ │
│ ┌────────────────────────────────────────────────┐ │
│ │ Post-tile aggregator │ │
│ │ ─ stitch tiles in post_transform_canvas │ │
│ │ ─ dedup by (rule_id, radius_nm, location_nm) │ │
│ │ ─ assign violation_group_id (deterministic) │ │
│ │ ─ tag canonical | shadow │ │
│ └─────────┬──────────────────────────────────────┘ │
│ ▼ │
│ ┌────────────────────────────────────────────────┐ │
│ │ benchmark/report.generate_report │ │
│ │ ─ aggregate metrics (always) │ │
│ │ ─ violations[] (when --report-level=detailed)│ │
│ │ ─ schema_version stamp │ │
│ └─────────┬──────────────────────────────────────┘ │
└────────────┼───────────────────────────────────────────┘
│ JSON bytes (validated)
▼ ── cancel/disconnect short-circuits at every ▲
[ MCP server emits to client unmodified ]
6a · Cancellation pseudocode
The token threads through three layers; each one checks at a granularity that frees the GPU within ~1 s of disconnect.
```python
# workflow/tiling.py: outer tile loop
def run_tiled(layout, model, halo_px, *, cancel_token):
    tiles = tile_layout(layout, halo_px=halo_px)
    results = []
    for tile in tiles:
        cancel_token.check()  # boundary 1: per tile
        result = model.predict(tile, cancel_token=cancel_token)
        results.append(result)
    return stitch(results)


# models/<engine>.py: kernel-level checkpoint
def predict(self, tile, *, cancel_token):
    for corner in self.process_window_corners:
        cancel_token.check()  # boundary 2: per corner
        aerial = simulate_aerial_image(...)  # ~100ms-1s on GPU
        # boundary 3: if simulate_aerial_image is itself >1s, it owns
        # an internal token check before the GPU launch and after each
        # batch; the kernel API takes cancel_token as a kwarg.


# bridge: token lifecycle tied to MCP request
def evaluate_handler(request, mcp_session):
    token = CancellationToken()
    mcp_session.on_disconnect(token.cancel)
    mcp_session.on_cancel_notification(request.id, token.cancel)
    try:
        return run_eval(..., cancel_token=token)
    except Cancelled:
        return {"status": "cancelled", "job_id": request.job_id}
```
The token is a plain object, not asyncio-specific; the engine code
calls token.check() synchronously and raises Cancelled if set.
Async wrappers in the bridge translate to MCP semantics. No engine
code needs to know about MCP.
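A token with the properties described above fits in a few lines. The sketch below is one plausible shape, assuming a `threading.Event` as the underlying flag so that `cancel()` may be called from another thread; the class and exception names follow the pseudocode, the rest is illustrative. The loop at the end simulates a disconnect arriving after the third tile.

```python
import threading


class Cancelled(Exception):
    """Raised by CancellationToken.check() once cancel() has been called."""


class CancellationToken:
    def __init__(self) -> None:
        self._event = threading.Event()

    def cancel(self) -> None:
        # Safe to call from any thread, e.g. a disconnect callback in the bridge.
        self._event.set()

    def check(self) -> None:
        # Synchronous and cheap: engine loops call this at each boundary.
        if self._event.is_set():
            raise Cancelled()


token = CancellationToken()
processed = []


def fake_predict(tile: int) -> None:
    processed.append(tile)
    if len(processed) == 3:
        token.cancel()  # stands in for a disconnect arriving mid-run


try:
    for tile in range(10):
        token.check()  # boundary check before each tile
        fake_predict(tile)
    final_status = "completed"
except Cancelled:
    final_status = "cancelled"
```

The loop stops at the next boundary after the cancel, not mid-tile, which is why the per-corner and in-kernel checks are needed to meet the ~1 s target.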
Hard constraints
- No bridge-side parsing or transformation. The bridge validates schema and forwards bytes. Any structural change to the violation shape lives in `benchmark/report.py`, never in the bridge.
- No leaderboard regression. `eval --report-level=aggregate` (default) emits bit-identical output to today's `eval --format=json`. The leaderboard CI path is unchanged.
- No `stdout` writes from the engine when MCP-stdio is active. Logging routed to `stderr` or artifacts, before the transport is installed. Verified by the JSON round-trip regression test.
- Cancellation is structural, not advisory. A cancellation that doesn't free GPU within ~2 s is a bug, not a slow path. Verified by the GPU-memory readback acceptance test.
- `violation_group_id` is deterministic from inputs. No opaque UUIDs. Cross-session resumption depends on this.
- Schema versioning is asymmetric. `additionalProperties: true` on responses, `false` on requests. Enum widening requires the `unknown` fallback contract.
- The bridge ships behind a stdio-only MCP transport in v0.4. HTTP-streamable MCP is out of scope until v0.5: different stdout-contamination story, different cancellation primitives.
Verification
Unit / contract tests
- `tests/test_bridge/test_schema.py`:
  - JSON-Schema document validates a fixture `violations[]` response.
  - `additionalProperties: false` on requests rejects unknown fields.
  - `additionalProperties: true` on responses accepts new fields.
  - Enum-widening fallback: a response with `coordinate_frame: wafer_global` does not crash a pinned consumer.
- `tests/test_bridge/test_dedup.py`:
  - 100 violations × 4 tiles × 12 group collisions: assert 12 distinct canonical records, 88 shadows, group IDs deterministic across runs.
  - `optimize(violation_id=shadow_record)` resolves to canonical group fix; `scope="record_only"` opts out.
- `tests/test_bridge/test_capabilities.py`:
  - `tools/list` response includes `engine_paths` matrix.
  - `conditional` precondition triggers `degraded_to_aggregate: true` when layout area exceeds the cutoff.
G1 acceptance — stdio hygiene
```shell
openlithohub eval --report-level=detailed --fixture tests/fixtures/asap7_small.oas \
  | python -c "import sys, json; [json.loads(l) for l in sys.stdin if l.strip()]"
```
Exit 0 = pass. A single unparseable line fails the gate and blocks the bridge merge. Runs in CI on every PR that touches `cli/eval_cmd.py`, `benchmark/report.py`, or any engine path.
G2 acceptance — long-run + cancel
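```python
import time

# mcp_client, large_fixture, nvidia_smi_memory() and engine_job_status() are
# fixtures/helpers assumed to be provided by the bridge test suite.
def test_cancel_frees_gpu():
    job = mcp_client.call("evaluate", {"layout": large_fixture, "report_level": "detailed"})
    time.sleep(0.5)  # let engine start
    mem_before = nvidia_smi_memory()
    mcp_client.disconnect()
    time.sleep(2.0)  # the ~2 s budget from G2
    mem_after = nvidia_smi_memory()
    assert mem_after < mem_before - 100_000_000  # ≥100 MB freed
    assert engine_job_status(job.job_id) == "cancelled"
```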
CPU-only fallback for CI: assert process RSS drops, assert the tile
loop's iteration counter stops advancing, assert job state is
cancelled not running.
G3 acceptance — bridge is a pass-through
```python
import json
import subprocess

# mcp_client_inproc and mcp_client_subproc are assumed test fixtures.
def test_byte_identity_through_paths():
    fixture = "tests/fixtures/asap7_small.oas"
    direct = subprocess.run(
        ["openlithohub", "eval", "--report-level=detailed", "--format=json", fixture],
        capture_output=True,
        check=True,  # fail loudly if the CLI itself errors
    ).stdout
    in_proc = mcp_client_inproc.call("evaluate", {"layout": fixture, "report_level": "detailed"})
    sub_proc = mcp_client_subproc.call("evaluate", {"layout": fixture, "report_level": "detailed"})
    assert json.loads(direct) == in_proc["result"] == sub_proc["result"]
```
Modulo the MCP envelope, every path produces identical structured content. No string substitution. No re-keying. No float reformatting.
Falsifiable acceptance fixture (cited from issue #11)
tests/fixtures/mcp_acceptance/100x4tiles_12collisions.json:
- 100 violations across 4 tiles
- 12 `violation_group_id` collisions
- 2 `scope` values (`group`, `record_only`)
- 12 collisions × 2 `scope` values = 24 acceptance assertions on a single canned response
Replayable, falsifiable, exercises every failure surface this RFC addresses.
Out of scope
- HTTP-streamable MCP transport. Different stdout/stderr semantics, different cancellation primitives. v0.5 at earliest.
- Multi-session shared cache. Each MCP connection has its own `violation_id` cache; no cross-connection reuse. Cross-session resumption is via deterministic re-derivation, not shared state.
- Authenticated MCP servers. Local stdio assumes a trusted client (the user's own Claude Desktop / Cursor process). Network-exposed authenticated MCP is a separate RFC.
- `simulate` and `synth` tool surfaces. The first bridge ships `evaluate` and `optimize` only. Adding more tools is incremental and doesn't change the gates.
- Streaming partial `violations[]`. `evaluate` is request/response; long ops use `job_id` + `poll`, not partial-array streams. Partial streaming is an API-shape question revisited in v0.5.
- Bridge-side observability. Tracing, metrics, and structured logging for the bridge process itself are punted to the same RFC as HTTP-streamable transport.
Implementation
Phased — the CLI prerequisite ships first, the bridge ships second. Both are independently useful (the CLI work helps non-MCP consumers like CI annotators and DRC visualizers).
Phase 1 — CLI prerequisite (blocks bridge)
- `src/openlithohub/cli/eval_cmd.py`: `--report-level={aggregate, detailed}` flag. Default `aggregate` (no leaderboard regression).
- `src/openlithohub/benchmark/report.py`: emit `violations[]` when detailed. `schema_version` stamp on every JSON response. Aggregate path bit-identical to today.
- `src/openlithohub/benchmark/compliance/{drc,mrc}.py`: stop discarding per-violation records at the aggregator boundary. Add `coordinate_frame`, `transforms[]`, `dedup_provenance` to each record.
- `src/openlithohub/benchmark/compliance/dedup.py` (new): post-tile aggregation, deterministic `violation_group_id` derivation, `canonical | shadow` tagging.
- `src/openlithohub/benchmark/compliance/rule_deck.py`: per-rule `dedup_radius_nm` field.
- `docs/api/violation-schema.md` (new): JSON-Schema document, versioned.
- `tests/test_benchmark/test_report_detailed.py`: round-trip, dedup, group-ID determinism, bit-identical aggregate path.
Phase 2 — bridge (after Phase 1 lands)
- `src/openlithohub_mcp/` (new package, in same repo for now):
  - `transport.py`: stdio MCP transport, owns `stdout`, redirects engine logger before install.
  - `tools.py`: `evaluate`, `optimize` tool definitions, JSON-Schema validation.
  - `session.py`: per-connection cache, `violation_id` resolution, `cache_evicted` / `layout_not_pinned` error codes.
  - `cancel.py`: `CancellationToken`, MCP disconnect / cancel notification → token.
  - `capabilities.py`: per-engine-path matrix, served on `tools/list`.
- `src/openlithohub/workflow/{tiling,process_window}.py`: thread `cancel_token` kwarg through `for tile in ...` / `for corner in ...` loops. Token-aware kernels in `models/{neural_ilt,levelset_ilt}.py`.
- `tests/test_bridge/`: schema, dedup, capabilities, G1/G2/G3 acceptance gates, fixture-based falsifiable check.
- `docs/cli-reference.md`: document the MCP server entry point (`openlithohub mcp` or `python -m openlithohub_mcp`).
Phase 3 — client smoke transcripts (after Phase 2)
- `examples/mcp/claude-desktop-config.json`: drop-in config for Claude Desktop.
- `examples/mcp/transcripts/`: replayable JSON-RPC transcripts for representative agent flows (eval → optimize, eval → cancel, eval → poll → optimize).
- `docs/lithography-for-ai-engineers.md`: link the transcripts as the worked example for "how an agent uses OpenLithoHub."
Acknowledgements
The schema decisions in this RFC were shaped by the design discussion on issue #11:
- @m13v: flagged cancel-on-disconnect as the v2 rewrite vector; raised cancellation to a P0 acceptance gate (G2).
- @Ilya0527: stdio contamination as a hard rule (G1); the thin-shim-vs-domain-translator fork that froze G3; the `coordinate_frame` enum + transform chain; tile-boundary dedup with `violation_group_id` and the canonical/shadow distinction; `conditional` as the fourth capabilities-matrix value; deterministic group-ID derivation; the falsifiable 100×4×12-collisions fixture.
- @reaworks-ops: the two-gate framing (stdio hygiene + long-run control) that organized the design before G3 was added.