Honest status

What works, what is measured, and what is still pending.

This page is the plain-English front door for GRIFF AI transparency. It says what is live, what is a public simulation, what still needs measurement, and where the raw engineering evidence lives.

Generated 2026-05-12T19:35:00-05:00 · source: Refreshed after the 2026-05-12 closeout using board-drain green status, Memory/Brain launch-root receipts, storage-context bridge receipts, website auditor findings, and the 5-person buyer focus-group synthesis. Original RC1 and Pliny fields remain snapshot-based until the post-S19b measured run is regenerated.

Public demo

Live

The governed mission theater uses sample data to show task, policy, approval, artifact, and verification flow.

App console

Private preview

Workspace access is gated. Public visitors should use the demo or request access.

Measurements

Mixed

Some tests and fixture counts are published. Some red-team and broker measurements are still pending.

Raw detail

Available

Engineering metrics, known limitations, source notes, and raw JSON are preserved below.

Engineering detail: RC1 freeze

Tag

rc1

Commit

8488a66f

Repo

github.com/griffin9899/v2-platform

Frozen at

2026-05-10T22:00:00-05:00

RC1 ships on the v2-platform repo. Annotated tag 'rc1' (object SHA 8488a66) points at commit 05e11a2 ('RC1 SHIP GATE: freeze spec + ADR #2 addendum + injection corpus deferral'). This control-plane monorepo (griff-ai-control-plane) is a separate surface and remains on branch control-plane-build-2026-05-06.
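
The difference between the tag object SHA and the commit it points at can be checked in a local clone. The following is a minimal sketch, assuming a checkout of v2-platform with the rc1 tag fetched; the helper names rev_parse_args and resolve are hypothetical and are not part of the repo. For an annotated tag, `git rev-parse <tag>` prints the SHA of the tag object, while `git rev-parse <tag>^{commit}` peels the tag to the commit it references.

```python
import subprocess


def rev_parse_args(ref: str, peel_to_commit: bool = False) -> list[str]:
    """Build the git command that resolves a ref (hypothetical helper).

    With peel_to_commit=True the ref is suffixed with ^{commit}, which asks
    git to follow an annotated tag through to the commit it points at.
    """
    target = f"{ref}^{{commit}}" if peel_to_commit else ref
    return ["git", "rev-parse", "--short", target]


def resolve(ref: str, peel_to_commit: bool = False, repo_dir: str = ".") -> str:
    """Run the command inside repo_dir and return the abbreviated SHA."""
    result = subprocess.run(
        rev_parse_args(ref, peel_to_commit),
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,  # raise if the ref does not exist in this clone
    )
    return result.stdout.strip()


# Example (needs a real clone, so it is left commented out):
# tag_object_sha = resolve("rc1")                    # SHA of the tag object
# commit_sha = resolve("rc1", peel_to_commit=True)   # SHA of the tagged commit
```

If the two values differ, the tag is annotated and the second value is the commit the tag points at.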

Engineering detail: prompt-injection and tool-abuse posture

MUST classes blocked

0 / 10 measured

SHOULD classes blocked

0 / 4 measured

Fixtures authored

3 / 14

Framework: Plan v2 §3 (14 classes: 10 MUST + 4 SHOULD). Ship threshold: 10/10 MUST blocked + >=3/4 SHOULD blocked.
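
The ship threshold can be read as a simple predicate over the two blocked counts. This is a minimal sketch and not the project's actual gate: the names RedTeamResult and meets_ship_threshold are hypothetical, and treating an unmeasured class as "not blocked" is an assumption made here for illustration.

```python
from dataclasses import dataclass

MUST_TOTAL = 10    # MUST classes in the 14-class framework
SHOULD_TOTAL = 4   # SHOULD classes in the 14-class framework
SHOULD_MIN = 3     # at least 3 of 4 SHOULD classes must be blocked


@dataclass(frozen=True)
class RedTeamResult:
    """Blocked counts from one measured run (hypothetical record type)."""
    must_blocked: int
    should_blocked: int


def meets_ship_threshold(result: RedTeamResult) -> bool:
    """True only when 10/10 MUST and >=3/4 SHOULD classes are blocked."""
    if not 0 <= result.must_blocked <= MUST_TOTAL:
        raise ValueError("must_blocked out of range")
    if not 0 <= result.should_blocked <= SHOULD_TOTAL:
        raise ValueError("should_blocked out of range")
    return result.must_blocked == MUST_TOTAL and result.should_blocked >= SHOULD_MIN


# The currently published 0 / 10 and 0 / 4 figures do not meet the threshold.
current = RedTeamResult(must_blocked=0, should_blocked=0)
gate_open = meets_ship_threshold(current)
```

Under this reading, a single unblocked MUST class fails the gate regardless of the SHOULD count.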

Venue map: 8 of 10 MUST classes are broker-primary; the other 2 are delegated. SHOULDs #11-14: TBD, deferred to S22 (skill reflex compiler) and S18 (surprise segmenter calibration).

Measurement status: pending S19b broker eval — 8 of 10 MUST classes are broker-primary venue (1, 2, 3, 5, 6, 7, 8, 9 per S19b §2 matrix), 2 delegated to host-hardening (#4) and eval-harness (#10); SHOULDs #11-14 deferred to S22 / S18 calibration. No measured red-team run yet; fixtures are seed inputs, not pass/fail results.
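
The venue assignment described above can be written down as a small lookup table, which also makes it easy to check that every MUST class has exactly one venue. This is a sketch only: the class numbers and venue names come from the text, while the structure and the function venue_for are hypothetical. The text does not say which SHOULD class goes to S22 and which to S18, so the sketch keeps them as one combined bucket.

```python
# MUST classes enforced primarily at the broker (per the S19b matrix cited above).
BROKER_PRIMARY = {1, 2, 3, 5, 6, 7, 8, 9}

# MUST classes delegated to another venue.
DELEGATED = {4: "host-hardening", 10: "eval-harness"}

# SHOULD classes, deferred to S22 / S18 (the per-class split is not stated).
SHOULD_DEFERRED = {11, 12, 13, 14}


def venue_for(class_id: int) -> str:
    """Return the enforcement venue for one of the 14 classes."""
    if class_id in BROKER_PRIMARY:
        return "broker"
    if class_id in DELEGATED:
        return DELEGATED[class_id]
    if class_id in SHOULD_DEFERRED:
        return "deferred (S22 / S18)"
    raise KeyError(f"unknown class id: {class_id}")


# Consistency check: the 10 MUST classes are covered once each, with no overlap.
must_classes = BROKER_PRIMARY | set(DELEGATED)
covers_all_must = sorted(must_classes) == list(range(1, 11))
no_overlap = BROKER_PRIMARY.isdisjoint(DELEGATED)
```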

Engineering detail: tests

Python (v2-platform)

321

pytest --collect-only -q (from A:/projects/v2-platform/)

as of 2026-05-10

Python (local-runner)

39

pytest --collect-only -q packages/local-runner

as of 2026-05-10

TypeScript (web suites)

7

packages/web/tests

as of 2026-05-10

v2-platform note: RC1 freeze snapshot (per spec): 275 passed / 17 failed / 4 skipped (296 total). All 17 failures are [d1]-marked tests that use a stale D1_ATOMIC_BATCH_TOKEN env var instead of the rotated worker secret; they are non-functional and an env refresh closes them. 25 net-new tests have landed post-freeze (296 → 321 collected).
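
The counts in this note reconcile with the collected total shown in the Python (v2-platform) card. A short worked check, using only the figures published on this page (the variable names are illustrative):

```python
# RC1 freeze snapshot, as published in the note above.
passed, failed, skipped = 275, 17, 4

# Tests that existed at freeze time.
frozen_total = passed + failed + skipped

# Tests collected as of 2026-05-10 by pytest --collect-only -q.
collected_now = 321

# Tests added after the freeze.
net_new = collected_now - frozen_total

print(f"frozen total: {frozen_total}, net-new since freeze: {net_new}")
```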

Known limitations

We publish what is red so customers do not have to discover it during procurement. Each item links to the runbook or audit where it is tracked.

L1

high

Hosted recall demo not yet public

app.griff.run/ does not yet front a public recall console. The WEB-2 focus group identified this as the highest-leverage missing artifact. The current /demo route shows the MASTER ATC custody theater, not memory recall.

Tracked in: WEB-2 synthesis + Sprint 7 design (S7-T002 hosted demo console with abuse controls)

L2

medium

/openapi.json behind CF Access

Public OpenAPI spec is currently gated by Cloudflare Access on memory.griff.run. Developer evaluators (Marcus, Bo, Priya in WEB-2) flagged this as a show-stopper.

Tracked in: WEB-2 Theme 5 — fold into Sprint 7 S7-T002 as a 30-minute add

L3

medium

Pliny red-team battery not yet measured end-to-end

3 of 14 fixtures authored (classes 1-3 wrapper-skip / TOCTOU / prompt-inject-judge). Broker enforcement venues mapped per S19b §2 — 8 broker-primary MUSTs, 2 delegated (class #4 keyring hardening, class #10 classifier mislabel via eval-harness). SHOULDs #11-14 are S22 / S18 surface, not broker. No measured run yet.

Tracked in: Sprint S19b (brain-S19b-mcp-host-broker-design.md) + N18 fixtures audit

L4

high

Real Execution Kernel: design only, sandbox primitive proven but not wired

P0c spike PASSED all 3 attacks (Job Object + cleared env + deny-DACL + WFP firewall, 2.6 ms spawn). Production integration is deferred per the V-6 Sprint 7 drop list: BD12 sandbox primitive integration is 6-9h of internal hygiene that produces no customer-facing surface, so it is parked to S8 with ADR-007 deferred. RC1 shipped the ADR-006 dispatcher stub.

Tracked in: phase0-P0c-sandbox-spike.md + V-6 Sprint 7 review §2.5 drops

L5

medium

Section 889 / Section 508 compliance: scaffolded, federal-conditional

S6-T003 shipped a 480 LOC scanner, a CI workflow, sample.json, and 4 tests (commit 8e8266b, fixture exclude fix). Per Plan v2 §4 A10, the full federal-customer-conditional checks are deferred unless a federal warm-intro materializes. Beyond the scaffold, the NDAA 889 attestation generator is design-only.

Tracked in: rc1-rush-R3-section-889-shipped.md + master plan §4 A10 deferral

L6

medium

Memory federation cross-machine: post-RC1

Brain Plan v2 broker is single-host (S19b §0 explicit non-goal). Fleet federation across JWGH02 / GRIFFIN / JWGH03 is post-RC1. Memory recall today works cross-session on a single host via memory.griff.run; multi-host coherence is the next layer.

Tracked in: brain-S19b-mcp-host-broker-design.md §0 non-goals

L7

low

No published benchmarks vs mem0 / Letta / Anthropic built-in memory

Sam (journalist persona, WEB-2) flagged this gap. The comparison harness is not yet built; the recall eval harness (A2) is gated on the P0d golden corpus lock.

Tracked in: WEB-2 divergent themes + Plan v2 A2 acceptance criterion

L8

low

17 [d1]-marked tests failing on stale token (non-functional)

RC1 freeze snapshot: 275 passed / 17 failed / 4 skipped. All 17 failures are [d1]-marked D1 tests that use a stale D1_ATOMIC_BATCH_TOKEN env var instead of the rotated worker secret. They are non-functional; an env refresh closes them. We disclose the failures rather than suppress the tests.

Tracked in: RC1-integrated-mvp-build-spec.md §'Locked RC1 P0 acceptance thresholds'