
Architecture Decisions

When to add an ADR: if you’d put it in CLAUDE.md, put it here instead.

Last updated: 2026-05-31

Title: All APIs must be exposed via dedicated CF Worker gateways, not directly from the core

Context: The Rust backend (finstack-rs) runs on Fly.io with a public URL (finstack-api.fly.dev). The current architecture lets the dashboard and external callers hit the Fly container directly. This exposes the container URL to the open internet, bypasses the Cloudflare edge (DDoS protection, WAF, geo routing, caching), and forces cross-cutting concerns (auth pre-flight, rate limiting, tenant routing) to live inside the Rust core rather than at the edge where they belong.

Decision: Every API surface — public REST (/v1/*), admin REST (/admin/*), MCP (POST|GET /mcp), OAuth endpoints, and inbound webhooks — must be fronted by a dedicated Cloudflare Worker gateway. The Fly container must only accept traffic from those Workers (enforced via a shared X-Gateway-Token secret header or Fly private networking). No consumer — dashboard, SDK users, Stripe, third-party integrations — may call the Fly URL directly in production.

This applies equally to internal service-to-service APIs (e.g. the orchestration sweeper calling a primitive): internal callers must go through a Worker gateway or use Fly private networking, never the public Fly URL.
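The routing decision a multiplex gateway would make can be sketched as a small path classifier. This is an illustration only, written in Rust to match the core; the gateway Worker itself may well be written in another language. The `/v1/`, `/admin/`, and `/mcp` rules come from the Decision above, while the `/oauth/` and `/webhooks/` prefixes, the `Surface` enum, and the `classify` function are hypothetical names chosen for the example.

```rust
/// API surfaces named in the Decision. The enum itself is hypothetical.
#[derive(Debug, PartialEq, Eq)]
pub enum Surface {
    PublicRest,
    AdminRest,
    Mcp,
    OAuth,
    Webhook,
}

/// Map a request to the surface that should handle it, or `None` if the
/// gateway should refuse it outright.
pub fn classify(method: &str, path: &str) -> Option<Surface> {
    if path.starts_with("/v1/") {
        Some(Surface::PublicRest)
    } else if path.starts_with("/admin/") {
        Some(Surface::AdminRest)
    } else if path == "/mcp" && matches!(method, "POST" | "GET") {
        // The ADR lists only POST and GET for MCP.
        Some(Surface::Mcp)
    } else if path.starts_with("/oauth/") {
        // Assumed prefix: the ADR does not give the OAuth paths.
        Some(Surface::OAuth)
    } else if path.starts_with("/webhooks/") {
        // Assumed prefix: the ADR does not give the webhook paths.
        Some(Surface::Webhook)
    } else {
        None
    }
}
```

Returning `None` for anything unrecognised keeps the gateway default-deny, so a new surface has to be added explicitly before it is reachable.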

Consequences:

  • Each public API surface gets its own CF Worker (or a single multiplex gateway worker that routes by path prefix). The gateway handles TLS termination, auth header forwarding, rate limiting, and geo logic; the core stays dumb about CF-specific concepts.
  • The Fly container’s public URL becomes an internal implementation detail. It must be protected: either restrict inbound traffic to Fly’s private network (internal address) or validate X-Gateway-Token on every request and return 403 if the header is absent or does not match.
  • Aligns with the existing finstack-marketing Worker pattern — extends it to the API layer.
  • Stripe webhooks: Stripe calls a public URL. The gateway Worker handles HMAC pre-validation (or forwards the raw headers) before proxying to the core. The Fly container still validates Stripe-Signature — defence in depth.
  • Dashboard → backend calls: getOrProvisionApiKey() in backend-auth.ts must target the CF gateway URL, not finstack-api.fly.dev.
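The X-Gateway-Token check on the core side could look roughly like the sketch below. It is framework-agnostic and assumes the header value has already been extracted from the request; `check_gateway_token` and `constant_time_eq` are hypothetical names, not existing finstack-rs functions. It also assumes a mismatched token is rejected the same way as an absent one.

```rust
/// Compare two byte strings without returning early on the first mismatch,
/// so response timing leaks less about the secret. A production service
/// would more likely rely on a vetted crate (for example `subtle`).
fn constant_time_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    let mut diff = 0u8;
    for (x, y) in a.iter().zip(b.iter()) {
        diff |= x ^ y;
    }
    diff == 0
}

/// Validate the X-Gateway-Token header value against the shared secret.
/// Returns `Ok(())` to let the request through, or `Err(403)` with the
/// HTTP status the core should answer with.
pub fn check_gateway_token(header: Option<&str>, expected: &str) -> Result<(), u16> {
    // An empty configured secret is treated as a misconfiguration:
    // reject everything rather than accept an empty header.
    if expected.is_empty() {
        return Err(403);
    }
    match header {
        Some(value) if constant_time_eq(value.as_bytes(), expected.as_bytes()) => Ok(()),
        _ => Err(403),
    }
}
```

In a real service this would sit in middleware that runs before any route handler, so no endpoint can opt out of the check by accident.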

Rule: See RULES.md worker-gateway-only. Hard rule — no exceptions without a new ADR.

Open work: Gateway Worker(s) not yet built. Current direct-Fly calls are a known gap to resolve before Phase 6 cutover.


Title: Derive status_changed_at instead of storing it

Context: Several domain rows (payfac.sub_merchants, compliance.kyc_checks, workflow.workflow_runs) already store per-transition timestamp columns (activated_at, suspended_at, closed_at, reviewed_at, decided_at, started_at, finished_at). Operators want a single status_changed_at field to answer “how long has this row been in its current status?” without crawling event_outbox.

Decision: Don’t add a status_changed_at column. Instead, derive it at row hydration in Rust from the per-transition timestamps already on the row. Each model exposes a derive_status_changed_at(&mut self) method; service-layer row loaders call it after every SELECT. The field carries #[serde(default = "chrono::Utc::now")] so the struct stays round-trippable through JSON without forcing API consumers to supply a value they can’t influence anyway (caller-provided values are silently overwritten on the next hydration).
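A minimal sketch of the derivation for one domain, under stated assumptions: the status names, the `created_at` fallback, and the pairing of each status with a column are guesses for illustration, and timestamps are plain Unix seconds so the example needs no crates. The real models presumably use chrono types and carry the serde default attribute described above.

```rust
/// Hypothetical status set; the real enum in svc-payfac may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SubMerchantStatus {
    Pending,
    Active,
    Suspended,
    Closed,
}

/// Simplified row shape. Timestamps are Unix seconds (an assumption made
/// to keep the sketch dependency-free).
#[derive(Debug, Clone)]
pub struct SubMerchant {
    pub status: SubMerchantStatus,
    pub created_at: i64, // assumed column, used as the fallback
    pub activated_at: Option<i64>,
    pub suspended_at: Option<i64>,
    pub closed_at: Option<i64>,
    /// Derived at hydration, never stored in the database.
    pub status_changed_at: i64,
}

impl SubMerchant {
    /// Overwrite `status_changed_at` with the transition timestamp that
    /// matches the current status. Row loaders would call this after
    /// every SELECT.
    pub fn derive_status_changed_at(&mut self) {
        let ts = match self.status {
            SubMerchantStatus::Pending => None,
            SubMerchantStatus::Active => self.activated_at,
            SubMerchantStatus::Suspended => self.suspended_at,
            SubMerchantStatus::Closed => self.closed_at,
        };
        // Fall back to creation time if the transition column is NULL.
        self.status_changed_at = ts.unwrap_or(self.created_at);
    }
}
```

Because the method always overwrites the field, whatever value arrived through deserialization (including the default) is discarded on the next hydration.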

Consequences:

  • No migration per domain; no write-path discipline burden (no risk of a transition path forgetting to bump the column).
  • ~10ns CPU cost per hydrated row on the read path. Negligible.
  • The derivation logic is duplicated across each model’s derive_status_changed_at. Acceptable for now (3 sites when this was written, 4 with svc-orchestration; each short, each domain-shaped). If we hit 5+ identical implementations, extract a trait or proc-macro.
  • A future migration that hand-edits one of the *_at columns without going through a service-layer transition will silently flow into status_changed_at on the next read — that’s the desired behavior (the timestamps remain the source of truth).

Rule: none (judgment call applied per-domain, not a hard rule)

Sites (Rust derivation): svc-payfac::SubMerchant, svc-compliance::KycCheck, svc-workflow::WorkflowRun, svc-orchestration::OrchestrationRun (4th application — ADR-1).

Sites (SQL CASE status WHEN ... THEN ts END mirror): the same per-domain mapping appears inline in stale-list filter queries — svc-payfac::list (stale_for_secs), svc-compliance::list_stale, svc-workflow::list_stale_runs, svc-orchestration::stale_runs. These must stay in sync with the Rust derivation. Counted as one application per domain (not one per language) for the 5-site threshold: a future Rust-derive-only domain still counts, even without an SQL mirror.
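One possible way to guard the sync requirement is an in-memory reference filter that tests can compare against the SQL query's results. The sketch below is not an existing helper: `RunStatus`, the simplified `WorkflowRun` shape, and both function names are hypothetical, and timestamps are Unix seconds to avoid external crates.

```rust
/// Hypothetical, reduced status set for workflow runs.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RunStatus {
    Running,
    Finished,
}

/// Simplified row shape; only the columns the mapping needs.
#[derive(Debug, Clone)]
pub struct WorkflowRun {
    pub id: u64,
    pub status: RunStatus,
    pub started_at: i64,
    pub finished_at: Option<i64>,
}

/// The status-to-timestamp mapping that the SQL CASE expression mirrors.
fn status_changed_at(run: &WorkflowRun) -> i64 {
    match run.status {
        RunStatus::Running => run.started_at,
        RunStatus::Finished => run.finished_at.unwrap_or(run.started_at),
    }
}

/// In-memory equivalent of a stale-list filter: return the ids of rows
/// that have been in their current status for at least `stale_for_secs`
/// as of `now`.
pub fn list_stale_ids(rows: &[WorkflowRun], now: i64, stale_for_secs: i64) -> Vec<u64> {
    rows.iter()
        .filter(|r| now - status_changed_at(r) >= stale_for_secs)
        .map(|r| r.id)
        .collect()
}
```

Running the same fixture rows through both this filter and the SQL query in an integration test would surface a drift between the two mappings as a differing id list.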