Registry Services
The portfolio registry is the platform’s system of record for who owns
what: three small Hono microservices — customer-service, tenant-service,
and solution-service — that hold the Customer → Tenant → Solution model in
one Postgres database. They implement designs/tenant-model.md (v3, built and
shipped 2026-07-03) and are consumed exclusively by the
Aurora webapp acting as a backend-for-frontend (BFF).
The services are deliberately narrow. They own portfolio concerns only —
tenancy, lifecycle status, repo coordinates, deployment facts. The
methodology definition of a solution (name, description, trajectory version,
stages, owners) stays in the customer repo’s solution.yaml and is joined in
at read time by Aurora. The services never hold GitHub credentials.
| Service | Owns | Postgres schema | Local port | Azure app |
|---|---|---|---|---|
customer-service | Customer records | customer | 3021 | svc-customer |
tenant-service | Tenant records | tenant | 3022 | svc-tenant |
solution-service | Solution registry records + deployment write-back | solution | 3023 | svc-solution |
Each lives under implementation/<service>/ in the platform repo, following
the standard app convention (codebase/ + dockerfiles/), with the shared
plumbing factored into implementation/shared/registry-kit
(@maxq/registry-kit, a file: dependency like trajectory-loader). Each
codebase is intentionally tiny: src/index.ts (boot + serve), src/app.ts
(Hono routes), src/schema.ts (zod document schemas), src/config.ts (env),
and migrations/.
Domain model
The invariants behind the model (enforced at write time, §5.3 of the design):
- A tenant belongs to exactly one customer (tenant.customer: required, immutable, validated against customer-service on create).
- A solution belongs to at most one tenant, via a single nullable tenant pointer on the solution record. The tenant’s solution list is derived (a solution-service query), so single ownership holds by construction.
- customer.id is the org slug used in solution naming; one slug rule governs customer ids, tenant slugs, and solution slugs (lowercase letters and hyphens, starting and ending with a letter, at most 20 characters).
- Ids and structural fields (slug, customer, and, on solutions, repos) are immutable; display fields are editable.
- Deletes are refused while children exist: a customer with tenants, or a tenant with solutions, answers 409. There are no cascades.
- Single authority per field: every field is owned by exactly one of the registry record, solution.yaml, or the live runtime. The joined object Aurora serves surfaces drift; it never silently resolves it.
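The slug rule is compact enough to express as a single regular expression. A minimal sketch, assuming the rule is exactly as stated above (so digits are rejected, while consecutive hyphens are not); SLUG_RE, isValidSlug, and deriveId are illustrative names, not the services’ actual zod schemas.

```typescript
// Slug rule (illustrative): lowercase letters and hyphens, starts and ends with a
// letter, at most 20 characters. One leading letter + up to 18 middle characters +
// one trailing letter gives the 20-character ceiling.
const SLUG_RE = /^[a-z](?:[a-z-]{0,18}[a-z])?$/;

function isValidSlug(value: string): boolean {
  return SLUG_RE.test(value);
}

// Tenant and solution ids are derived as <customer>-<slug>; callers never supply them.
// This sketch does not apply the 20-character limit to the derived id itself.
function deriveId(customerId: string, slug: string): string {
  if (!isValidSlug(customerId)) throw new Error(`invalid customer id: ${customerId}`);
  if (!isValidSlug(slug)) throw new Error(`invalid slug: ${slug}`);
  return `${customerId}-${slug}`;
}

console.log(deriveId("cronos", "main")); // cronos-main
```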
Cross-service checks are synchronous HTTP calls that fail closed: if a
peer service is unreachable or unconfigured, the mutation is refused with 503
(PeerUnavailableError). Reads never need a peer. There are no distributed
transactions — Aurora’s reconciliation sweep is the drift backstop.
The Cosmos-shaped Postgres store
The data model is one Postgres server, one database (portfolio), one
schema per service plus a shared audit schema. Inside each service schema
sits a single container-shaped table — deliberately modelled on an Azure
Cosmos DB container, so escalating to real Cosmos (or to separate databases)
stays a connection-string change behind the DocumentStore interface:
```sql
-- <schema>.records, from each service's migrations/001-init.sql
CREATE TABLE IF NOT EXISTS {{schema}}.records (
  id         text PRIMARY KEY,                   -- the record key (Cosmos "id")
  data       jsonb NOT NULL,                     -- the document
  version    integer NOT NULL DEFAULT 1,         -- etag analog; bumped every write
  created_at timestamptz NOT NULL DEFAULT now(),
  updated_at timestamptz NOT NULL DEFAULT now()
);
```
Each filterable JSON field gets a B-tree expression index, for example
(data->>'customer'), (data->>'tenant'), (data->>'status'), and on
solutions (data->'deployment'->>'state') — plus a jsonb_path_ops GIN
index for catch-all containment queries.
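The rule that SQL is only ever built from code constants (registry-kit’s Filter/FilterField) pairs naturally with these expression indexes: a list query can reuse the indexed expressions verbatim, while request values only ever travel as bind parameters. A sketch under those assumptions; FILTER_FIELDS, buildListQuery, and the deploymentState filter name are hypothetical, and the emitted text uses node-postgres-style $n placeholders.

```typescript
// Hypothetical whitelist: API filter name -> indexed JSON expression. Only these
// code constants are ever interpolated into SQL text.
const FILTER_FIELDS = {
  customer: "data->>'customer'",
  tenant: "data->>'tenant'",
  status: "data->>'status'",
  deploymentState: "data->'deployment'->>'state'",
} as const;

type FilterField = keyof typeof FILTER_FIELDS;
type Filter = Partial<Record<FilterField, string>>;

// Schema names come from config (PG_SCHEMA), but are still checked defensively.
const SCHEMA_RE = /^[a-z_][a-z0-9_]*$/;

function buildListQuery(schema: string, filter: Filter): { text: string; values: string[] } {
  if (!SCHEMA_RE.test(schema)) throw new Error(`unexpected schema name: ${schema}`);
  const clauses: string[] = [];
  const values: string[] = [];
  for (const field of Object.keys(FILTER_FIELDS) as FilterField[]) {
    const value = filter[field];
    if (value === undefined) continue;
    values.push(value); // request data goes into the parameter list, never the text
    clauses.push(`(${FILTER_FIELDS[field]}) = $${values.length}`);
  }
  const where = clauses.length > 0 ? ` WHERE ${clauses.join(" AND ")}` : "";
  return {
    text: `SELECT id, data, version, created_at, updated_at FROM ${schema}.records${where} ORDER BY id`,
    values,
  };
}

console.log(buildListQuery("tenant", { customer: "cronos", status: "active" }));
```

Comparing (data->>'customer') in the same form the index was declared with is what lets Postgres serve a filtered list from the B-tree expression index; containment queries (data @> $1::jsonb) would be the GIN index’s job instead.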
Document keys are camelCase — these are API/database documents, not hand-edited YAML; the hyphenated house style stays with the artefacts humans edit. A customer record looks like this (sample from the design doc, §4.1):
```json
{
  "id": "cronos",
  "name": "Cronos Wallonia",
  "legalName": "Cronos Wallonie SA",
  "description": "Regional IT group; first Trajectory customer.",
  "status": "active",
  "contacts": [
    { "name": "J. Dupont", "email": "[email protected]", "role": "sponsor" }
  ],
  "tags": ["belgium", "public-sector"],
  "createdAt": "2026-07-02T09:00:00Z",
  "updatedAt": "2026-07-02T09:00:00Z"
}
```
Over the API, a record is served flattened: the data document spread
alongside id, version, createdAt, and updatedAt (registry-kit’s
serializeRecord).
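A sketch of that flattening, assuming the row comes back from node-postgres with timestamptz columns parsed to Date (node-postgres’s default). The sketch lets the row columns win over same-named keys inside the document (the sample above does carry its own createdAt/updatedAt); whether serializeRecord resolves that collision the same way is not stated here.

```typescript
// Row shape of <schema>.records (column names from 001-init.sql).
interface RecordRow {
  id: string;
  data: Record<string, unknown>;
  version: number;
  created_at: Date;
  updated_at: Date;
}

// Flatten the document and the row metadata into one API object.
// Spreading data first means the row columns override same-named document keys.
function serializeRecord(row: RecordRow): Record<string, unknown> {
  return {
    ...row.data,
    id: row.id,
    version: row.version,
    createdAt: row.created_at.toISOString(),
    updatedAt: row.updated_at.toISOString(),
  };
}

const row: RecordRow = {
  id: "cronos",
  data: { name: "Cronos Wallonia", status: "active", tags: ["belgium", "public-sector"] },
  version: 3,
  created_at: new Date("2026-07-02T09:00:00Z"),
  updated_at: new Date("2026-07-03T10:00:00Z"),
};
console.log(serializeRecord(row));
```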
Optimistic concurrency: If-Match
Every mutation of an existing record (PATCH, DELETE, and the deployment PUT)
must carry an If-Match: <version> header naming the record’s current
version. The underlying UPDATE/DELETE only lands when the stored version
still matches; otherwise the service answers 409 with a “re-read and
retry” message. A missing or malformed If-Match is a 400. Successful writes
bump version and return the new value in the body.
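The two halves of this contract, header parsing and the guarded write, can be sketched without a database. A minimal sketch, assuming the header carries the bare integer version (a quoted ETag-style value is also accepted here, as an assumption); parseIfMatch, replaceIfMatch, and the in-memory Map stand in for requireIfMatch and DocumentStore’s replace and are not their real signatures. The SQL in the comment shows the shape of the guard against the records table.

```typescript
// Parse If-Match into a record version; null means missing or malformed (answered 400).
// Accepts 3 or "3"; the quoted form is an assumption borrowed from ETag syntax.
function parseIfMatch(header: string | undefined): number | null {
  if (header === undefined) return null;
  const m = /^\s*(?:"(\d+)"|(\d+))\s*$/.exec(header);
  if (!m) return null;
  const version = Number(m[1] ?? m[2]);
  return Number.isSafeInteger(version) && version >= 1 ? version : null;
}

interface Versioned<T> {
  data: T;
  version: number;
}

type WriteOutcome =
  | { status: 200; version: number }
  | { status: 404 | 409; error: string };

// In-memory analogue of the guarded write, which in SQL would look like:
//   UPDATE <schema>.records SET data = $2, version = version + 1, updated_at = now()
//   WHERE id = $1 AND version = $3 RETURNING version;
// (in SQL, zero rows back means stale or missing; a follow-up read tells which)
function replaceIfMatch<T>(store: Map<string, Versioned<T>>, id: string, data: T, expected: number): WriteOutcome {
  const current = store.get(id);
  if (!current) return { status: 404, error: `${id} not found` };
  if (current.version !== expected) {
    return { status: 409, error: `version ${expected} is stale (current ${current.version}); re-read and retry` };
  }
  const next: Versioned<T> = { data, version: current.version + 1 };
  store.set(id, next);
  return { status: 200, version: next.version };
}
```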
The audit trail
Every mutation appends one row to the shared, append-only audit.events
table (created idempotently by registry-kit’s ensureAudit at boot):
```sql
CREATE TABLE IF NOT EXISTS audit.events (
  seq       bigserial PRIMARY KEY,
  at        timestamptz NOT NULL DEFAULT now(),
  service   text NOT NULL,
  actor     text NOT NULL,  -- from the x-actor header; "unknown" if absent
  action    text NOT NULL,  -- create | update | delete
  record_id text NOT NULL,
  before    jsonb,
  after     jsonb
);
```
The actor is propagated by the BFF via the x-actor request header. The audit
table is the deliberate replacement for the git history an earlier
registry-in-a-repo design would have provided for free.
Not to be confused with the platform’s
Activity Service: audit.events is the
low-level forensic before/after record of registry mutations only, never read
by any UI; the activity feed is the human-readable, platform-wide business
narrative, explicitly emitted by producers and deliberately not derived
from this table.
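The audit write is small enough to sketch in full. Assumptions: headers arrive as a plain lower-cased map (the real actorOf reads them from the request), a blank x-actor is treated like an absent one, and before/after are forced to null on create/delete respectively. auditRow and INSERT_AUDIT are hypothetical names; the columns match audit.events.

```typescript
type AuditAction = "create" | "update" | "delete";

interface AuditRow {
  service: string;
  actor: string;
  action: AuditAction;
  record_id: string;
  before: unknown;
  after: unknown;
}

// actorOf stand-in: the BFF forwards the acting user in x-actor; absent or blank -> "unknown".
function actorOf(headers: Record<string, string | undefined>): string {
  const raw = headers["x-actor"]?.trim();
  return raw ? raw : "unknown";
}

// Build one append-only event. There is no prior state on create and no
// resulting state on delete, so those sides are stored as NULL.
function auditRow(
  service: string,
  headers: Record<string, string | undefined>,
  action: AuditAction,
  recordId: string,
  before: unknown,
  after: unknown,
): AuditRow {
  return {
    service,
    actor: actorOf(headers),
    action,
    record_id: recordId,
    before: action === "create" ? null : before,
    after: action === "delete" ? null : after,
  };
}

// seq and at take their column defaults; only these six values are bound.
const INSERT_AUDIT =
  "INSERT INTO audit.events (service, actor, action, record_id, before, after) " +
  "VALUES ($1, $2, $3, $4, $5, $6)";
```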
Boot-time migrations
Services run their own migrations at startup: plain .sql files in the
service’s migrations/ directory, applied in name order under a Postgres
advisory lock (safe with multiple replicas), with {{schema}} templated to
the service’s schema and applied file names recorded in
<schema>.migrations. A docker compose up from zero therefore yields a
working empty registry — no separate migration step exists.
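The planning half of a boot migration run is a pure function, which makes the ordering and templating rules easy to see. A sketch, assuming zero-padded file names (001-init.sql, then 002-…) so that plain string order is the intended order; planMigrations is hypothetical, and the locking around it is only outlined in comments (pg_advisory_lock and pg_advisory_unlock are standard Postgres functions; the lock key is not given here).

```typescript
interface MigrationFile {
  name: string; // e.g. "001-init.sql"
  sql: string; // may contain {{schema}} placeholders
}

// Order by file name, skip names already recorded in <schema>.migrations,
// and substitute the service's schema for every {{schema}} placeholder.
function planMigrations(files: MigrationFile[], applied: ReadonlySet<string>, schema: string): MigrationFile[] {
  return [...files]
    .sort((a, b) => (a.name < b.name ? -1 : a.name > b.name ? 1 : 0))
    .filter((f) => !applied.has(f.name))
    .map((f) => ({ name: f.name, sql: f.sql.split("{{schema}}").join(schema) }));
}

// Around the plan, each replica would (outline only):
//   SELECT pg_advisory_lock(<key>);   -- later replicas block here until the first finishes
//   read applied names from <schema>.migrations
//   for each planned file: run its SQL, then record its name
//   SELECT pg_advisory_unlock(<key>);
// A replica that acquires the lock second finds every file recorded and plans nothing.
```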
HTTP APIs
All three services share the same conventions: JSON bodies, errors as
{ "error": "..." } with 400/404/409/503, zod validation on every write
(.strict() schemas — unknown keys are rejected, which is also how
immutability surfaces: patch schemas simply don’t list immutable fields), and
an unauthenticated GET /health that includes a real database ping.
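The { "error": "..." } envelope can be produced by one mapping from typed errors to status codes. A sketch, assuming a ValidationError class for failed zod parses (zod itself throws ZodError) and a hierarchy in which VersionConflictError extends ConflictError; the other error names follow registry-kit’s store and peer modules, but these definitions are illustrative. The 500 fallback for unexpected errors is also an assumption, since the conventions above only list 400/404/409/503.

```typescript
// Base class; setPrototypeOf keeps instanceof reliable even when compiled to ES5.
class RegistryError extends Error {
  constructor(message: string) {
    super(message);
    this.name = new.target.name;
    Object.setPrototypeOf(this, new.target.prototype);
  }
}
class ValidationError extends RegistryError {}
class NotFoundError extends RegistryError {}
class ConflictError extends RegistryError {}
class VersionConflictError extends ConflictError {}
class PeerUnavailableError extends RegistryError {}

// Checked in order; ConflictError also matches its subclass VersionConflictError.
const STATUS_BY_ERROR: ReadonlyArray<[new (message: string) => RegistryError, number]> = [
  [ValidationError, 400],
  [NotFoundError, 404],
  [ConflictError, 409],
  [PeerUnavailableError, 503],
];

function storeErrorResponse(err: unknown): { status: number; body: { error: string } } {
  for (const [ErrorClass, status] of STATUS_BY_ERROR) {
    if (err instanceof ErrorClass) return { status, body: { error: err.message } };
  }
  // Unexpected failures: do not leak internals in the body.
  return { status: 500, body: { error: "internal error" } };
}
```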
customer-service
| Route | Behaviour |
|---|---|
GET /health | { ok, service, db, ts }; 503 when the db ping fails. |
GET /customers | List; optional ?status= filter (validated against the enum). |
POST /customers | Create (201). Caller supplies id (slug). Duplicate id → 409. |
GET /customers/{id} | One record, or 404. |
PATCH /customers/{id} | Mutable fields: name, legalName, description, status, contacts, tags. Requires If-Match. |
DELETE /customers/{id} | Requires If-Match. Refused with 409 while tenant-service reports tenants for this customer; tenant-service unreachable → 503. |
tenant-service
| Route | Behaviour |
|---|---|
GET /health | As above. |
GET /tenants | List; optional ?customer= and ?status= filters. |
POST /tenants | Create (201). The id is derived <customer>-<slug> — never supplied. The owning customer must exist (fail-closed check against customer-service; unknown → 400). Defaults: orbit: { state: "not-deployed" }, auth: {}. |
GET /tenants/{id} | The registry record only — the solutions-resolved view is the BFF’s job. |
PATCH /tenants/{id} | Mutable fields: name, description, status, kind, orbit, auth, contacts, tags; slug and customer are immutable (400). Requires If-Match. |
DELETE /tenants/{id} | Requires If-Match. Refused with 409 while solution-service reports solutions pointing at it. |
solution-service
| Route | Behaviour |
|---|---|
GET /health | As above. |
GET /solutions | List; filters ?customer=, ?tenant=, ?status=, and ?unassigned=true (records whose tenant is null). |
POST /solutions | Register a record (201) — used by the creation flow and to backfill unregistered repo pairs. Id derived <customer>-<slug>; the customer must exist; a non-null tenant must exist and belong to the same customer (wrong owner → 409 naming it). |
GET /solutions/{id} | One record, or 404. |
PATCH /solutions/{id} | Mutable fields: status, tags, deployment corrections, and the tenant pointer for attach/detach (null detaches; non-null targets are validated as on create). slug, customer, and repos are immutable (400). Requires If-Match. |
PUT /solutions/{id}/deployment | The deploy write-back: replaces the whole deployment block. On state: "deployed" the service stamps deployedAt (if absent) and clears any previous error. Requires If-Match. |
Attach/detach lives on the solution record — invariant 2’s single writable pointer. Aurora’s tenant-scoped attach routes are conveniences that forward here.
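The validation behind that single pointer reduces to three cases. A sketch, assuming the tenant lookup has already gone through the fail-closed peer call (so an unreachable tenant-service never gets this far) and that an unknown tenant answers 400, as the unknown-customer check does; the text only fixes the 409 for a wrong owner. checkTenantPointer and the record shapes are hypothetical.

```typescript
interface TenantRecord {
  id: string;
  customer: string;
}

interface SolutionRecord {
  id: string;
  customer: string;
  tenant: string | null; // the single nullable pointer
}

type PointerCheck = { ok: true } | { ok: false; status: 400 | 409; error: string };

// Validate a new value for solution.tenant. lookupTenant stands in for the peer
// call to tenant-service and returns undefined when that service answers 404.
function checkTenantPointer(
  solution: SolutionRecord,
  target: string | null,
  lookupTenant: (id: string) => TenantRecord | undefined,
): PointerCheck {
  if (target === null) return { ok: true }; // detach needs no peer check
  const tenant = lookupTenant(target);
  if (!tenant) return { ok: false, status: 400, error: `tenant ${target} does not exist` };
  if (tenant.customer !== solution.customer) {
    // Name the owning customer so the caller can see why the attach was refused.
    return { ok: false, status: 409, error: `tenant ${target} belongs to customer ${tenant.customer}` };
  }
  return { ok: true };
}
```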
The solution record’s deployment block stores facts seeded by the deploy path, not values derived at read time:
- mode: dedicated or tenant-orbit;
- state: not-deployed, provisioning, deployed, failed, or retired;
- the two public URLs, orbitUrl and missionControlUrl (since 2026-07-03 no agentUrl is written, because the agent is internal-only, though legacy records may still carry the field);
- the Azure name hash (azure.hash, twelve hex characters);
- deployedAt;
- the last error, when the state is failed.
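A typed sketch of that block and of the PUT’s stamping rule. It reads “if absent” and “previous error” as applying to the incoming block, since the PUT replaces the block wholesale; that reading, the Deployment interface, and normalizeDeployment are assumptions rather than solution-service’s actual code.

```typescript
type DeploymentState = "not-deployed" | "provisioning" | "deployed" | "failed" | "retired";

interface Deployment {
  mode: "dedicated" | "tenant-orbit";
  state: DeploymentState;
  orbitUrl?: string;
  missionControlUrl?: string;
  agentUrl?: string; // legacy only: no longer written since 2026-07-03
  azure?: { hash: string }; // twelve hex chars
  deployedAt?: string; // ISO-8601 timestamp
  error?: string; // last error, meaningful when state is "failed"
}

// Apply the write-back rules to an incoming block before it replaces the stored one:
// on "deployed", stamp deployedAt unless the caller supplied it, and drop any error.
function normalizeDeployment(incoming: Deployment, now: Date): Deployment {
  const next: Deployment = { ...incoming };
  if (next.state === "deployed") {
    if (next.deployedAt === undefined) next.deployedAt = now.toISOString();
    delete next.error;
  }
  return next;
}
```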
What registry-kit factors out
implementation/shared/registry-kit (@maxq/registry-kit) holds the shared
machinery, so each service’s own code is little more than its routes and
document schemas:
| Module | Provides |
|---|---|
store.ts | DocumentStore — get/list/insert/replace/delete over <schema>.records with If-Match version guards; Filter/FilterField (SQL is only ever built from code constants, never from request strings); serializeRecord; typed ConflictError / VersionConflictError / NotFoundError. |
pool.ts | createPool — the dual-mode pg pool: DATABASE_URL locally, Entra token auth on Azure (see below). |
migrate.ts | applyMigrations (advisory-locked, {{schema}}-templated boot migrations) and ensureAudit (idempotent creation of audit.events). |
audit.ts | writeAudit — the append-only event writer. |
middleware.ts | serviceGuard (the dormant shared-secret guard), healthPayload (db-pinging health body), requireIfMatch, actorOf (the x-actor header), and storeErrorResponse (store error → { error } + status mapping). |
peer.ts | PeerClient — the fail-closed HTTP helper for cross-service invariant checks: returns 2xx/404 results, throws PeerUnavailableError on anything else (no config, network failure, timeout, 5xx), which routes turn into a 503 refusal. |
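The fail-closed contract is easiest to see with the transport injected. A minimal sketch, assuming a fetch-like function is passed in (the global fetch in Node 18+ should fit the signature) and a 2-second default timeout; peerGet, Fetcher, and the timeout value are hypothetical, while the outcome rules (2xx and 404 are answers, everything else is a PeerUnavailableError) follow the table.

```typescript
class PeerUnavailableError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "PeerUnavailableError";
    Object.setPrototypeOf(this, PeerUnavailableError.prototype);
  }
}

type PeerResult<T> = { found: true; body: T } | { found: false };

// Minimal fetch-like signature so the transport can be swapped in tests.
type Fetcher = (
  url: string,
  init: { signal: AbortSignal },
) => Promise<{ status: number; json(): Promise<unknown> }>;

// GET against a peer service. 2xx -> found, 404 -> not found (both are answers);
// missing config, network errors, aborts, and any other status throw.
async function peerGet<T>(
  baseUrl: string | undefined,
  path: string,
  fetcher: Fetcher,
  timeoutMs = 2000,
): Promise<PeerResult<T>> {
  if (!baseUrl) throw new PeerUnavailableError(`peer not configured for ${path}`);
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs); // a real fetch rejects on abort
  let res: { status: number; json(): Promise<unknown> };
  try {
    res = await fetcher(`${baseUrl}${path}`, { signal: controller.signal });
  } catch (err) {
    throw new PeerUnavailableError(`peer unreachable: ${String(err)}`);
  } finally {
    clearTimeout(timer);
  }
  if (res.status === 404) return { found: false };
  if (res.status >= 200 && res.status < 300) return { found: true, body: (await res.json()) as T };
  throw new PeerUnavailableError(`peer answered ${res.status}`);
}
```

A route would let the PeerUnavailableError surface as a 503 refusal; a 404 from the peer is an answer rather than an outage, so it can still turn into a 400 or 409 on the calling side.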
Local-stack gotcha: registry-kit is a file: dependency baked into the
local service images at build time. The compose bind-mount + tsx watch
hot-reload only covers each service’s own codebase/ — after changing
registry-kit you must rebuild the three service images
(docker compose up -d --build of the three services).
Authentication and trust
Database access is Microsoft Entra-only on Azure — no Postgres password
exists anywhere. The Flexible Server is created with password auth
disabled; the platform’s user-assigned managed identity (id-aurora) and
the signed-in operator are its Entra administrators. registry-kit’s
createPool implements both worlds:
- Local: DATABASE_URL, ordinary password auth against the compose Postgres container.
- Azure: DATABASE_HOST + PG_USER (the UAMI name). The pool’s password is an async callback that fetches a DefaultAzureCredential token for the ossrdbms-aad scope per new physical connection, so tokens refresh themselves and no restart is needed at expiry. The Azure SDK is lazily imported, so the local path carries no Azure dependency at runtime.
Service-to-service and BFF-to-service trust is primarily network
isolation: internal-only ingress on Azure Container Apps, docker-network
only locally (host ports 3021–3023 are published purely for curl during
development). On top of that sits a dormant shared-secret guard
(SERVICE_SHARED_SECRET, sent as the x-service-secret header): active only
when the env var is set, always exempting /health — defense in depth, off
by default. End-user authentication is a front-of-Aurora concern; the
services never see it.
How Aurora consumes the registry
Aurora’s src/lib/registry/ is the only code that talks to the services
(base URLs from CUSTOMER_SERVICE_URL / TENANT_SERVICE_URL /
SOLUTION_SERVICE_URL; the whole feature is gated by
AURORA_TENANTS_ENABLED). In brief — the full story belongs on the
Aurora page:
- The join (join.ts): registry record × solution.yaml, keyed by the customer-repo name (so a registry id may differ from a legacy yaml id, e.g. a cronos-wallonia-monwbi record joining a yaml that says id: monwbi). A record whose repo no longer appears in the GitHub org scan is flagged orphaned.
- The reconciliation sweep (?sweep=true): compares the GitHub org scan with GET /solutions. Repos without records are unregistered (one-click backfill via POST /solutions); records without repos are orphaned. Both are visible and healable, never auto-resolved.
- BFF API routes: /api/customers, /api/tenants, /api/solutions (+ [id] variants). GET /api/tenants/{id} serves the tenant with its solutions resolved: the contract the tenant-scoped Orbit reader consumes (the “B2” endpoint). Attach/detach is exposed under /api/tenants/[id]/solutions/[solutionId]; the BFF’s attach additionally refuses a solution already attached elsewhere (409 naming the owner), and the BFF exposes deployment corrections via PATCH only.
- Creation and deploy integration: with the registry enabled, createSolution requires a tenant (validated before forking), registers the record after seeding (a failure is a non-fatal registerError), and the deploy path writes the deployment record back via PUT /solutions/{id}/deployment (write-back.ts; per-solution Cloudflare URLs when a zone is configured, raw ACA URLs otherwise). ACA resources are tagged with customer / tenant.
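Both the join and the sweep are set operations over the org scan and the registry. A sketch, assuming the scan has already been narrowed to customer repos and is represented as a map from repo name to its parsed solution.yaml (null when the file could not be read); customerRepo, RepoScan, joinSolutions, and sweep are hypothetical names, and the real records keep their coordinates under repos.

```typescript
interface SolutionYaml {
  id: string; // other solution.yaml fields omitted
}

interface RegistryRecord {
  id: string;
  customerRepo: string; // hypothetical: the customer-repo name the join is keyed by
}

// Org scan: repo name -> parsed solution.yaml, or null if it could not be read.
type RepoScan = Map<string, SolutionYaml | null>;

interface JoinedSolution {
  id: string; // the registry id, which may differ from yaml.id
  yaml: SolutionYaml | null;
  orphaned: boolean; // repo no longer present in the scan
}

function joinSolutions(records: RegistryRecord[], scan: RepoScan): JoinedSolution[] {
  return records.map((r) => ({
    id: r.id,
    yaml: scan.get(r.customerRepo) ?? null,
    orphaned: !scan.has(r.customerRepo),
  }));
}

// Repos without records are unregistered; records without repos are orphaned.
// Both lists are only reported, never resolved automatically.
function sweep(scan: RepoScan, records: RegistryRecord[]): { unregistered: string[]; orphaned: string[] } {
  const registered = new Set(records.map((r) => r.customerRepo));
  return {
    unregistered: Array.from(scan.keys()).filter((repo) => !registered.has(repo)).sort(),
    orphaned: records.filter((r) => !scan.has(r.customerRepo)).map((r) => r.id).sort(),
  };
}
```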
Deployment
Local
The unified compose stack (infrastructure/local/docker-compose.yml) runs
postgres:17-alpine (database portfolio, volume orbit-pg-data, health
check via pg_isready) plus the three services on host ports
3021 / 3022 / 3023. Each service gets its DATABASE_URL, its
PG_SCHEMA, and exactly the peer URLs it consults (docker-network names such
as http://tenant-service:3000); services depend on the Postgres health
check and migrate themselves on boot. aurora-webapp receives the three
service URLs and AURORA_TENANTS_ENABLED.
Azure
Two deploy.sh verbs (in infrastructure/azure/), both GET-first and
rerunnable:
- deploy.sh postgres → provision-postgres.sh: an Azure Database for PostgreSQL Flexible Server (psql-orbit by default; Burstable Standard_B1ms, 32 GiB, pg 16), database portfolio, Entra-only auth (--microsoft-entra-auth Enabled --password-auth Disabled), created with --public-access 0.0.0.0 to seed the allow-Azure-services firewall rule (VNet integration is the documented escalation). Schemas and tables are not created here; the services’ boot migrations own that.
- deploy.sh services → provision-services.sh: renders one template (app-service.yaml.tmpl) three times into the ACA apps svc-customer, svc-tenant, and svc-solution, each with internal ingress, port 3000, minReplicas: 0 (stateless, scale-to-zero-safe), 0.25 vCPU / 0.5 Gi, and the UAMI attached for both ACR pull and the Postgres token. Images come from the shared ACR via build-images.sh, which also writes the target: container release records under releases/<component>/.
Order on a fresh environment: bootstrap → postgres → build-images ×3 →
services → aurora.
Live-run gotchas (paid for during the 2026-07-03 production rollout):
- AZURE_CLIENT_ID must be set on the service apps. DefaultAzureCredential will not pick a user-assigned identity without it: IMDS defaults to the system-assigned identity and fails. The template injects the UAMI client id explicitly.
- In-env service URLs use the app-name form (http://svc-customer). The https://svc-x.internal.env-domain FQDN form does not resolve on this environment; the BFF timed out until the switch. Both provisioners now render app-name URLs.
- Several az CLI flags were renamed (--microsoft-entra-auth, not --active-directory-auth; microsoft-entra-admin, not ad-admin), and --public-access None now fully disables public network access. Fresh subscriptions need the Microsoft.DBforPostgreSQL resource provider registered (bootstrap.sh does it).
- az acr build can transiently fail with a stale server-side upload-SAS error (“Signed expiry time … must be after signed start time”); a plain retry succeeds.
Production state
The one-time portfolio migration (design §11) was executed against production
on 2026-07-03: 7 customers, 7 default <customer>-main tenants, and 7
solution records all attached; deployment records were seeded for the two
live stacks (fit-my-fit-portal and smals-sas-poc), and the final
reconciliation sweep reported 0 unregistered / 0 orphaned. The registry is
ON in production (aurora.tenants_enabled: true in the Azure config).