Registry Services

The portfolio registry is the platform’s system of record for who owns what: three small Hono microservices — customer-service, tenant-service, and solution-service — that hold the Customer → Tenant → Solution model in one Postgres database. They implement designs/tenant-model.md (v3, built and shipped 2026-07-03) and are consumed exclusively by the Aurora webapp acting as a backend-for-frontend (BFF).

The services are deliberately narrow. They own portfolio concerns only — tenancy, lifecycle status, repo coordinates, deployment facts. The methodology definition of a solution (name, description, trajectory version, stages, owners) stays in the customer repo’s solution.yaml and is joined in at read time by Aurora. The services never hold GitHub credentials.

| Service | Owns | Postgres schema | Local port | Azure app |
|---|---|---|---|---|
| customer-service | Customer records | customer | 3021 | svc-customer |
| tenant-service | Tenant records | tenant | 3022 | svc-tenant |
| solution-service | Solution registry records + deployment write-back | solution | 3023 | svc-solution |

Each lives under implementation/<service>/ in the platform repo, following the standard app convention (codebase/ + dockerfiles/), with the shared plumbing factored into implementation/shared/registry-kit (@maxq/registry-kit, a file: dependency like trajectory-loader). Each codebase is intentionally tiny: src/index.ts (boot + serve), src/app.ts (Hono routes), src/schema.ts (zod document schemas), src/config.ts (env), and migrations/.

Domain model

The Customer → Tenant → Solution hierarchy rests on six invariants, enforced at write time (§5.3 of the design):

  1. A tenant belongs to exactly one customer (tenant.customer — required, immutable, validated against customer-service on create).
  2. A solution belongs to at most one tenant — a single nullable tenant pointer on the solution record. The tenant’s solution list is derived (a solution-service query), so single-ownership holds by construction.
  3. customer.id is the org slug of solution naming; one slug rule governs customer ids, tenant slugs, and solution slugs (lowercase letters and hyphens, starts/ends with a letter, max 20 characters).
  4. Ids and structural fields (slug and customer, plus repos on solutions) are immutable; display fields are editable.
  5. Deletes are refused while children exist — a customer with tenants or a tenant with solutions answers 409. No cascades.
  6. Single authority per field: every field is owned by exactly one of the registry record, solution.yaml, or the live runtime. The joined object Aurora serves surfaces drift; it never silently resolves it.
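Invariant 3's slug rule is small enough to state as a single regular expression. The sketch below illustrates the rule under assumptions. It is not the services' zod schema, and SLUG_RE and isValidSlug are hypothetical names. As written it permits consecutive hyphens, because the rule above does not forbid them.

```typescript
// Hypothetical check for invariant 3: lowercase letters and hyphens,
// first and last character a letter, 1 to 20 characters in total.
const SLUG_RE = /^[a-z](?:[a-z-]{0,18}[a-z])?$/;

function isValidSlug(value: string): boolean {
  return SLUG_RE.test(value);
}
```

For example, isValidSlug("cronos-wallonia") is true, while "Cronos", "cronos-" and "cronos2" are rejected.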

Cross-service checks are synchronous HTTP calls that fail closed: if a peer service is unreachable or unconfigured, the mutation is refused with 503 (PeerUnavailableError). Reads never need a peer. There are no distributed transactions — Aurora’s reconciliation sweep is the drift backstop.

The Cosmos-shaped Postgres store

The data model is one Postgres server, one database (portfolio), one schema per service plus a shared audit schema. Inside each service schema sits a single container-shaped table — deliberately modelled on an Azure Cosmos DB container, so escalating to real Cosmos (or to separate databases) stays a connection-string change behind the DocumentStore interface:

```sql
-- <schema>.records, from each service's migrations/001-init.sql
CREATE TABLE IF NOT EXISTS {{schema}}.records (
  id         text PRIMARY KEY,                    -- the record key (Cosmos "id")
  data       jsonb NOT NULL,                      -- the document
  version    integer NOT NULL DEFAULT 1,          -- etag analog; bumped every write
  created_at timestamptz NOT NULL DEFAULT now(),
  updated_at timestamptz NOT NULL DEFAULT now()
);
```

Each filterable JSON field gets a B-tree expression index — for example (data->>'customer'), (data->>'tenant'), (data->>'status'), and on solutions (data->'deployment'->>'state') — plus a jsonb_path_ops GIN index for catch-all containment queries.
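List queries only reach those expression indexes if they filter on the same expressions. Here is a minimal sketch of a whitelist-based query builder, assuming a pg-style { text, values } result. buildListQuery, FILTER_EXPRESSIONS and the deploymentState key are hypothetical names, not registry-kit's Filter/FilterField API. The schema name is interpolated because it comes from service config (PG_SCHEMA), never from a request.

```typescript
// Hypothetical whitelist: each filterable field maps to the SQL expression
// its B-tree expression index was built on. Only these constants become SQL.
const FILTER_EXPRESSIONS = {
  customer: "data->>'customer'",
  tenant: "data->>'tenant'",
  status: "data->>'status'",
  deploymentState: "data->'deployment'->>'state'",
} as const;

type FilterField = keyof typeof FILTER_EXPRESSIONS;
type Filter = Partial<Record<FilterField, string>> & { unassigned?: boolean };

function buildListQuery(schema: string, filter: Filter): { text: string; values: string[] } {
  const clauses: string[] = [];
  const values: string[] = [];
  for (const field of Object.keys(FILTER_EXPRESSIONS) as FilterField[]) {
    const value = filter[field];
    if (value === undefined) continue;
    values.push(value); // request strings only ever travel as bind parameters
    clauses.push(`(${FILTER_EXPRESSIONS[field]}) = $${values.length}`);
  }
  // ?unassigned=true: a missing key and a JSON null both yield SQL NULL via ->>
  if (filter.unassigned) clauses.push("(data->>'tenant') IS NULL");
  const where = clauses.length > 0 ? ` WHERE ${clauses.join(" AND ")}` : "";
  // schema comes from service config (PG_SCHEMA), never from the request
  return { text: `SELECT * FROM ${schema}.records${where} ORDER BY id`, values };
}
```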

Document keys are camelCase — these are API/database documents, not hand-edited YAML; the hyphenated house style stays with the artefacts humans edit. A customer record looks like this (sample from the design doc, §4.1):

```json
{
  "id": "cronos",
  "name": "Cronos Wallonia",
  "legalName": "Cronos Wallonie SA",
  "description": "Regional IT group; first Trajectory customer.",
  "status": "active",
  "contacts": [
    { "name": "J. Dupont", "email": "[email protected]", "role": "sponsor" }
  ],
  "tags": ["belgium", "public-sector"],
  "createdAt": "2026-07-02T09:00:00Z",
  "updatedAt": "2026-07-02T09:00:00Z"
}
```

Over the API, a record is served flattened: the data document spread alongside id, version, createdAt, and updatedAt (registry-kit’s serializeRecord).
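The flattening takes only a few lines. flattenRecord and RecordRow are hypothetical names modelled on the description of serializeRecord. The sketch assumes timestamps arrive as Date objects, which is node-postgres's default for timestamptz. It also assumes row columns win over same-named keys inside data.

```typescript
// Hypothetical row shape mirroring the <schema>.records columns.
interface RecordRow {
  id: string;
  data: Record<string, unknown>;
  version: number;
  created_at: Date;
  updated_at: Date;
}

// Spread the document first so the row's own columns take precedence.
function flattenRecord(row: RecordRow): Record<string, unknown> {
  return {
    ...row.data,
    id: row.id,
    version: row.version,
    createdAt: row.created_at.toISOString(),
    updatedAt: row.updated_at.toISOString(),
  };
}
```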

Optimistic concurrency: If-Match

Every mutation of an existing record (PATCH, DELETE, and the deployment PUT) must carry an If-Match: <version> header naming the record’s current version. The underlying UPDATE/DELETE only lands when the stored version still matches; otherwise the service answers 409 with a “re-read and retry” message. A missing or malformed If-Match is a 400. Successful writes bump version and return the new value in the body.
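Here is a minimal sketch of the If-Match contract. It uses an in-memory Map in place of <schema>.records so it runs standalone. parseIfMatch, replaceGuarded and the exact parsing rule (a positive integer, nothing else) are assumptions. The leading comment shows the single guarded UPDATE that a Postgres version would rely on.

```typescript
// In-memory stand-in for <schema>.records. In Postgres the guard is one statement:
//   UPDATE <schema>.records SET data = $1, version = version + 1, updated_at = now()
//   WHERE id = $2 AND version = $3 RETURNING version
// and zero affected rows means a missing id or a stale version.
type StoredRecord = { data: Record<string, unknown>; version: number };
type WriteResult =
  | { status: 200; version: number }
  | { status: 400 | 404 | 409; error: string };

// Versions start at 1, so accept only a positive integer.
function parseIfMatch(header: string | undefined): number | null {
  if (header === undefined) return null;
  const value = header.trim();
  return /^[1-9][0-9]*$/.test(value) ? Number(value) : null;
}

function replaceGuarded(
  store: Map<string, StoredRecord>,
  id: string,
  ifMatch: string | undefined,
  data: Record<string, unknown>,
): WriteResult {
  const expected = parseIfMatch(ifMatch);
  if (expected === null) return { status: 400, error: "missing or malformed If-Match" };
  const current = store.get(id);
  if (current === undefined) return { status: 404, error: `${id} not found` };
  if (current.version !== expected) {
    return { status: 409, error: `${id} has changed; re-read and retry` };
  }
  const next: StoredRecord = { data, version: current.version + 1 };
  store.set(id, next);
  return { status: 200, version: next.version };
}
```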

The audit trail

Every mutation appends one row to the shared, append-only audit.events table (created idempotently by registry-kit’s ensureAudit at boot):

```sql
CREATE TABLE IF NOT EXISTS audit.events (
  seq       bigserial PRIMARY KEY,
  at        timestamptz NOT NULL DEFAULT now(),
  service   text NOT NULL,
  actor     text NOT NULL,   -- from the x-actor header; "unknown" if absent
  action    text NOT NULL,   -- create | update | delete
  record_id text NOT NULL,
  before    jsonb,
  after     jsonb
);
```

The actor is propagated by the BFF via the x-actor request header. The audit table is the deliberate replacement for the git history an earlier registry-in-a-repo design would have provided for free.
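An audit write comes down to resolving the actor and issuing one parameterized INSERT. The sketch below assumes that shape. actorFromHeaders, auditInsert and AuditEvent are hypothetical names, not registry-kit's actorOf or writeAudit. before is null on create and after is null on delete.

```typescript
// Hypothetical shapes; before is null on create, after is null on delete.
type AuditAction = "create" | "update" | "delete";

interface AuditEvent {
  service: string;
  actor: string;
  action: AuditAction;
  recordId: string;
  before: unknown;
  after: unknown;
}

// The BFF forwards the end user as x-actor; anything blank counts as absent.
function actorFromHeaders(headers: Record<string, string | undefined>): string {
  const actor = headers["x-actor"]?.trim();
  return actor ? actor : "unknown";
}

// seq and at are filled by column defaults. Documents are stringified explicitly
// because node-postgres would send a JS array as a Postgres array, not as JSON.
function auditInsert(e: AuditEvent): { text: string; values: unknown[] } {
  const json = (v: unknown) => (v === null || v === undefined ? null : JSON.stringify(v));
  return {
    text:
      "INSERT INTO audit.events (service, actor, action, record_id, before, after) " +
      "VALUES ($1, $2, $3, $4, $5, $6)",
    values: [e.service, e.actor, e.action, e.recordId, json(e.before), json(e.after)],
  };
}
```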

Not to be confused with the platform’s Activity Service: audit.events is the low-level forensic before/after record of registry mutations only, never read by any UI; the activity feed is the human-readable, platform-wide business narrative, explicitly emitted by producers and deliberately not derived from this table.

Boot-time migrations

Services run their own migrations at startup: plain .sql files in the service’s migrations/ directory, applied in name order under a Postgres advisory lock (safe with multiple replicas), with {{schema}} templated to the service’s schema and applied file names recorded in <schema>.migrations. A docker compose up from zero therefore yields a working empty registry — no separate migration step exists.
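The boot sequence can be sketched against a minimal pg-style client interface so that it runs against a fake. Only the behaviour described above comes from the source. runBootMigrations, the lock key and the layout of <schema>.migrations are hypothetical. The advisory lock is session-scoped, so the client must be one dedicated connection rather than a pool.

```typescript
// Minimal pg-style client; it must be a single dedicated connection because
// advisory locks taken with pg_advisory_lock belong to the session.
interface Client {
  query(text: string, params?: unknown[]): Promise<{ rows: Array<Record<string, unknown>> }>;
}

const MIGRATION_LOCK_KEY = 7021; // hypothetical key shared by every replica

async function runBootMigrations(
  client: Client,
  schema: string,
  files: Array<{ name: string; sql: string }>,
): Promise<string[]> {
  const applied: string[] = [];
  await client.query("SELECT pg_advisory_lock($1)", [MIGRATION_LOCK_KEY]); // other replicas wait here
  try {
    await client.query(`CREATE SCHEMA IF NOT EXISTS ${schema}`);
    await client.query(
      `CREATE TABLE IF NOT EXISTS ${schema}.migrations (name text PRIMARY KEY, applied_at timestamptz NOT NULL DEFAULT now())`,
    );
    const done = await client.query(`SELECT name FROM ${schema}.migrations`);
    const seen = new Set(done.rows.map((row) => String(row.name)));
    // Name order: 001-init.sql before 002-..., compared as plain strings.
    const ordered = [...files].sort((a, b) => (a.name < b.name ? -1 : a.name > b.name ? 1 : 0));
    for (const file of ordered) {
      if (seen.has(file.name)) continue;
      await client.query("BEGIN");
      try {
        await client.query(file.sql.split("{{schema}}").join(schema)); // template the schema
        await client.query(`INSERT INTO ${schema}.migrations (name) VALUES ($1)`, [file.name]);
        await client.query("COMMIT");
      } catch (err) {
        await client.query("ROLLBACK");
        throw err;
      }
      applied.push(file.name);
    }
  } finally {
    await client.query("SELECT pg_advisory_unlock($1)", [MIGRATION_LOCK_KEY]);
  }
  return applied;
}
```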

HTTP APIs

All three services share the same conventions: JSON bodies, errors as { "error": "..." } with 400/404/409/503, zod validation on every write (.strict() schemas — unknown keys are rejected, which is also how immutability surfaces: patch schemas simply don’t list immutable fields), and an unauthenticated GET /health that includes a real database ping.
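The effect of .strict() on patch bodies can be shown without zod. A patch is rejected if it carries any key outside the mutable list, so an attempt to change id or another immutable field fails as a 400 with no special-case code. The sketch covers only this key check, not value validation. checkPatchKeys is a hypothetical helper, and the key list is taken from the customer PATCH route.

```typescript
// Zod equivalent (for comparison): z.object({ name: z.string(), ... }).partial().strict()
const CUSTOMER_PATCHABLE: ReadonlySet<string> = new Set([
  "name", "legalName", "description", "status", "contacts", "tags",
]);

type PatchCheck = { ok: true } | { ok: false; status: 400; error: string };

// Reject any key the patch schema does not list; immutable fields such as id
// are simply absent from the list, so they fail here like any unknown key.
function checkPatchKeys(body: Record<string, unknown>, allowed: ReadonlySet<string>): PatchCheck {
  const rejected = Object.keys(body).filter((key) => !allowed.has(key));
  if (rejected.length === 0) return { ok: true };
  return { ok: false, status: 400, error: `unrecognized keys: ${rejected.join(", ")}` };
}
```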

customer-service

| Route | Behaviour |
|---|---|
| GET /health | { ok, service, db, ts }; 503 when the db ping fails. |
| GET /customers | List; optional ?status= filter (validated against the enum). |
| POST /customers | Create (201). Caller supplies id (slug). Duplicate id → 409. |
| GET /customers/{id} | One record, or 404. |
| PATCH /customers/{id} | Mutable fields: name, legalName, description, status, contacts, tags. Requires If-Match. |
| DELETE /customers/{id} | Requires If-Match. Refused with 409 while tenant-service reports tenants for this customer; tenant-service unreachable → 503. |

tenant-service

| Route | Behaviour |
|---|---|
| GET /health | As above. |
| GET /tenants | List; optional ?customer= and ?status= filters. |
| POST /tenants | Create (201). The id is derived as <customer>-<slug> and is never supplied. The owning customer must exist (fail-closed check against customer-service; unknown → 400). Defaults: orbit: { state: "not-deployed" }, auth: {}. |
| GET /tenants/{id} | The registry record only; the solutions-resolved view is the BFF’s job. |
| PATCH /tenants/{id} | Mutable fields: name, description, status, kind, orbit, auth, contacts, tags; slug and customer are immutable (400). Requires If-Match. |
| DELETE /tenants/{id} | Requires If-Match. Refused with 409 while solution-service reports solutions pointing at it. |
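The tenant create path combines id derivation, the fail-closed customer check and the defaults. This sketch makes simplifying assumptions. The customer lookup is injected and reports a tri-state result instead of throwing PeerUnavailableError. createTenantDoc and NewTenantInput are hypothetical names.

```typescript
// Hypothetical tri-state peer answer; the real services throw PeerUnavailableError instead.
type PeerResult = "found" | "not-found" | "unavailable";
type CustomerLookup = (customerId: string) => Promise<PeerResult>;

interface NewTenantInput {
  customer: string;
  slug: string;
  name: string;
}

async function createTenantDoc(input: NewTenantInput, lookupCustomer: CustomerLookup) {
  const check = await lookupCustomer(input.customer);
  // Fail closed: no answer from customer-service means no write.
  if (check === "unavailable") return { status: 503, error: "customer-service unavailable" } as const;
  if (check === "not-found") return { status: 400, error: `unknown customer ${input.customer}` } as const;
  const doc = {
    id: `${input.customer}-${input.slug}`, // derived; never taken from the body
    customer: input.customer,
    slug: input.slug,
    name: input.name,
    orbit: { state: "not-deployed" }, // documented defaults
    auth: {},
  };
  return { status: 201, doc } as const;
}
```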

solution-service

| Route | Behaviour |
|---|---|
| GET /health | As above. |
| GET /solutions | List; filters ?customer=, ?tenant=, ?status=, and ?unassigned=true (records whose tenant is null). |
| POST /solutions | Register a record (201), used by the creation flow and to backfill unregistered repo pairs. Id derived as <customer>-<slug>; the customer must exist; a non-null tenant must exist and belong to the same customer (wrong owner → 409 naming it). |
| GET /solutions/{id} | One record, or 404. |
| PATCH /solutions/{id} | Mutable: status, tags, deployment corrections, and attach/detach via the tenant pointer (null = detach; non-null targets validated as on create). slug, customer, and repos are immutable (400). Requires If-Match. |
| PUT /solutions/{id}/deployment | The deploy write-back: replaces the whole deployment block. On state: "deployed" the service stamps deployedAt (if absent) and clears any previous error. Requires If-Match. |
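POST and PATCH on solutions share the same tenant-pointer check. checkTenantTarget and TenantLookup are hypothetical names, and the sketch assumes the tenant record carries its owning customer. Answering 400 for an unknown tenant is an assumption, made by analogy with tenant-service's unknown-customer rule. An unreachable tenant-service would surface as a thrown error, which is mapped to 503 elsewhere.

```typescript
// Hypothetical lookup against tenant-service: the tenant's record, or null on 404.
// If tenant-service is unreachable the lookup throws, and the route answers 503.
type TenantLookup = (tenantId: string) => Promise<{ customer: string } | null>;

type TargetCheck = { status: 200 } | { status: 400 | 409; error: string };

async function checkTenantTarget(
  solutionCustomer: string,
  tenant: string | null,
  getTenant: TenantLookup,
): Promise<TargetCheck> {
  if (tenant === null) return { status: 200 }; // null detaches: nothing to validate
  const record = await getTenant(tenant);
  if (record === null) return { status: 400, error: `unknown tenant ${tenant}` }; // assumed status
  if (record.customer !== solutionCustomer) {
    // Wrong owner: the 409 names the tenant's actual customer.
    return { status: 409, error: `tenant ${tenant} belongs to ${record.customer}` };
  }
  return { status: 200 };
}
```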

Attach/detach lives on the solution record — invariant 2’s single writable pointer. Aurora’s tenant-scoped attach routes are conveniences that forward here.

The solution record’s deployment block stores facts seeded by the deploy path, not values derived at read time:

  • mode: dedicated or tenant-orbit.
  • state: not-deployed / provisioning / deployed / failed / retired.
  • The two public URLs, orbitUrl and missionControlUrl. Since 2026-07-03 no agentUrl is written because the agent is internal-only, though legacy records may still carry the field.
  • azure.hash: the Azure name hash, twelve hex characters.
  • deployedAt.
  • The last error, when the state is failed.
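The PUT route's write-back rules can be sketched as a pure function over the deployment block. Deployment and applyDeploymentPut are hypothetical names. Because the PUT replaces the whole block, the sketch reads "if absent" as absent from the incoming body; that reading is an interpretation.

```typescript
type DeploymentState = "not-deployed" | "provisioning" | "deployed" | "failed" | "retired";

// Hypothetical typing of the deployment block described above.
interface Deployment {
  mode: "dedicated" | "tenant-orbit";
  state: DeploymentState;
  orbitUrl?: string;
  missionControlUrl?: string;
  agentUrl?: string; // legacy records only; no longer written
  azure?: { hash: string }; // twelve hex characters
  deployedAt?: string;
  error?: string; // last error, meaningful when state is "failed"
}

// The PUT replaces the whole block, so the incoming body becomes the new block.
function applyDeploymentPut(incoming: Deployment, now: Date = new Date()): Deployment {
  const next: Deployment = { ...incoming };
  if (next.state === "deployed") {
    next.deployedAt ??= now.toISOString(); // stamp only when the body has none
    delete next.error; // a successful deploy clears any previous error
  }
  return next;
}
```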

What registry-kit factors out

implementation/shared/registry-kit (@maxq/registry-kit) holds the shared machinery, so each service’s own code is little more than its routes and document schemas:

| Module | Provides |
|---|---|
| store.ts | DocumentStore: get/list/insert/replace/delete over <schema>.records with If-Match version guards; Filter/FilterField (SQL is only ever built from code constants, never from request strings); serializeRecord; typed ConflictError / VersionConflictError / NotFoundError. |
| pool.ts | createPool: the dual-mode pg pool, using DATABASE_URL locally and Entra token auth on Azure (see below). |
| migrate.ts | applyMigrations (advisory-locked, {{schema}}-templated boot migrations) and ensureAudit (idempotent creation of audit.events). |
| audit.ts | writeAudit: the append-only event writer. |
| middleware.ts | serviceGuard (the dormant shared-secret guard), healthPayload (db-pinging health body), requireIfMatch, actorOf (reads the x-actor header), and storeErrorResponse (maps store errors to { error } plus a status). |
| peer.ts | PeerClient: the fail-closed HTTP helper for cross-service invariant checks. It returns 2xx/404 results and throws PeerUnavailableError on anything else (no config, network failure, timeout, 5xx), which routes turn into a 503 refusal. |
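The fail-closed rule can be sketched as an existence check in the spirit of PeerClient; this is not its actual code. Only 2xx and 404 count as answers. Anything else, including a missing base URL, a network error or a timeout, raises PeerUnavailableError, which the route turns into a 503. peerExists, classifyPeerStatus and the 3-second default timeout are assumptions. The sketch relies on the global fetch and AbortSignal.timeout available in Node 18+.

```typescript
class PeerUnavailableError extends Error {
  name = "PeerUnavailableError";
}

type PeerAnswer = "found" | "not-found";

// Only 2xx and 404 are answers; any other status fails closed.
function classifyPeerStatus(status: number): PeerAnswer {
  if (status >= 200 && status < 300) return "found";
  if (status === 404) return "not-found";
  throw new PeerUnavailableError(`peer answered ${status}`);
}

async function peerExists(
  baseUrl: string | undefined,
  path: string,
  timeoutMs = 3000,
): Promise<PeerAnswer> {
  if (!baseUrl) throw new PeerUnavailableError("peer URL not configured");
  let res: Response;
  try {
    res = await fetch(new URL(path, baseUrl), { signal: AbortSignal.timeout(timeoutMs) });
  } catch (err) {
    // DNS failures, refused connections and the timeout abort all land here.
    throw new PeerUnavailableError(`peer unreachable: ${String(err)}`);
  }
  await res.body?.cancel(); // only the status matters; release the connection
  return classifyPeerStatus(res.status);
}
```

A route would call, for example, peerExists(process.env.CUSTOMER_SERVICE_URL, "/customers/cronos") and map PeerUnavailableError to the 503 refusal. The environment variable name in that call is hypothetical.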

Local-stack gotcha: registry-kit is a file: dependency baked into the local service images at build time. The compose bind-mount + tsx watch hot-reload only covers each service’s own codebase/ — after changing registry-kit you must rebuild the three service images (docker compose up -d --build of the three services).

Authentication and trust

Database access is Microsoft Entra-only on Azure — no Postgres password exists anywhere. The Flexible Server is created with password auth disabled; the platform’s user-assigned managed identity (id-aurora) and the signed-in operator are its Entra administrators. registry-kit’s createPool implements both worlds:

  • Local: DATABASE_URL — ordinary password auth against the compose Postgres container.
  • Azure: DATABASE_HOST + PG_USER (the UAMI name) — the pool’s password is an async callback that fetches a DefaultAzureCredential token for the ossrdbms-aad scope per new physical connection, so tokens refresh themselves and no restart is needed at expiry. The Azure SDK is lazily imported, so the local path carries no Azure dependency at runtime.
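The dual-mode choice can be sketched as a function that returns node-postgres Pool options, whose password field may be an async function. The token source is injected so the sketch runs without the pg or Azure packages; it stands in for a lazily imported DefaultAzureCredential. poolOptions, the precedence of DATABASE_URL, the fixed port and ssl: true are assumptions.

```typescript
// Scope used for Entra tokens against Azure Database for PostgreSQL.
const OSSRDBMS_SCOPE = "https://ossrdbms-aad.database.windows.net/.default";

type Env = Record<string, string | undefined>;
// Stand-in for a lazily imported DefaultAzureCredential: (await cred.getToken(scope)).token
type TokenGetter = (scope: string) => Promise<string>;

// Returns options in node-postgres Pool shape; password may be an async function.
function poolOptions(env: Env, getToken: TokenGetter) {
  if (env.DATABASE_URL) {
    // Local: the compose Postgres with ordinary password auth inside the URL.
    return { connectionString: env.DATABASE_URL };
  }
  if (env.DATABASE_HOST && env.PG_USER) {
    return {
      host: env.DATABASE_HOST,
      port: 5432,
      database: "portfolio",
      user: env.PG_USER, // the UAMI name, registered as an Entra admin
      ssl: true,
      // Called for each new physical connection, so expiring tokens refresh on their own.
      password: () => getToken(OSSRDBMS_SCOPE),
    };
  }
  throw new Error("set DATABASE_URL (local) or DATABASE_HOST and PG_USER (Azure)");
}
```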

Service-to-service and BFF-to-service trust is primarily network isolation: internal-only ingress on Azure Container Apps, docker-network only locally (host ports 3021–3023 are published purely for curl during development). On top of that sits a dormant shared-secret guard (SERVICE_SHARED_SECRET, sent as the x-service-secret header): active only when the env var is set, always exempting /health — defense in depth, off by default. End-user authentication is a front-of-Aurora concern; the services never see it.
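Separated from Hono, the guard's decision logic might look like this. guardAllows is a hypothetical name. The constant-time comparison via node:crypto's timingSafeEqual is a choice made for this sketch; the services are not documented to do it.

```typescript
import { timingSafeEqual } from "node:crypto";

// Decide whether a request may pass the shared-secret guard.
function guardAllows(path: string, presented: string | undefined, secret: string | undefined): boolean {
  if (!secret) return true; // dormant: SERVICE_SHARED_SECRET is unset
  if (path === "/health") return true; // health probes never need the secret
  if (presented === undefined) return false; // no x-service-secret header
  const a = Buffer.from(presented);
  const b = Buffer.from(secret);
  // timingSafeEqual throws on unequal lengths, so compare lengths first.
  return a.length === b.length && timingSafeEqual(a, b);
}
```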

How Aurora consumes the registry

Aurora’s src/lib/registry/ is the only code that talks to the services (base URLs from CUSTOMER_SERVICE_URL / TENANT_SERVICE_URL / SOLUTION_SERVICE_URL; the whole feature is gated by AURORA_TENANTS_ENABLED). In brief — the full story belongs on the Aurora page:

  • The join (join.ts): registry record × solution.yaml, keyed by the customer-repo name (so a registry id may differ from a legacy yaml id — e.g. a cronos-wallonia-monwbi record joining a yaml that says id: monwbi). A record whose repo no longer appears in the GitHub org scan is flagged orphaned.
  • The reconciliation sweep (?sweep=true): compares the GitHub org scan with GET /solutions — repos without records are unregistered (one-click backfill via POST /solutions), records without repos are orphaned. Visible and healable, never auto-resolved.
  • BFF API routes: /api/customers, /api/tenants, /api/solutions (+ [id] variants). GET /api/tenants/{id} serves the tenant with its solutions resolved — the contract the tenant-scoped Orbit reader consumes (the “B2” endpoint). Attach/detach is exposed under /api/tenants/[id]/solutions/[solutionId]; the BFF’s attach additionally refuses a solution already attached elsewhere (409 naming the owner) and exposes deployment corrections via PATCH only.
  • Creation and deploy integration: with the registry enabled, createSolution requires a tenant (validated before forking), registers the record after seeding (a failure is a non-fatal registerError), and the deploy path writes the deployment record back via PUT /solutions/{id}/deployment (write-back.ts — per-solution Cloudflare URLs when a zone is configured, raw ACA URLs otherwise). ACA resources are tagged with customer/tenant.
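At its core, the reconciliation sweep is a two-way set difference. The sketch assumes two things: the org scan has already been narrowed to solution customer repos, and each record exposes the customer-repo name it joins on. customerRepo and sweep are hypothetical names. The sketch only reports differences, in line with the never-auto-resolved rule.

```typescript
// Hypothetical record shape: the id plus the customer-repo name the join uses.
interface SolutionRecord {
  id: string;
  customerRepo: string;
}

function sweep(orgRepos: string[], records: SolutionRecord[]) {
  const scanned = new Set(orgRepos);
  const registered = new Set(records.map((r) => r.customerRepo));
  return {
    // In the org scan but with no record: offered for backfill via POST /solutions.
    unregistered: orgRepos.filter((repo) => !registered.has(repo)).sort(),
    // A record whose repo is gone from the scan: flagged, never deleted.
    orphaned: records.filter((r) => !scanned.has(r.customerRepo)).map((r) => r.id).sort(),
  };
}
```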

Deployment

Local

The unified compose stack (infrastructure/local/docker-compose.yml) runs postgres:17-alpine (database portfolio, volume orbit-pg-data, health check via pg_isready) plus the three services on host ports 3021 / 3022 / 3023. Each service gets its DATABASE_URL, its PG_SCHEMA, and exactly the peer URLs it consults (docker-network names such as http://tenant-service:3000); services depend on the Postgres health check and migrate themselves on boot. aurora-webapp receives the three service URLs and AURORA_TENANTS_ENABLED.

Azure

Two deploy.sh verbs (in infrastructure/azure/), both GET-first and rerunnable:

  • deploy.sh postgres → provision-postgres.sh: an Azure Database for PostgreSQL Flexible Server (psql-orbit by default; Burstable Standard_B1ms, 32 GiB, pg 16), database portfolio, Entra-only auth (--microsoft-entra-auth Enabled --password-auth Disabled), created with --public-access 0.0.0.0 to seed the allow-Azure-services firewall rule (VNet integration is the documented escalation). Schemas and tables are not created here; the services’ boot migrations own that.
  • deploy.sh services → provision-services.sh: renders one template (app-service.yaml.tmpl) three times into the ACA apps svc-customer, svc-tenant, and svc-solution, each with internal ingress, port 3000, minReplicas: 0 (stateless, scale-to-zero-safe), 0.25 vCPU / 0.5 Gi, and the UAMI attached for both ACR pull and the Postgres token. Images come from the shared ACR via build-images.sh, which also writes the target: container release records under releases/<component>/.

Order on a fresh environment: bootstrap → postgres → build-images ×3 → services → aurora.

Live-run gotchas (paid for during the 2026-07-03 production rollout):

  • AZURE_CLIENT_ID must be set on the service apps. DefaultAzureCredential will not pick a user-assigned identity without it — IMDS defaults to system-assigned and fails. The template injects the UAMI client id explicitly.
  • In-env service URLs use the app-name form (http://svc-customer). The https://svc-x.internal.env-domain FQDN form does not resolve on this environment — the BFF timed out until the switch. Both provisioners now render app-name URLs.
  • Several az CLI flags were renamed (--microsoft-entra-auth, not --active-directory-auth; microsoft-entra-admin, not ad-admin), and --public-access None now fully disables public network access. Fresh subscriptions need the Microsoft.DBforPostgreSQL resource provider registered (bootstrap.sh does it).
  • az acr build can transiently fail with a stale server-side upload-SAS error (“Signed expiry time … must be after signed start time”) — a plain retry succeeds.

Production state

The one-time portfolio migration (design §11) was executed against production on 2026-07-03: 7 customers, 7 default <customer>-main tenants, and 7 solution records all attached; deployment records were seeded for the two live stacks (fit-my-fit-portal and smals-sas-poc), and the final reconciliation sweep reported 0 unregistered / 0 orphaned. The registry is ON in production (aurora.tenants_enabled: true in the Azure config).