Activity Service

The activity service is the platform’s narrative record: an append-only feed of business events — “Patrick deployed smals-sas-poc to Azure”, “the agent completed request r-0042 on my-fit-portal” — recorded by every producing service and queryable at any level of the portfolio (customer, tenant, solution, actor). It implements designs/activity-service.md (v1, built 2026-07-05) and consists of two pieces:

Piece	What it is	Where
`activity-service`	Hono microservice: append-only event store + query API	`implementation/activity-service/`
`@maxq/activity-kit`	Shared fire-and-forget emitter every producer uses	`implementation/shared/activity-kit/`

Service	Owns	Postgres schema	Local port	Azure app
`activity-service`	`activity.events` (immutable ActivityEvents)	`activity`	3024	`svc-activity`

The component name is deliberately prosaic, matching the registry-service fleet convention; the evocative name is reserved for the UI — the planned Aurora feed page is called “Flight Log”.

v1 scope: the service, the emitter kit, and the deployment wiring are built and live-verified. No producer emits anything yet: the emit() call sites are inventoried below and will be integrated in a follow-up, and the Aurora Flight Log UI does not exist yet. The kit’s disabled mode (url: undefined ⇒ no-op) exists precisely so producers can ship their call sites before the service is deployed everywhere.

What it is — and is not

The registry services already keep an audit.events table (before/after JSON per registry mutation, written by registry-kit’s writeAudit). The activity feed is a different artefact, and deliberately not derived from it (decision E0):

audit.events covers registry mutations only; deploys, agent work, and chat actions never touch it.
Before/after diffs carry no business meaning — data.tenant: null → smals-main is not “Attached to tenant smals-main”. The producer that performs an action writes the sentence describing it.
Ownership stays with the producer; the feed accepts what it is told.

The activity service is also not a metrics/observability system (no latency, no health), not a message bus (nothing subscribes to it to trigger behaviour), and not a guaranteed-delivery audit log — delivery is best-effort by declared contract (E3). Anything that genuinely requires guaranteed capture belongs in audit.events or the internal repo’s .orbit/ trail.

The event model

One entity: ActivityEvent — an immutable camelCase JSON document.

Field	Set by	Description
`id`	service	ULID — time-ordered, so the primary key doubles as the feed’s pagination cursor
`occurredAt`	producer (kit defaults to now)	When it happened
`recordedAt`	service	When it was persisted; divergence beyond the batching window signals delivery lag
`source`	kit (once, at construction)	The producing component: `aurora-webapp`, `maxq-orbit-agent`, `orbit-deploy`, …
`action`	producer	Dotted `noun.verb` in past tense: `solution.deployed`, `tenant.created`
`actor`	producer	`{ type: user | agent | system, id, display? }`; email for users, component id otherwise
`subject`	producer	`{ type, id, display? }` — the object the event is about
`context`	producer	`{ customer?, tenant?, solution? }` — the portfolio chain the subject sits in
`description`	producer	One human-readable sentence, renderable as-is; the only markup allowed is `*emphasis*`
`severity`	producer (default `info`)	`info` | `notice` (milestone) | `warning` (a human should look); sets display weight, not alerting
`correlationId`	producer	Groups events of one logical operation (a deploy run, an agent request)
`metadata`	producer	Free-form JSONB detail, display-only, soft-capped at 8 KB (E5)
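The `id` row relies on a property of ULIDs: the first 10 characters encode a 48-bit millisecond timestamp in Crockford base32, whose alphabet is ascending in ASCII, so sorting ids as plain strings sorts them by time. A minimal generator makes this concrete; the `ulid` name and its injectable `randomDigit` parameter are illustrative assumptions, and the service may well use an existing ULID library instead.

```typescript
import { randomInt } from "node:crypto";

// Crockford base32 (no I, L, O, U). The characters are in ascending ASCII order,
// so lexicographic order of encoded strings equals numeric order of the values.
const CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ";

// Hypothetical helper: 10 chars of timestamp (48-bit ms) + 16 chars of randomness (80 bits).
export function ulid(
  nowMs: number = Date.now(),
  randomDigit: () => number = () => randomInt(32), // must return an integer in 0..31
): string {
  if (!Number.isInteger(nowMs) || nowMs < 0 || nowMs >= 2 ** 48) {
    throw new RangeError("timestamp outside the 48-bit ULID range");
  }
  let time = "";
  // Emit base32 digits least-significant first, prepending, padded to 10 chars.
  for (let t = nowMs, i = 0; i < 10; i++, t = Math.floor(t / 32)) {
    time = CROCKFORD[t % 32] + time;
  }
  let random = "";
  for (let i = 0; i < 16; i++) random += CROCKFORD[randomDigit()];
  return time + random;
}
```

Two ids minted in the same millisecond compare only by their random suffix, which is why ties within one millisecond come out in an arbitrary (but stable) order in the feed.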

Three rules give the model its shape:

The producer resolves the context chain at emit time. If the subject is a tenant, the producer fills context.customer; if it is a solution, the producer fills all three levels. The service stores what it is given and never calls the registry to enrich or verify it (E1): the feed must accept events when every other service is down, and historical events must reflect the chain as it was then, so a solution later moved to another tenant keeps its old-context events.
The subject is duplicated into the context where applicable (a tenant subject also appears as context.tenant), so one indexed path answers “everything at tenant X” whether the tenant was the subject or merely the stage.
The display fields snapshot names at event time. Renames don’t rewrite history; the UI may resolve fresher names from the registry and fall back to the snapshot for deleted objects.
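The first two rules can be expressed as one small producer-side helper. This is a sketch under assumptions: `resolveContext`, `LEVELS`, and the idea that `known` holds whatever chain the producer already has in hand (for example from the registry record it just mutated) are illustrative, not part of `@maxq/activity-kit`.

```typescript
// Hypothetical helper: the portfolio levels, outermost first.
const LEVELS = ["customer", "tenant", "solution"] as const;
type Level = (typeof LEVELS)[number];

export type Context = Partial<Record<Level, string>>;
export interface Subject { type: string; id: string; display?: string }

// Rule 1: the producer builds the chain; the service never fills it in (E1).
export function resolveContext(subject: Subject, known: Context): Context {
  const depth = (LEVELS as readonly string[]).indexOf(subject.type); // -1 for non-portfolio subjects
  const upTo = depth === -1 ? LEVELS.length : depth;
  const ctx: Context = {};
  for (const level of LEVELS.slice(0, upTo)) {
    const value = known[level];
    if (value) ctx[level] = value; // keep only the levels the producer actually knows
  }
  // Rule 2: a customer/tenant/solution subject is duplicated into its own level,
  // so the single `tenant = X` index path also finds events *about* tenant X.
  if (depth !== -1) ctx[LEVELS[depth]] = subject.id;
  return ctx;
}
```

For a tenant subject the helper keeps only `customer` from the known chain and adds the tenant itself; for a deployment or request subject it passes the producer's chain through unchanged.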

action and subject.type are validated by pattern, not enum (E2) — a new producer verb must never require redeploying the service. The starting vocabulary (typed helpers ship in the kit as KNOWN_ACTIONS):

Producer	Actions
aurora-webapp (BFF / chat)	`customer.created` `customer.updated` `tenant.created` `tenant.updated` `solution.created` `solution.registered` `solution.attached` `solution.detached` `solution.archived` `sweep.completed`
orbit-deploy (engine)	`deployment.started` `deployment.succeeded` `deployment.failed` `deployment.torn-down` `frontdoor.provisioned`
maxq-orbit-agent	`request.received` `request.planned` `task.completed` `task.failed` `request.completed` `solution.reloaded`
platform / ops	`release.published` `service.migrated`
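A sketch of what pattern validation and the typed vocabulary could look like. The exact regular expressions are assumptions (exactly one dot, lowercase kebab-case segments, which admits `deployment.torn-down`), and `Action` uses the common `string & {}` idiom so editors autocomplete the known verbs while any new verb still type-checks.

```typescript
// Hypothetical patterns. Past tense is a naming convention a regex cannot enforce.
export const ACTION_PATTERN = /^[a-z][a-z0-9-]*\.[a-z][a-z0-9-]*$/;
export const SUBJECT_TYPE_PATTERN = /^[a-z][a-z0-9-]*$/;

// The starting vocabulary from the producer table, as a typed constant.
export const KNOWN_ACTIONS = [
  "customer.created", "customer.updated", "tenant.created", "tenant.updated",
  "solution.created", "solution.registered", "solution.attached", "solution.detached",
  "solution.archived", "sweep.completed",
  "deployment.started", "deployment.succeeded", "deployment.failed", "deployment.torn-down",
  "frontdoor.provisioned",
  "request.received", "request.planned", "task.completed", "task.failed",
  "request.completed", "solution.reloaded",
  "release.published", "service.migrated",
] as const;

export type KnownAction = (typeof KNOWN_ACTIONS)[number];
// Known verbs autocomplete, yet any string is still accepted (E2: open vocabulary).
export type Action = KnownAction | (string & {});

export const isValidAction = (action: string): boolean => ACTION_PATTERN.test(action);
```

Because acceptance is decided by the pattern alone, a producer can start emitting `invoice.issued` tomorrow without the service being redeployed.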

Storage

The activity service uses the same Postgres server and database (portfolio) as the registry, in a new schema, activity, but not the Cosmos-shaped records table. Events are immutable, so the DocumentStore machinery (version column, If-Match optimistic concurrency) would be dead weight. Instead there is one purpose-built append-only table: filterable fields promoted to real columns, the full document in data, and a composite (field, occurred_at DESC) index per feed dimension, because every query is “filter + order by time descending”:


```sql
-- activity.events — from migrations/001-init.sql
CREATE TABLE IF NOT EXISTS {{schema}}.events (
  id             text PRIMARY KEY,              -- ULID (time-ordered)
  occurred_at    timestamptz NOT NULL,
  recorded_at    timestamptz NOT NULL DEFAULT now(),
  source         text NOT NULL,
  action         text NOT NULL,
  actor_type     text NOT NULL,
  actor_id       text NOT NULL,
  subject_type   text NOT NULL,
  subject_id     text NOT NULL,
  customer       text,                          -- context chain, denormalised
  tenant         text,
  solution       text,
  severity       text NOT NULL DEFAULT 'info',
  correlation_id text,
  data           jsonb NOT NULL                 -- the full event document
);
```
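How an ActivityEvent document might map onto that table, assuming each column corresponds to the document field it is named after. The `ActivityEvent` interface and `toRow` are illustrative, not the service's actual code.

```typescript
// Hypothetical mirror of the document described in the event-model table.
export interface ActivityEvent {
  id: string;
  occurredAt: string;
  recordedAt: string;
  source: string;
  action: string;
  actor: { type: "user" | "agent" | "system"; id: string; display?: string };
  subject: { type: string; id: string; display?: string };
  context?: { customer?: string; tenant?: string; solution?: string };
  description: string;
  severity: "info" | "notice" | "warning";
  correlationId?: string;
  metadata?: Record<string, unknown>;
}

// One row of activity.events: filterable fields promoted to columns,
// the complete document kept in `data` so reads can return it verbatim.
export function toRow(e: ActivityEvent) {
  return {
    id: e.id,
    occurred_at: e.occurredAt,
    recorded_at: e.recordedAt,
    source: e.source,
    action: e.action,
    actor_type: e.actor.type,
    actor_id: e.actor.id,
    subject_type: e.subject.type,
    subject_id: e.subject.id,
    customer: e.context?.customer ?? null, // absent levels become SQL NULL
    tenant: e.context?.tenant ?? null,
    solution: e.context?.solution ?? null,
    severity: e.severity,
    correlation_id: e.correlationId ?? null,
    data: e,
  };
}
```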

From registry-kit the service reuses createPool (including the Entra-token-as-password Azure path), applyMigrations (advisory-locked boot-time SQL), serviceGuard, and healthPayload. It does not use DocumentStore, If-Match, writeAudit, or PeerClient — and it does not call ensureAudit: the feed has no audit sidecar, it is append-only.

There is no retention policy in v1 (E6): at portfolio scale the volume is thousands of rows per month. If the numbers ever say otherwise, ULID keys and time-indexed queries partition cleanly.

HTTP API

Same trust model as the registry services: serviceGuard (dormant x-service-secret shared-secret check) on everything, /health exempt, internal ingress only — no browser talks to it directly. Aurora’s BFF will proxy reads; producers write server-side through the kit.
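One reading of the serviceGuard contract as a framework-neutral predicate. Two assumptions are baked in: that “dormant” means the check passes while no secret is configured, and that a constant-time comparison is wanted. `isRequestAllowed` is a hypothetical name, not registry-kit's API.

```typescript
import { timingSafeEqual } from "node:crypto";

// Hypothetical stand-in for serviceGuard's decision logic.
export function isRequestAllowed(
  path: string,
  presented: string | undefined, // value of the x-service-secret header
  configured: string | undefined, // the shared secret, if one is set
): boolean {
  if (path === "/health") return true; // health probes are exempt
  if (!configured) return true;        // dormant: no secret configured yet
  if (presented === undefined) return false;
  const a = Buffer.from(presented);
  const b = Buffer.from(configured);
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return a.length === b.length && timingSafeEqual(a, b);
}
```

In a Hono middleware this would typically be fed from `c.req.path` and `c.req.header("x-service-secret")`.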

Write — `POST /activities`

Body is one event or an array (the kit always sends arrays). Validation is per item (E4): the response is 202 { accepted, rejected: [{index, error}], ids } — one malformed event never sinks the batch that shares its flush window. The service stamps id and recordedAt. There is no update and no delete route — immutability is enforced by API absence, not by column grants.
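Per-item acceptance can be sketched as a pure function over the request body. The validation rules, the `source` requirement, and the injected `newId`/`now` functions are assumptions made for illustration; the service's real checks may be stricter or looser.

```typescript
// Hypothetical validation rules.
const ACTION = /^[a-z][a-z0-9-]*\.[a-z][a-z0-9-]*$/;
const ACTOR_TYPES = new Set(["user", "agent", "system"]);

type Json = Record<string, unknown>;
const isObj = (v: unknown): v is Json => typeof v === "object" && v !== null && !Array.isArray(v);

// Returns an error message, or null when the item is acceptable.
function validate(item: unknown): string | null {
  if (!isObj(item)) return "event must be a JSON object";
  const { action, actor, subject, source, description, occurredAt } = item;
  if (typeof action !== "string" || !ACTION.test(action)) return "action must match noun.verb";
  if (!isObj(actor) || !ACTOR_TYPES.has(String(actor.type)) || typeof actor.id !== "string") {
    return "actor must be { type: user|agent|system, id }";
  }
  if (!isObj(subject) || typeof subject.type !== "string" || typeof subject.id !== "string") {
    return "subject must be { type, id }";
  }
  if (typeof source !== "string" || typeof description !== "string") return "source and description are required";
  // Date.parse is lenient; a strict ISO-8601 check is left out of this sketch.
  if (typeof occurredAt !== "string" || Number.isNaN(Date.parse(occurredAt))) return "occurredAt must be a timestamp";
  return null;
}

export interface BatchResult { accepted: number; rejected: { index: number; error: string }[]; ids: string[] }

// E4: bad items are reported by index; good ones are stamped and kept.
export function acceptBatch(body: unknown, newId: () => string, now: () => Date = () => new Date()) {
  const items: unknown[] = Array.isArray(body) ? body : [body]; // one event or an array
  const result: BatchResult = { accepted: 0, rejected: [], ids: [] };
  const toInsert: Json[] = [];
  items.forEach((item, index) => {
    const error = validate(item);
    if (error !== null) { result.rejected.push({ index, error }); return; }
    const id = newId();
    // The service owns id and recordedAt; producer-supplied values are overwritten.
    toInsert.push({ ...(item as Json), id, recordedAt: now().toISOString() });
    result.ids.push(id);
    result.accepted += 1;
  });
  return { result, toInsert }; // the HTTP layer answers 202 with `result`
}
```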

Read — `GET /activities`

All filters optional, AND-combined:

Parameter	Meaning
`customer=` / `tenant=` / `solution=`	Context-chain scoping (any level)
`actor=`	`actor.id` exact match
`action=`	Exact (`solution.deployed`) or prefix (`deployment.*`)
`source=` / `severity=` / `correlationId=`	Exact match
`since=` / `until=`	ISO-8601 bounds on `occurredAt`
`limit=`	Default 50, max 200
`cursor=`	Id of the last event seen (keyset pagination)

Ordering is occurredAt DESC, id DESC (stable; ties within one millisecond are arbitrary by design). Pagination is keyset, not OFFSET: the response carries nextCursor (the last id, or null when the page wasn’t full); the service resolves the cursor row’s own occurred_at to anchor (occurred_at, id) < (…). GET /activities/{id} serves one event for deep-links; GET /health is the registry-kit payload with a db ping.
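The filter table and the keyset rule translate into one parameterised statement. A sketch, assuming node-postgres-style `$n` placeholders, `since` inclusive and `until` exclusive (the document does not specify either), and a scalar subquery as one way to resolve the cursor row's `occurred_at` in the same round trip; `buildListQuery` and `nextCursor` are hypothetical names.

```typescript
export interface ActivityQuery {
  customer?: string; tenant?: string; solution?: string; actor?: string;
  action?: string; source?: string; severity?: string; correlationId?: string;
  since?: string; until?: string; limit?: number; cursor?: string;
}

// Query-string name -> promoted column, for every exact-match filter.
const EXACT: [keyof ActivityQuery, string][] = [
  ["customer", "customer"], ["tenant", "tenant"], ["solution", "solution"],
  ["actor", "actor_id"], ["source", "source"], ["severity", "severity"],
  ["correlationId", "correlation_id"],
];

export function buildListQuery(q: ActivityQuery): { text: string; values: unknown[]; limit: number } {
  const where: string[] = [];
  const values: unknown[] = [];
  const param = (v: unknown): string => `$${values.push(v)}`; // push returns the new length

  for (const [key, column] of EXACT) {
    const v = q[key];
    if (v !== undefined) where.push(`${column} = ${param(v)}`);
  }
  if (q.action !== undefined) {
    // "deployment.*" becomes a prefix match; safe only if action names cannot contain % or _.
    where.push(q.action.endsWith(".*")
      ? `action LIKE ${param(q.action.slice(0, -1) + "%")}`
      : `action = ${param(q.action)}`);
  }
  if (q.since !== undefined) where.push(`occurred_at >= ${param(q.since)}`);
  if (q.until !== undefined) where.push(`occurred_at < ${param(q.until)}`);
  if (q.cursor !== undefined) {
    // Keyset anchor: the cursor row's own occurred_at, with its id as the tiebreaker.
    const c = param(q.cursor);
    where.push(`(occurred_at, id) < ((SELECT occurred_at FROM activity.events WHERE id = ${c}), ${c})`);
  }
  const requested = Number.isFinite(q.limit) ? Math.trunc(q.limit as number) : 50;
  const limit = Math.min(Math.max(requested, 1), 200); // default 50, max 200
  const text = "SELECT data FROM activity.events"
    + (where.length > 0 ? ` WHERE ${where.join(" AND ")}` : "")
    + ` ORDER BY occurred_at DESC, id DESC LIMIT ${param(limit)}`;
  return { text, values, limit };
}

// The last id of a full page, or null when the page came back short.
export function nextCursor(pageIds: string[], limit: number): string | null {
  return pageIds.length === limit ? pageIds[pageIds.length - 1] : null;
}
```

Unlike OFFSET, the keyset predicate does not make the database read and discard every earlier row, so deep pages stay cheap.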

Out of v1 deliberately: aggregation endpoints, full-text search over descriptions, and an SSE live tail (the feed UI polls; activity.appended over the existing SSE-proxy pattern is the natural later addition).

The emitter — `@maxq/activity-kit`

The whole point of the kit is that producers can call it anywhere, including inside request handlers and mutation paths, with zero risk. It is a `file:` workspace dependency like trajectory-loader and registry-kit (same build-context widening, same baked-into-images rebuild gotcha), with zero runtime dependencies.


```typescript
import { createActivityEmitter } from "@maxq/activity-kit";

// One emitter per process; `source` is fixed at construction.
const activity = createActivityEmitter({
  url: process.env.ACTIVITY_SERVICE_URL,   // undefined ⇒ disabled no-op emitter
  secret: process.env.SERVICE_SHARED_SECRET,
  source: "aurora-webapp",
});

// In the BFF, after the registry mutation succeeds. actorEmail, customerId,
// tenantId, solutionId and solutionName come from the surrounding handler.
activity.emit({
  action: "solution.attached",
  actor: { type: "user", id: actorEmail },
  subject: { type: "solution", id: solutionId, display: solutionName },
  context: { customer: customerId, tenant: tenantId, solution: solutionId },
  description: `Attached solution *${solutionName}* to tenant *${tenantId}*.`,
});
```

The mechanics, in contract order:

emit() is synchronous and infallible. It validates shape (warn + drop on failure — never throw), stamps occurredAt and the defaults, and pushes onto the in-memory queue. It never awaits network.
Batched background flush: one loop posts the queue once it holds 20 events or 3 seconds have passed, whichever comes first. Failed flushes retry with exponential backoff (1 s doubling to a 60 s cap) without blocking new emits; this is what turns svc-activity’s scale-from-zero cold starts into a non-event instead of a lost write.
Bounded queue, drop-oldest: cap 1 000 events, one aggregated warning per overflow burst. An unreachable activity service degrades to lost feed entries, never to memory growth or caller latency.
Graceful drain: await emitter.flush() (bounded by a timeout) for short-lived processes such as CLI deploy scenarios; long-running services never need it. The flush timer is unref’d — the kit never keeps a process alive.
Disabled mode: constructing with url: undefined returns a no-op emitter.
withCorrelation(id) returns a child emitter sharing the parent’s queue that stamps correlationId on every emit — one line at the top of a deploy run or agent request groups everything under it.
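The contract above, sketched as a single closure. Only `url`, `secret`, `source`, and the numbers (20 events, 3 s, 1 s to 60 s backoff, 1 000 events) come from this document; the `post` test seam, the 10 s per-attempt timeout, the 5 s default for `flush()`, the validation regex, and warning once per synchronous burst of drops are assumptions, so read it as an illustration rather than the kit's source.

```typescript
// Hypothetical sketch: only url/secret/source and the numeric limits come from the contract.
export interface Draft {
  action: string;
  actor: { type: "user" | "agent" | "system"; id: string; display?: string };
  subject: { type: string; id: string; display?: string };
  context?: { customer?: string; tenant?: string; solution?: string };
  description: string; severity?: "info" | "notice" | "warning";
  correlationId?: string; occurredAt?: string; metadata?: Record<string, unknown>;
}
export interface ActivityEmitter {
  emit(draft: Draft): void; flush(timeoutMs?: number): Promise<void>; withCorrelation(id: string): ActivityEmitter;
}
// `post` is a hypothetical test seam that replaces the HTTP POST.
interface Options { url?: string; secret?: string; source: string; post?: (batch: object[]) => Promise<void> }
const BATCH = 20, WINDOW_MS = 3_000, CAP = 1_000, BACKOFF_MIN_MS = 1_000, BACKOFF_MAX_MS = 60_000;
const ACTION = /^[a-z][a-z0-9-]*\.[a-z][a-z0-9-]*$/;
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

export function createActivityEmitter(opts: Options): ActivityEmitter {
  if (!opts.url) { // disabled mode: every call is a no-op
    const noop: ActivityEmitter = { emit() {}, flush: async () => {}, withCorrelation: () => noop };
    return noop;
  }
  const post = opts.post ?? (async (batch: object[]) => {
    const res = await fetch(`${opts.url}/activities`, {
      method: "POST",
      headers: { "content-type": "application/json", ...(opts.secret ? { "x-service-secret": opts.secret } : {}) },
      body: JSON.stringify(batch),
      signal: AbortSignal.timeout(10_000), // assumption: bound each attempt
    });
    if (!res.ok) throw new Error(`activity-service answered ${res.status}`);
  });
  const queue: object[] = [];
  let timer: ReturnType<typeof setTimeout> | undefined;
  let inFlight: Promise<void> | undefined;
  let backoffMs = 0; // 0 while the service is reachable
  let dropped = 0;

  const schedule = (ms: number) => {
    if (timer) return; // a window or retry timer is already pending
    timer = setTimeout(() => { timer = undefined; void drain(); }, ms);
    timer.unref?.(); // Node: the timer never keeps the process alive
  };
  const clearTimer = () => { if (timer) clearTimeout(timer); timer = undefined; };
  const trim = () => { // bounded queue, drop-oldest, one warning per synchronous burst
    const excess = queue.length - CAP;
    if (excess <= 0) return;
    queue.splice(0, excess);
    if (dropped === 0) queueMicrotask(() => { console.warn(`[activity-kit] queue full, dropped ${dropped} event(s)`); dropped = 0; });
    dropped += excess;
  };
  const kick = () => {
    if (backoffMs === 0 && queue.length >= BATCH) { clearTimer(); void drain(); } // full batch: post now
    else schedule(WINDOW_MS); // otherwise the 3 s window (or a pending retry) will fire
  };
  function drain(): Promise<void> {
    if (inFlight) return inFlight;
    if (queue.length === 0) return Promise.resolve();
    const batch = queue.splice(0, BATCH);
    inFlight = Promise.resolve().then(() => post(batch))
      .then(() => { backoffMs = 0; }, () => {
        queue.unshift(...batch); trim(); // requeue at the front, then back off 1 s, 2 s, 4 s ... 60 s
        backoffMs = backoffMs === 0 ? BACKOFF_MIN_MS : Math.min(backoffMs * 2, BACKOFF_MAX_MS);
      })
      .finally(() => {
        inFlight = undefined;
        if (queue.length > 0 && backoffMs > 0) { clearTimer(); schedule(backoffMs); }
        else if (queue.length > 0) kick();
      });
    return inFlight;
  }

  function emitWith(draft: Draft, correlationId?: string): void {
    try { // synchronous: validates, stamps, enqueues; never throws or awaits the network
      if (!ACTION.test(draft?.action) || !draft.actor?.id || !draft.subject?.id || typeof draft.description !== "string") {
        console.warn("[activity-kit] dropping malformed event", draft?.action);
        return;
      }
      queue.push({ ...draft, source: opts.source, severity: draft.severity ?? "info",
        occurredAt: draft.occurredAt ?? new Date().toISOString(), ...(correlationId ? { correlationId } : {}) });
      trim();
      kick();
    } catch (err) { console.warn("[activity-kit] emit failed", err); }
  }

  async function flush(timeoutMs = 5_000): Promise<void> { // deadline is checked between attempts
    const deadline = Date.now() + timeoutMs;
    while ((queue.length > 0 || inFlight) && Date.now() < deadline) {
      await drain();
      if (backoffMs > 0) await sleep(Math.min(backoffMs, Math.max(0, deadline - Date.now())));
    }
  }
  // Children share the parent's queue and timers; they only add a correlationId.
  const make = (correlationId?: string): ActivityEmitter => ({
    emit: (draft) => emitWith(draft, correlationId), flush, withCorrelation: (id) => make(id),
  });
  return make();
}
```

A short-lived deploy scenario would derive one child with `withCorrelation(runId)` (where `runId` is whatever identifies the run), emit through it, and `await flush()` before exiting; a long-running BFF just emits.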

Producers (planned integration)

Where the emit() calls land when integration happens — the layer that knows the business meaning and the actor emits, not the storage layer below it:

Producer	Where	Events
aurora-webapp BFF	after successful registry mutations, `createSolution`, attach/detach, sweep	`customer.*`, `tenant.*`, `solution.*`, `sweep.completed`
aurora-webapp agent chat	tool-effect boundaries	`solution.created`, `deployment.started` (chat user in `metadata.onBehalfOf`)
orbit-deploy engine	scenario start/success/failure, front door, teardown	`deployment.*`, `frontdoor.provisioned`
maxq-orbit-agent	TaskProcessor lifecycle	`request.*`, `task.*` (context chain injected at provision time; open question Q2)
registry services	never — they keep `audit.events`; the BFF above them emits the business event	—

The consumer side is the planned Aurora Flight Log rail entry (the portfolio-wide feed with a filter bar) plus an Activity tab on the customer, tenant, and solution detail pages — the same feed, pre-scoped, read through a thin BFF proxy.

Deployment

The service copies the registry-service deployment shape wholesale (codebase/ + dockerfiles/, node:20-slim, tsc → node dist/index.js, boot-time migrations):

Local: compose service activity-service on host port 3024 (ACTIVITY_SERVICE_PORT), same portfolio database, PG_SCHEMA=activity. Like the registry services, changing the shared kit requires an image rebuild (docker compose up -d --build activity-service).
Azure: svc-activity, rendered as the fourth pass of app-service.yaml.tmpl by provision-services.sh (its peer-URL placeholders stay empty and are pruned — the service consults no peers). Internal ingress, minReplicas: 0, Entra token auth to Postgres, in-environment URL in app-name form (http://svc-activity).
Releases: build-images.sh activity-service builds from the activity-service/v<semver> git tag and records releases/activity-service/v<version>/release.yaml; the deployed tag is pinned in config.local.yaml’s images.activity_service_tag. The kit is not released separately: it ships inside its consumers’ images.

First deploy on an environment: commit → git tag activity-service/v0.1.0 → pin the tag in config → build-images.sh activity-service → deploy.sh services.

Design decisions (summary)

#	Decision
N1	Component named `activity-service` (fleet convention); “Flight Log” is the UI name only
E0	Explicit emission from producers — not derived from `audit.events`, which stays untouched as the forensic trail
E1	The service stores, it does not verify: no peers, no registry lookups, context taken on faith — availability and historical fidelity over referential purity
E2	Open action/subject-type vocabulary, pattern-validated; new verbs need no service deploy
E3	Fire-and-forget with bounded loss, not guaranteed delivery; no outbox tables, no queue infrastructure
E4	Per-item batch acceptance — one malformed event never sinks its flush window
E5 / E6	8 KB metadata soft cap; no retention policy yet (append-only is years of runway at portfolio scale)

Full rationale, alternatives, and open questions (actor identity before WorkOS auth, the agent’s context chain, tenant-scoped reader access, SSE live tail) are in designs/activity-service.md.