Activity Service
The activity service is the platform’s narrative record: an append-only
feed of business events — “Patrick deployed smals-sas-poc to Azure”, “the
agent completed request r-0042 on my-fit-portal” — recorded by every
producing service and queryable at any level of the portfolio (customer,
tenant, solution, actor). It implements designs/activity-service.md
(v1, built 2026-07-05) and consists of two pieces:
| Piece | What it is | Where |
|---|---|---|
| activity-service | Hono microservice: append-only event store + query API | implementation/activity-service/ |
| @maxq/activity-kit | Shared fire-and-forget emitter every producer uses | implementation/shared/activity-kit/ |

| Service | Owns | Postgres schema | Local port | Azure app |
|---|---|---|---|---|
| activity-service | activity.events (immutable ActivityEvents) | activity | 3024 | svc-activity |
The component name is deliberately prosaic, matching the registry-service fleet convention; the evocative name is reserved for the UI — the planned Aurora feed page is called “Flight Log”.
v1 scope: the service, the emitter kit, and the deployment wiring are
built and live-verified. No producer emits anything yet — the emit()
call sites are inventoried below and integrate in a follow-up — and the
Aurora Flight Log UI does not exist yet. The kit’s disabled mode
(url: undefined ⇒ no-op) exists precisely so producers can ship their
call sites before the service is deployed everywhere.
What it is — and is not
The registry services already keep an audit.events table (before/after JSON
per registry mutation, written by registry-kit’s writeAudit). The activity
feed is a different artefact, and deliberately not derived from it
(decision E0):
- audit.events covers registry mutations only; deploys, agent work, and chat actions never touch it.
- Before/after diffs carry no business meaning: data.tenant: null → smals-main is not “Attached to tenant smals-main”. The producer that performs an action writes the sentence describing it.
- Ownership stays with the producer; the feed accepts what it is told.
The activity service is also not a metrics/observability system (no
latency, no health), not a message bus (nothing subscribes to it to
trigger behaviour), and not a guaranteed-delivery audit log — delivery is
best-effort by declared contract (E3). Anything that genuinely requires
guaranteed capture belongs in audit.events or the internal repo’s .orbit/
trail.
The event model
One entity: ActivityEvent — an immutable camelCase JSON document.
| Field | Set by | Description |
|---|---|---|
| id | service | ULID, time-ordered, so the primary key doubles as the feed’s pagination cursor |
| occurredAt | producer (kit defaults to now) | When it happened |
| recordedAt | service | When it was persisted; divergence beyond the batching window signals delivery lag |
| source | kit (once, at construction) | The producing component: aurora-webapp, maxq-orbit-agent, orbit-deploy, … |
| action | producer | Dotted noun.verb in past tense: solution.deployed, tenant.created |
| actor | producer | { type: user \| agent \| system, id, display? }; email for users, component id otherwise |
| subject | producer | { type, id, display? }: the object the event is about |
| context | producer | { customer?, tenant?, solution? }: the portfolio chain the subject sits in |
| description | producer | One human-readable sentence, renderable as-is (only *emphasis* markup) |
| severity | producer (default info) | info \| notice (milestone) \| warning (a human should look); display weight, not alerting |
| correlationId | producer | Groups events of one logical operation (a deploy run, an agent request) |
| metadata | producer | Free-form JSONB detail, display-only, soft-capped at 8 KB (E5) |
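As a reading aid, the table can be written down as TypeScript types. This is a minimal sketch, not the kit's actual exports: the type names (ActivityEvent aside) are hypothetical, the optionality of correlationId and metadata is an assumption, and the sample ids are illustrative only.

```typescript
// Hypothetical type names; field names follow the table above.
type ActorType = "user" | "agent" | "system";
type Severity = "info" | "notice" | "warning";

interface ActivityRef {
  type: string;
  id: string;
  display?: string; // name snapshot at event time
}

interface ActivityEvent {
  id: string; // ULID, stamped by the service
  occurredAt: string; // ISO-8601, set by the producer (kit defaults to now)
  recordedAt: string; // ISO-8601, stamped by the service
  source: string; // stamped by the kit at construction
  action: string; // dotted noun.verb, past tense
  actor: { type: ActorType; id: string; display?: string };
  subject: ActivityRef;
  context: { customer?: string; tenant?: string; solution?: string };
  description: string;
  severity: Severity;
  correlationId?: string; // assumed optional
  metadata?: Record<string, unknown>; // assumed optional
}

// What a producer would hand to emit(): everything the service or kit does not stamp.
type ActivityInput = Omit<
  ActivityEvent,
  "id" | "recordedAt" | "source" | "occurredAt" | "severity"
> & {
  occurredAt?: string;
  severity?: Severity;
};

// Illustrative values only.
const example: ActivityInput = {
  action: "solution.attached",
  actor: { type: "user", id: "patrick@example.com" },
  subject: { type: "solution", id: "my-fit-portal", display: "My Fit Portal" },
  context: { customer: "smals", tenant: "smals-main", solution: "my-fit-portal" },
  description: "Attached solution *My Fit Portal* to tenant *smals-main*.",
};
```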
Three rules give the model its shape:
- The producer resolves the context chain at emit time. If the subject is a tenant, the producer fills context.customer; if it is a solution, the producer fills all three levels. The service stores what it is given and never calls the registry to enrich or verify (E1): the feed must accept events when every other service is down, and historical events must reflect the chain as it was then. A solution later moved to another tenant keeps its old-context events.
- The subject is duplicated into the context where applicable (a tenant subject also appears as context.tenant), so one indexed path answers “everything at tenant X” whether the tenant was the subject or merely the stage.
- display fields snapshot names at event time. Renames don’t rewrite history; the UI may resolve fresher names from the registry and fall back to the snapshot for deleted objects.
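The first two rules can be sketched as a small producer-side helper. The function name contextFor and its signature are hypothetical; the sketch assumes the producer already knows the parent chain and only needs to duplicate the subject into it.

```typescript
interface Context {
  customer?: string;
  tenant?: string;
  solution?: string;
}

interface Subject {
  type: string;
  id: string;
}

// Hypothetical helper: start from the parent chain the producer already knows,
// then duplicate the subject into the matching context slot.
function contextFor(
  subject: Subject,
  chain: { customer?: string; tenant?: string } = {},
): Context {
  const ctx: Context = { ...chain };
  if (subject.type === "customer") ctx.customer = subject.id;
  if (subject.type === "tenant") ctx.tenant = subject.id;
  if (subject.type === "solution") ctx.solution = subject.id;
  return ctx;
}

// A tenant subject: the producer supplies the customer, the tenant is duplicated.
const tenantContext = contextFor(
  { type: "tenant", id: "smals-main" },
  { customer: "smals" }, // illustrative customer id
);
```

Because the chain is resolved once at emit time, nothing in this path needs the registry to be reachable when the event is stored.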
action and subject.type are validated by pattern, not enum (E2) — a
new producer verb must never require redeploying the service. The starting
vocabulary (typed helpers ship in the kit as KNOWN_ACTIONS):
| Producer | Actions |
|---|---|
| aurora-webapp (BFF / chat) | customer.created customer.updated tenant.created tenant.updated solution.created solution.registered solution.attached solution.detached solution.archived sweep.completed |
| orbit-deploy (engine) | deployment.started deployment.succeeded deployment.failed deployment.torn-down frontdoor.provisioned |
| maxq-orbit-agent | request.received request.planned task.completed task.failed request.completed solution.reloaded |
| platform / ops | release.published service.migrated |
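Pattern validation, as opposed to an enum, might look like the following sketch. The exact pattern the service uses is not stated here, so ACTION_PATTERN is an assumption (lowercase dotted noun.verb, hyphens allowed inside a segment), chosen only so that every action in the table above passes.

```typescript
// Assumed pattern, not the service's actual one: two lowercase segments
// separated by a dot, each starting with a letter, hyphens and digits allowed.
const ACTION_PATTERN = /^[a-z][a-z0-9-]*\.[a-z][a-z0-9-]*$/;

function isValidAction(action: string): boolean {
  return ACTION_PATTERN.test(action);
}

// A verb outside the starting vocabulary is accepted as long as it fits the shape.
const acceptsNewVerb = isValidAction("invoice.sent");
// Free text is rejected.
const rejectsSentence = !isValidAction("Solution Deployed");
```

The design consequence is that a new producer can introduce a verb by emitting it; only a change to the shape itself would touch the service.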
Storage
Same Postgres server and database (portfolio) as the registry, new schema
activity — but not the Cosmos-shaped records table. Events are
immutable, so the DocumentStore machinery (version column, If-Match
optimistic concurrency) would be dead weight. Instead, one purpose-built
append-only table: filterable fields promoted to real columns, the full
document in data, and a composite (field, occurred_at DESC) index per
feed dimension — every query is “filter + order by time descending”:
-- activity.events — from migrations/001-init.sql
CREATE TABLE IF NOT EXISTS {{schema}}.events (
id text PRIMARY KEY, -- ULID (time-ordered)
occurred_at timestamptz NOT NULL,
recorded_at timestamptz NOT NULL DEFAULT now(),
source text NOT NULL,
action text NOT NULL,
actor_type text NOT NULL,
actor_id text NOT NULL,
subject_type text NOT NULL,
subject_id text NOT NULL,
customer text, -- context chain, denormalised
tenant text,
solution text,
severity text NOT NULL DEFAULT 'info',
correlation_id text,
data jsonb NOT NULL -- the full event document
);

From registry-kit the service reuses createPool (including the
Entra-token-as-password Azure path), applyMigrations (advisory-locked
boot-time SQL), serviceGuard, and healthPayload. It does not use
DocumentStore, If-Match, writeAudit, or PeerClient, and it does not
call ensureAudit: the feed is append-only, so it has no audit sidecar.
There is no retention policy in v1 (E6): at portfolio scale the volume is thousands of rows per month. ULID keys and time-indexed queries would partition cleanly if the numbers ever say otherwise.
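The "promoted columns plus full document" layout can be sketched as a mapping from a stamped event to a row. The function toRow and the StampedEvent type are hypothetical names for illustration; the column names are the ones in the table definition above.

```typescript
// Hypothetical shape of an event after the service has stamped id and recordedAt.
interface StampedEvent {
  id: string;
  occurredAt: string;
  recordedAt: string;
  source: string;
  action: string;
  actor: { type: string; id: string };
  subject: { type: string; id: string };
  context: { customer?: string; tenant?: string; solution?: string };
  severity?: string;
  correlationId?: string;
}

// Promote the filterable fields to columns and keep the whole document in data.
function toRow(e: StampedEvent) {
  return {
    id: e.id,
    occurred_at: e.occurredAt,
    recorded_at: e.recordedAt,
    source: e.source,
    action: e.action,
    actor_type: e.actor.type,
    actor_id: e.actor.id,
    subject_type: e.subject.type,
    subject_id: e.subject.id,
    customer: e.context.customer ?? null, // nullable context columns
    tenant: e.context.tenant ?? null,
    solution: e.context.solution ?? null,
    severity: e.severity ?? "info",
    correlation_id: e.correlationId ?? null,
    data: JSON.stringify(e), // the full event document
  };
}
```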
HTTP API
Same trust model as the registry services: serviceGuard (dormant
x-service-secret shared-secret check) on everything, /health exempt,
internal ingress only — no browser talks to it directly. Aurora’s BFF will
proxy reads; producers write server-side through the kit.
Write — POST /activities
Body is one event or an array (the kit always sends arrays). Validation is
per item (E4): the response is 202 { accepted, rejected: [{index, error}], ids } — one malformed event never sinks the batch that shares its
flush window. The service stamps id and recordedAt. There is no update
and no delete route — immutability is enforced by API absence, not by
column grants.
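Per-item acceptance can be illustrated with a sketch like the one below. It is not the service's handler: validate is a deliberately tiny stand-in for the real shape checks, newId is an injected hypothetical id generator (the service uses ULIDs), and treating accepted as a count is an assumption about the response shape.

```typescript
interface BatchResult {
  accepted: number; // assumed to be a count
  rejected: { index: number; error: string }[];
  ids: string[];
}

// Hypothetical validator: returns an error message, or null when the item is acceptable.
function validate(item: unknown): string | null {
  if (typeof item !== "object" || item === null) return "event must be an object";
  const e = item as Record<string, unknown>;
  if (typeof e["action"] !== "string") return "action is required";
  if (typeof e["description"] !== "string") return "description is required";
  return null;
}

// Accepts one event or an array; a bad item is reported by index and skipped.
function acceptBatch(body: unknown, newId: () => string): BatchResult {
  const items: unknown[] = Array.isArray(body) ? body : [body];
  const result: BatchResult = { accepted: 0, rejected: [], ids: [] };
  items.forEach((item, index) => {
    const error = validate(item);
    if (error !== null) {
      result.rejected.push({ index, error });
      return;
    }
    result.ids.push(newId());
    result.accepted += 1;
  });
  return result;
}
```

In this sketch the malformed item never affects its neighbours, which is the property E4 asks for.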
Read — GET /activities
All filters optional, AND-combined:
| Parameter | Meaning |
|---|---|
| customer= / tenant= / solution= | Context-chain scoping (any level) |
| actor= | actor.id exact match |
| action= | Exact (solution.deployed) or prefix (deployment.*) |
| source= / severity= / correlationId= | Exact match |
| since= / until= | ISO-8601 bounds on occurredAt |
| limit= | Default 50, max 200 |
| cursor= | Id of the last event seen (keyset pagination) |
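The filter semantics in the table can be sketched as an in-memory predicate. The real service filters in SQL; the names matchesAction and matches are hypothetical, and treating a trailing .* as "prefix up to and including the dot" is an assumption about how the prefix form is interpreted.

```typescript
interface FeedEvent {
  action: string;
  context: { customer?: string; tenant?: string; solution?: string };
}

interface Filters {
  customer?: string;
  tenant?: string;
  solution?: string;
  action?: string;
}

// Exact match, or prefix match when the filter ends in ".*".
function matchesAction(filter: string, action: string): boolean {
  if (filter.endsWith(".*")) {
    // Keep the trailing dot so "deployment.*" does not match "deployments.x".
    return action.startsWith(filter.slice(0, -1));
  }
  return action === filter;
}

// Every filter is optional; the ones present are AND-combined.
function matches(e: FeedEvent, f: Filters): boolean {
  return (
    (f.customer === undefined || e.context.customer === f.customer) &&
    (f.tenant === undefined || e.context.tenant === f.tenant) &&
    (f.solution === undefined || e.context.solution === f.solution) &&
    (f.action === undefined || matchesAction(f.action, e.action))
  );
}
```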
Ordering is occurredAt DESC, id DESC (stable; ties within one millisecond
are arbitrary by design). Pagination is keyset, not OFFSET: the response
carries nextCursor (the last id, or null when the page wasn’t full); the
service resolves the cursor row’s own occurred_at to anchor
(occurred_at, id) < (…). GET /activities/{id} serves one event for
deep-links; GET /health is the registry-kit payload with a db ping.
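Keyset pagination as described above can be sketched over an in-memory array. The service does this in SQL with a row-value comparison; the function names here are hypothetical, and the sketch assumes occurredAt values are ISO-8601 strings in one uniform UTC format so that string comparison matches time order.

```typescript
interface Row {
  id: string;
  occurredAt: string; // ISO-8601, uniform UTC format assumed
}

// Feed order: occurredAt DESC, then id DESC. Negative means a comes first.
function compareDesc(a: Row, b: Row): number {
  if (a.occurredAt !== b.occurredAt) return a.occurredAt < b.occurredAt ? 1 : -1;
  if (a.id !== b.id) return a.id < b.id ? 1 : -1;
  return 0;
}

function page(rows: Row[], limit: number, cursor?: string) {
  const sorted = [...rows].sort(compareDesc);
  let candidates = sorted;
  if (cursor !== undefined) {
    // Resolve the cursor row's own occurredAt to anchor the comparison.
    const anchor = sorted.find((r) => r.id === cursor);
    if (anchor === undefined) throw new Error("unknown cursor");
    // Equivalent of (occurred_at, id) < (anchor.occurred_at, anchor.id).
    candidates = sorted.filter((r) => compareDesc(anchor, r) < 0);
  }
  const items = candidates.slice(0, limit);
  const last = items[items.length - 1];
  // nextCursor is null when the page was not full.
  const nextCursor = last !== undefined && items.length === limit ? last.id : null;
  return { items, nextCursor };
}
```

Unlike OFFSET, the anchor comparison keeps pages stable when new events arrive at the head of the feed between requests.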
Out of v1 deliberately: aggregation endpoints, full-text search over
descriptions, and an SSE live tail (the feed UI polls; activity.appended
over the existing SSE-proxy pattern is the natural later addition).
The emitter — @maxq/activity-kit
The whole point of the kit is that producers can call it anywhere,
including inside request handlers and mutation paths, with zero risk. It is
a file: workspace dependency like trajectory-loader and registry-kit
(same build-context widening, same baked-into-images rebuild gotcha), with
zero runtime dependencies.
import { createActivityEmitter } from "@maxq/activity-kit";
const activity = createActivityEmitter({
url: process.env.ACTIVITY_SERVICE_URL, // undefined ⇒ disabled no-op emitter
secret: process.env.SERVICE_SHARED_SECRET,
source: "aurora-webapp",
});
activity.emit({
action: "solution.attached",
actor: { type: "user", id: actorEmail },
subject: { type: "solution", id: solutionId, display: solutionName },
context: { customer: customerId, tenant: tenantId, solution: solutionId },
description: `Attached solution *${solutionName}* to tenant *${tenantId}*.`,
});

The mechanics, in contract order:
- emit() is synchronous and infallible. It validates shape (warn + drop on failure, never throw), stamps occurredAt and the defaults, and pushes onto the in-memory queue. It never awaits network.
- Batched background flush: one loop posts the queue when it reaches 20 events or 3 seconds, whichever comes first. Failed flushes retry with exponential backoff (1 s doubling to a 60 s cap) without blocking new emits; this is what makes svc-activity’s scale-from-zero cold starts a non-event instead of a lost write.
- Bounded queue, drop-oldest: cap 1 000 events, one aggregated warning per overflow burst. An unreachable activity service degrades to lost feed entries, never to memory growth or caller latency.
- Graceful drain: await emitter.flush() (bounded by a timeout) for short-lived processes such as CLI deploy scenarios; long-running services never need it. The flush timer is unref’d, so the kit never keeps a process alive.
- Disabled mode: constructing with url: undefined returns a no-op emitter.
- withCorrelation(id) returns a child emitter sharing the parent’s queue that stamps correlationId on every emit; one line at the top of a deploy run or agent request groups everything under it.
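Two of these mechanics, the retry schedule and the drop-oldest queue, are easy to show in isolation. The sketch below is not the kit's source: backoffDelayMs and DropOldestQueue are hypothetical names, and only the numbers (1 s doubling to a 60 s cap, drop the oldest on overflow) come from the contract above.

```typescript
// Delay before retry number `attempt` (0-based): 1 s, 2 s, 4 s, ... capped at 60 s.
function backoffDelayMs(attempt: number): number {
  return Math.min(1000 * 2 ** attempt, 60000);
}

class DropOldestQueue<T> {
  private items: T[] = [];
  private dropped = 0;
  private readonly cap: number;

  constructor(cap: number) {
    this.cap = cap;
  }

  // Never blocks and never grows past the cap: overflow discards the oldest entry.
  push(item: T): void {
    this.items.push(item);
    if (this.items.length > this.cap) {
      this.items.shift();
      this.dropped += 1;
    }
  }

  // Remove and return up to `max` items for one flush.
  take(max: number): T[] {
    return this.items.splice(0, max);
  }

  // Drops since the last call, so the caller can log one aggregated warning.
  takeDroppedCount(): number {
    const n = this.dropped;
    this.dropped = 0;
    return n;
  }

  get size(): number {
    return this.items.length;
  }
}
```

With a cap of 1 000 and batches of 20, a caller of push pays only for an array operation, which is what keeps an unreachable service from turning into caller latency.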
Producers (planned integration)
Where the emit() calls land when integration happens — the layer that knows
the business meaning and the actor emits, not the storage layer below it:
| Producer | Where | Events |
|---|---|---|
| aurora-webapp BFF | after successful registry mutations, createSolution, attach/detach, sweep | customer.*, tenant.*, solution.*, sweep.completed |
| aurora-webapp agent chat | tool-effect boundaries | solution.created, deployment.started (chat user in metadata.onBehalfOf) |
| orbit-deploy engine | scenario start/success/failure, front door, teardown | deployment.*, frontdoor.provisioned |
| maxq-orbit-agent | TaskProcessor lifecycle | request.*, task.* (context chain injected at provision time — open question Q2) |
| registry services | never — they keep audit.events; the BFF above them emits the business event | — |
The consumer side is the planned Aurora Flight Log rail entry (the portfolio-wide feed with a filter bar) plus an Activity tab on the customer, tenant, and solution detail pages — the same feed, pre-scoped, read through a thin BFF proxy.
Deployment
The service copies the registry-service deployment shape wholesale
(codebase/ + dockerfiles/, node:20-slim, tsc → node dist/index.js,
boot-time migrations):
- Local: compose service activity-service on host port 3024 (ACTIVITY_SERVICE_PORT), same portfolio database, PG_SCHEMA=activity. Like the registry services, changing the shared kit requires an image rebuild (docker compose up -d --build activity-service).
- Azure: svc-activity, rendered as the fourth pass of app-service.yaml.tmpl by provision-services.sh (its peer-URL placeholders stay empty and are pruned, since the service consults no peers). Internal ingress, minReplicas: 0, Entra token auth to Postgres, in-environment URL in app-name form (http://svc-activity).
- Releases: build-images.sh activity-service builds from the activity-service/v<semver> git tag and records releases/activity-service/v<version>/release.yaml; the deployed tag pins in config.local.yaml’s images.activity_service_tag. The kit is not released separately; it ships inside its consumers’ images.
First deploy on an environment: commit → git tag activity-service/v0.1.0 →
pin the tag in config → build-images.sh activity-service →
deploy.sh services.
Design decisions (summary)
| # | Decision |
|---|---|
| N1 | Component named activity-service (fleet convention); “Flight Log” is the UI name only |
| E0 | Explicit emission from producers — not derived from audit.events, which stays untouched as the forensic trail |
| E1 | The service stores, it does not verify: no peers, no registry lookups, context taken on faith — availability and historical fidelity over referential purity |
| E2 | Open action/subject-type vocabulary, pattern-validated; new verbs need no service deploy |
| E3 | Fire-and-forget with bounded loss, not guaranteed delivery; no outbox tables, no queue infrastructure |
| E4 | Per-item batch acceptance — one malformed event never sinks its flush window |
| E5 / E6 | 8 KB metadata soft cap; no retention policy yet (append-only is years of runway at portfolio scale) |
Full rationale, alternatives, and open questions (actor identity before
WorkOS auth, the agent’s context chain, tenant-scoped reader access, SSE live
tail) are in designs/activity-service.md.