Single-replica writer with keep-alive-guarded scale-to-zero
Status: Accepted · Date: 2026-07-03 (scale-to-zero; the one-replica cap D4 predates it in the auto-deploy design) · Area: Deployment
Context
The per-solution agent is the sole writer over the solution repositories,
which live on Azure Files shares mounted over SMB. Two replicas operating on
one .git over SMB corrupt index.lock — so horizontal scaling is off the
table. At the same time, an always-on agent per solution is wasted spend for
stacks that are idle most of the time. Scale-to-zero is not safe by default:
agent work (planning, task drains) outlives any single HTTP request, so
Azure’s default HTTP scale rule would evict the replica mid-task. The
original plan for making it safe was a work queue plus a KEDA scaler.
Decision
- The agent is capped at `maxReplicas: 1` (D4). This is load-bearing well beyond git: the in-memory solution model and the event bus (ADR-007) assume exactly one writer process.
- Since 2026-07-03 the agent deploys at `minReplicas: 0` on both provisioning paths, with an `AGENT_MIN_REPLICAS=1` env override to force always-on.
- Scale-to-zero is made safe by a busy keep-alive inside the agent, not the queue + KEDA plan; that plan is explicitly superseded (design §18.6). `src/util/keepAlive.ts` holds a long-poll `GET /keepalive?hold=45` open against the agent’s own ingress (a new `AGENT_SELF_URL` env injected by both provisioners; since the agent went internal-only on 2026-07-03 this is the in-env address `http://agent-<h>`, no bypass header needed) while planning turns, the task drain loop, and startup recovery are in flight. The request still passes the (internal) ingress, so an in-flight ingress request pins ACA’s default HTTP scale rule above zero.
- The drain loop’s hold ordering is deliberate: `kick()` takes the hold and releases it only after the double-check re-kick, and a re-kick begins its own hold first, so busy periods are covered gap-free. Chat turns stream inside an open request and need nothing. With `AGENT_SELF_URL` unset (local dev), the keep-alive is dormant.
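The keep-alive described above can be sketched as a reference-counted hold: the first hold starts a long-poll against the agent's own ingress, and releasing the last hold aborts it. This is a minimal sketch, not the contents of `src/util/keepAlive.ts`; the `KeepAlive` class, `acquire()`, the injectable `request` transport, and the 1-second retry delay are assumptions made for illustration.

```typescript
// Hypothetical sketch: class, method names, and retry delay are assumptions,
// not the actual keepAlive.ts implementation.
type HoldRequest = (url: string, signal: AbortSignal) => Promise<unknown>;

class KeepAlive {
  private holds = 0;
  private controller: AbortController | null = null;
  private readonly selfUrl: string | undefined;
  private readonly request: HoldRequest;

  constructor(selfUrl: string | undefined, request?: HoldRequest) {
    this.selfUrl = selfUrl;
    // Default transport: fetch the long-poll and wait for the full response.
    this.request =
      request ??
      (async (url, signal) => {
        const res = await fetch(url, { signal });
        if (!res.ok) throw new Error(`keepalive returned ${res.status}`);
        await res.text();
      });
  }

  /** True while a long-poll loop is running. */
  get active(): boolean {
    return this.controller !== null;
  }

  /** Take a hold; the returned function releases it and is idempotent. */
  acquire(): () => void {
    if (!this.selfUrl) return () => {}; // AGENT_SELF_URL unset: dormant
    this.holds += 1;
    if (this.holds === 1) {
      this.controller = new AbortController();
      void this.poll(this.controller.signal);
    }
    let released = false;
    return () => {
      if (released) return;
      released = true;
      this.holds -= 1;
      if (this.holds === 0) {
        this.controller?.abort(); // idle: drop the in-flight ingress request
        this.controller = null;
      }
    };
  }

  /** Re-issue the long-poll until the last hold is released. */
  private async poll(signal: AbortSignal): Promise<void> {
    const url = `${this.selfUrl}/keepalive?hold=45`;
    while (!signal.aborted) {
      try {
        await this.request(url, signal);
      } catch {
        if (signal.aborted) return;
        // Assumed back-off so a failing ingress is not hammered.
        await new Promise((resolve) => setTimeout(resolve, 1000));
      }
    }
  }
}
```

A caller would construct it once, for example `new KeepAlive(process.env.AGENT_SELF_URL)`, and bracket each busy period with `const release = keepAlive.acquire()` and a later `release()`. Counting holds rather than toggling a flag lets overlapping busy periods share one in-flight request.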
Consequences
- Idle agents cost nothing: the hold aborts, the app scales to zero after the ~5-minute cooldown, and any incoming request (task enqueue, Mission Control, the reader) wakes it.
- No queue infrastructure or KEDA scaler to operate; the mechanism lives in ~one utility plus three hold sites in the task pipeline.
- Accepted residual gap: a platform-initiated eviction mid-task is not self-waking — startup recovery only runs on the next incoming request.
- The holds and their release ordering are invariants: removing them or “simplifying” the ordering reintroduces mid-task eviction.
- Throughput stays bounded by one replica by design; the earlier queue + KEDA idea remains the shape a future multi-worker model would need, but it is not the current plan.
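The release-ordering invariant listed above can be illustrated with a small drain loop. This is a sketch under assumptions, not the code in `TaskProcessor.ts`: `TaskDrain`, `enqueue()`, the in-memory queue, and the snapshot-per-pass drain are hypothetical; only the sequence of taking the hold, re-kicking, and then releasing comes from the decision.

```typescript
// Hypothetical sketch: TaskDrain and its members are illustrative names.
type Release = () => void;

class TaskDrain {
  readonly queue: string[] = [];
  private running = false;
  private readonly acquireHold: () => Release;
  private readonly handle: (task: string) => Promise<void>;

  constructor(acquireHold: () => Release, handle: (task: string) => Promise<void>) {
    this.acquireHold = acquireHold;
    this.handle = handle;
  }

  enqueue(task: string): void {
    this.queue.push(task);
    this.kick();
  }

  kick(): void {
    if (this.running) return; // the double-check below will pick the task up
    this.running = true;
    const release = this.acquireHold(); // 1. take the hold before any work
    // drain() catches task errors itself, so this promise never rejects.
    void this.drain().then(() => {
      this.running = false;
      // 2. Double-check: tasks may have arrived while the pass was running.
      //    The re-kick takes its own hold before this one is dropped.
      if (this.queue.length > 0) this.kick();
      release(); // 3. release last, so consecutive holds overlap
    });
  }

  private async drain(): Promise<void> {
    // Assumption: each pass works on a snapshot of the queue, so tasks that
    // arrive mid-pass stay queued until the double-check in kick().
    const batch = this.queue.splice(0);
    for (const task of batch) {
      try {
        await this.handle(task);
      } catch {
        // A failed task must not stop the drain; real code would log it.
      }
    }
  }
}
```

Moving `release()` ahead of the double-check would briefly leave no hold in place while work is still queued, which is the gap the ordering is meant to close.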
Evidence
- `implementation/maxq-orbit-agent/codebase/src/util/keepAlive.ts`: the long-poll hold
- `implementation/maxq-orbit-agent/codebase/src/orchestrator/TaskProcessor.ts` and `TeamLead.ts`: the hold sites (drain loop, recovery, planning turn)
- `infrastructure/azure/provision-solution.sh` / `app-agent.yaml.tmpl`: `minReplicas: 0`, `AGENT_SELF_URL` injection
- `memory/orbit-auto-deploy.md` (D4 and the 2026-07-03 revision), `memory/orbit-agent-task-processor.md` (the keep-alive invariant), `designs/orbit-auto-deploy.md` §18.6