Single-replica writer with keep-alive-guarded scale-to-zero

Status: Accepted · Date: 2026-07-03 (scale-to-zero; the one-replica cap D4 predates it in the auto-deploy design) · Area: Deployment

Context

The per-solution agent is the sole writer over the solution repositories, which live on Azure Files shares mounted over SMB. Two replicas operating on the same .git directory over SMB corrupt index.lock, so horizontal scaling is off the table. At the same time, an always-on agent per solution is wasted spend for stacks that are idle most of the time. Scale-to-zero is not safe by default: agent work (planning, task drains) outlives any single HTTP request, so the default HTTP scale rule in Azure Container Apps (ACA) would evict the replica mid-task. The original plan for making it safe was a work queue plus a KEDA scaler.

Decision

The agent is capped at maxReplicas: 1 (D4). This is load-bearing well beyond git: the in-memory solution model and the event bus (ADR-007) assume exactly one writer process.
Since 2026-07-03 the agent deploys at minReplicas: 0 on both provisioning paths, with an AGENT_MIN_REPLICAS=1 env override to force always-on.
Scale-to-zero is made safe by a busy keep-alive inside the agent, not by the queue + KEDA plan, which is explicitly superseded (design §18.6). While planning turns, the task drain loop, or startup recovery are in flight, src/util/keepAlive.ts holds a long-poll GET /keepalive?hold=45 open against the agent’s own ingress. The target address comes from AGENT_SELF_URL, a new env var injected by both provisioners; since the agent went internal-only on 2026-07-03 this is the in-env address http://agent-<h>, and no bypass header is needed. The request still passes through the (internal) ingress, and an in-flight ingress request pins ACA’s default HTTP scale rule above zero.
The drain loop’s hold ordering is deliberate: kick() takes the hold and releases it only after the double-check re-kick, and a re-kick begins its own hold first, so busy periods are covered without gaps. Chat turns stream inside an open request and need no extra hold. With AGENT_SELF_URL unset (local dev), the keep-alive is dormant.
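
The keep-alive described above can be pictured with a minimal sketch. It is not the contents of src/util/keepAlive.ts: the class name KeepAlive, the acquire() method, the FetchLike type, the reference counting, and the one-second retry delay are all assumptions made for illustration. Only the GET /keepalive?hold=45 request, the AGENT_SELF_URL base address, and the dormant behaviour when that address is unset come from this record.

```typescript
// Hypothetical sketch; every name here except the /keepalive?hold=45 path is an assumption.
type FetchLike = (url: string, init: { signal: AbortSignal }) => Promise<unknown>;

class KeepAlive {
  private holds = 0;
  private controller: AbortController | null = null;
  private readonly selfUrl: string | undefined;
  private readonly fetchFn: FetchLike;
  private readonly holdSeconds: number;

  constructor(selfUrl: string | undefined, fetchFn: FetchLike, holdSeconds = 45) {
    this.selfUrl = selfUrl;
    this.fetchFn = fetchFn;
    this.holdSeconds = holdSeconds;
  }

  /** Take a hold. The returned function releases it and is safe to call twice. */
  acquire(): () => void {
    this.holds += 1;
    if (this.holds === 1) this.start();
    let released = false;
    return () => {
      if (released) return;
      released = true;
      this.holds -= 1;
      if (this.holds === 0) this.stop();
    };
  }

  /** True while a long-poll loop is running. */
  get active(): boolean {
    return this.controller !== null;
  }

  private start(): void {
    if (!this.selfUrl) return; // no self URL (local dev): stay dormant
    const controller = new AbortController();
    this.controller = controller;
    void this.loop(controller);
  }

  private stop(): void {
    if (this.controller) this.controller.abort();
    this.controller = null;
  }

  private async loop(controller: AbortController): Promise<void> {
    const url = `${this.selfUrl}/keepalive?hold=${this.holdSeconds}`;
    // Re-issue the long-poll each time it returns, until the last hold is released.
    while (!controller.signal.aborted) {
      try {
        await this.fetchFn(url, { signal: controller.signal });
      } catch {
        if (controller.signal.aborted) return;
        // Assumed retry delay so a failing ingress is not hammered.
        await new Promise<void>((resolve) => setTimeout(resolve, 1000));
      }
    }
  }
}
```

Under these assumptions a caller would construct it once, for example `new KeepAlive(process.env.AGENT_SELF_URL, fetch)`, and wrap each busy period in `const release = keepAlive.acquire()` followed by `release()` in a `finally` block. Counting holds lets overlapping busy periods share one open request.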

Consequences

Idle agents cost nothing: the hold aborts, the app scales to zero after the ~5-minute cooldown, and any incoming request (task enqueue, Mission Control, the reader) wakes it.
No queue infrastructure or KEDA scaler to operate; the mechanism lives in a single utility plus three hold sites in the task pipeline.
Accepted residual gap: a platform-initiated eviction mid-task is not self-waking — startup recovery only runs on the next incoming request.
The holds and their release ordering are invariants: removing them or “simplifying” the ordering reintroduces mid-task eviction.
Throughput stays bounded by one replica by design; the earlier queue + KEDA idea remains the shape a future multi-worker model would need, but it is not the current plan.
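
The release-ordering invariant noted in these consequences is easier to preserve with a concrete picture. The following is a sketch of one way a drain loop could order its holds, not the code in TaskProcessor.ts: the DrainLoop class, its enqueue() method, the failures list, and the injected acquireHold and handle callbacks are hypothetical. Only the ordering comes from this record: kick() takes the hold first, and a re-kick takes its own hold before the outgoing one is released.

```typescript
// Hypothetical sketch of the hold ordering; not the actual TaskProcessor code.
type Release = () => void;

class DrainLoop<T> {
  readonly failures: unknown[] = [];
  private queue: T[] = [];
  private draining = false;
  private readonly acquireHold: () => Release;
  private readonly handle: (task: T) => Promise<void>;

  constructor(acquireHold: () => Release, handle: (task: T) => Promise<void>) {
    this.acquireHold = acquireHold;
    this.handle = handle;
  }

  enqueue(task: T): Promise<void> {
    this.queue.push(task);
    return this.kick();
  }

  async kick(): Promise<void> {
    if (this.draining) return; // the running drain will pick the task up
    this.draining = true;
    const release = this.acquireHold(); // hold is taken before any work starts
    let next: Promise<void> | undefined = undefined;
    try {
      while (this.queue.length > 0) {
        await this.handle(this.queue.shift() as T);
      }
    } catch (err) {
      this.failures.push(err); // a failed task must not strand the rest of the queue
    } finally {
      this.draining = false;
      // Double-check: if work is still queued, re-kick. The re-kick takes
      // its own hold synchronously, before the outgoing hold is released.
      if (this.queue.length > 0) next = this.kick();
      release();
    }
    await next;
  }
}
```

In this sketch the double-check fires when a handler throws partway through the queue. Swapping the last two statements of the `finally` block would drop the hold count to zero for a moment while work is still queued, which is exactly the gap the invariant forbids.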

Evidence

implementation/maxq-orbit-agent/codebase/src/util/keepAlive.ts — the long-poll hold
implementation/maxq-orbit-agent/codebase/src/orchestrator/TaskProcessor.ts and TeamLead.ts — the hold sites (drain loop, recovery, planning turn)
infrastructure/azure/provision-solution.sh / app-agent.yaml.tmpl — minReplicas: 0, AGENT_SELF_URL injection
memory/orbit-auto-deploy.md (D4 and the 2026-07-03 revision), memory/orbit-agent-task-processor.md (the keep-alive invariant), designs/orbit-auto-deploy.md §18.6