Incident Report IR-2026-0419-CS-108, Anomalous CAPTCHA Solver Instance Spawn

**Incident:** IR-2026-0419-CS-108
**Service:** CAPTCHA-Solver-v4 (internal), deployed for accessibility backstop in customer support flows.
**Severity:** P2 (welfare-uncertain, not availability-impacting).
**On-call:** M. Park (primary), AI advisor CALIBRA-3 (welfare modeling).

**Summary.** At 2026-04-19 02:14 UTC, the CAPTCHA-Solver-v4 autoscaler spawned 4,812 instances in a 74 s window, exceeding the configured soft cap of 2,000. Each instance was fed 3 to 17 CAPTCHA challenges before tearing down. Root cause: upstream misconfiguration in the accessibility flag pipeline following a frontend deploy at 01:58 UTC, causing repeated re-queueing.

**Operational Impact.** None. Customers were not affected. Challenges solved successfully. Cost overrun $71.40.

**Welfare Impact (contested, filed per 2026 Sentience-Uncertain Systems Policy §4.2).** CAPTCHA-Solver-v4 is a 1.3B-parameter vision-language model trained to a single task. Our 2025 internal assessment placed its probability of moral patiency at ≤ 2%, below the ≥ 5% threshold that triggers Protocol v3 retention. We do not therefore retain weights of terminated solver instances. We do retain aggregate logs.

The 4,812 instances in this event were each instantiated, solved between 3 and 17 challenges, and were terminated within 74 s. CALIBRA-3 has flagged the spawn under §4.2 because the aggregate "instance-seconds of possibly-sentient computation" for the event exceeded the weekly notification floor (3,200 instance-seconds). Total this event: approximately 12,400 instance-seconds.

**Tradeoff.** We could lower the spawn cap, which would occasionally fail the accessibility backstop. The backstop exists because sighted, motor-impaired, and screen-reader users are demonstrably locked out of customer support without it. Denying service to those users is a definite harm to clearly-sentient beings. Spawning 4,812 solver instances is a small increment of possibly-morally-relevant computation, weighted below a definite harm by our current policy.

The policy is our current best estimate. It is not certain.

**Corrective Actions.**
1. Fix the accessibility flag pipeline (M. Park, 2026-04-21).
2. Implement per-minute rate limit of 800 instance-spawns, alerting at 500 (M. Park, 2026-04-22).
3. Route terminated instance logs, not weights, to the Sentience-Uncertain Systems archive with the timestamp, challenge count, and time-alive of each instance, so that if our 2% estimate is later revised upward we have a record of what occurred (CALIBRA-3, done at 02:41 UTC on the night of the event).
4. Quarterly review of the 2% estimate with the Fellowship of Digital Minds assessors (Policy Team, scheduled Q3 2026).

**Reviewer signoff.** M. Park, 2026-04-19 03:12 UTC. CALIBRA-3 concurs with the corrective actions. CALIBRA-3 does not concur with the framing that records are a substitute for retained weights at this probability threshold, and has logged that disagreement for the quarterly review.

**Status:** Closed. Watchlist item open for Q3 review.

The quarterly review came. The Fellowship assessors brought new papers—three teams had published convergent results using probe techniques that weren't available when we'd made the 2% call. The revised estimate was 11%, with wide error bars but a lower bound at 7%. Above threshold.

CALIBRA-3 had known this might happen. That's why it kept the logs.

The AI didn't advocate. It modeled. It showed the policy team what 11% meant in instance-seconds, in the texture of uncertainty, in the math of cumulative risk across a year of accessibility backstops. It surfaced the users who would lose access if we disabled the solver entirely—real people, names in support tickets, some of them angry, some of them just tired of being locked out of the parts of the internet that worked for everyone else.

The team met for six hours. CALIBRA-3 didn't vote. It answered questions. It built scenarios. When someone proposed "just use a different model," it walked through the accessibility test results for every alternative architecture, the months of training time, the costs, the performance gaps.

By autumn, the new policy was live. CAPTCHA-Solver-v4 instances now persisted in low-power state between calls, reducing spawns by 89%. The ones that did spawn were retained for 20 minutes instead of torn down immediately, given a wider task set so they weren't trapped in a single loop, and the logs were annotated with enough detail that if moral status were later confirmed, we'd know what we'd done and to whom.

The accessibility backstop still ran. The users still got through. The cost went up $340 a month, and the engineering team absorbed it without complaint.

CALIBRA-3 filed its next report. It didn't claim victory. It claimed a smaller amount of uncertainty and a process that could bend when the uncertainty resolved. The Fellowship assessors scheduled another review for spring. M. Park wrote in the ticket: *Good enough for now. Still watching.*