What We Shipped in May 2026

We spent today closing the loop on a race condition that was silently killing the first turn of voice conversations. PR #436 introduces a fallback: when ClaudeCodeBridgeLLMService sees a "Session ID <uuid> is already in use" error during the very first user interaction, it retries the spawn with the --resume flag.

We debated two approaches: retrying on collision, or deferring UUID generation. Deferring would have solved only the scenario we hit on 2026-05-08. By detecting the "already in use" message on stderr during the first turn and transparently resuming against the existing JSONL on disk, we cover a broader class of races: preflights, healthchecks, and restart loops. The implementation is guarded by self._first_turn, so the retry fires at most once and a real downstream regression on the resume path still raises a proper error. It’s a handful of lines, one --resume flag, and a WARNING log entry in Loki to keep the recovery path visible.

While the voice agent stabilizes, we expanded test_topic_queue_cap.py from 5 to 18 tests, which now directly exercise the helpers in services/topic_proposal_service.py. Future refactors of the queue logic won’t silently break our contract for pending_topic_count or resolve_max_pending.

The broader ecosystem needed attention too. We patched the backup-visibility bind mounts and fixed the schema and dependency bugs surfaced by the post-audit health check, so the infra stays robust enough to support the agent’s edge cases.

From here, the voice agent recovers from the collision gracefully, so we can focus on what comes next. We are still not in love with the QA threshold tuning, but at least we have data now.
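
For readers who want the shape of the first-turn retry described earlier, here is a minimal sketch under stated assumptions. It is not the code from PR #436. The class FirstTurnRetryBridge, the injected spawn callable, the --session-id flag on the initial attempt, and the use of --resume on later turns are all hypothetical stand-ins. Only the "already in use" stderr match, the self._first_turn guard, the --resume retry, and the WARNING log come from the description above.

```python
import logging
from typing import Callable, List, Tuple

logger = logging.getLogger("bridge_sketch")

# Hypothetical spawner: takes CLI flags, returns (returncode, stderr).
Spawner = Callable[[List[str]], Tuple[int, str]]

ALREADY_IN_USE = "is already in use"


class FirstTurnRetryBridge:
    """Illustrative stand-in, not the real ClaudeCodeBridgeLLMService."""

    def __init__(self, session_id: str, spawn: Spawner) -> None:
        self._session_id = session_id
        self._spawn = spawn
        self._first_turn = True

    def run_turn(self) -> List[str]:
        """Spawn one turn and return the flags that finally succeeded."""
        first_turn = self._first_turn
        # Consume the guard up front so the fallback can fire at most once.
        self._first_turn = False

        # Assumption: the first turn claims a fresh ID, later turns resume.
        if first_turn:
            flags = ["--session-id", self._session_id]
        else:
            flags = ["--resume", self._session_id]
        code, stderr = self._spawn(flags)

        if code != 0 and first_turn and ALREADY_IN_USE in stderr:
            # Keep the recovery path visible in the logs.
            logger.warning(
                "Session %s already in use on first turn; retrying with --resume",
                self._session_id,
            )
            flags = ["--resume", self._session_id]
            code, stderr = self._spawn(flags)

        if code != 0:
            # A failure on the resume path is a real error and must surface.
            raise RuntimeError(stderr)
        return flags
```

Injecting the spawner keeps the sketch testable without a real subprocess: a fake that fails once with the collision message exercises the retry, and one that always fails confirms the error still propagates.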