What we shipped on 2026-06-25
Our biggest fight today was with a series of silent failures that only appeared in the wild. We spent most of the day chasing “ghost” errors, the kind that never show up in local tests but surface under production timeouts and missing services.
We started with the image regeneration pipeline, where poindexter tasks regen-image was sporadically returning HTTP 503s (PR #1930). The culprit was a stale HTTP keep-alive connection in our shared httpx.AsyncClient. Uvicorn would close the idle connection server-side after 5 seconds, but the client tried to reuse it anyway, triggering a RemoteProtocolError (0c08688). Since our local diffusers fallback isn’t installed in the worker image, the system just gave up and returned a 503. The fix was simple: we stopped pooling and now open a fresh client for every SDXL call. Keep-alive provides zero benefit for low-frequency regens, and the stability is worth the overhead.
Simultaneously, we had to stop the canonical_blog pipeline from simply freezing at the QA stage. We found that when critic model settings were empty, _resolve_critic_model would raise a RuntimeError that propagated all the way up to _wrap_atom, marking every task as halted=True (PR #1931). We wrapped that fallback call in a try/except block so it degrades to a graceful skip instead of a total halt, and we seeded pipeline_critic_model=ollama/phi4:14b into the defaults so fresh installs don’t start in a broken state.
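The degrade-to-skip behaviour can be sketched roughly as follows. This is an assumption-laden illustration, not the actual pipeline code: `resolve_critic_or_skip`, `run_qa_stage`, and the result dictionary are hypothetical names, and the real `_resolve_critic_model` and `_wrap_atom` signatures are not shown in this post.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger(__name__)


def resolve_critic_or_skip(resolve: Callable[[], str]) -> Optional[str]:
    """Return the critic model name, or None if it cannot be resolved."""
    try:
        return resolve()
    except RuntimeError as exc:
        # Log and degrade instead of letting the error reach the task wrapper.
        logger.warning("Critic model unavailable, skipping critic pass: %s", exc)
        return None


def run_qa_stage(draft: str, resolve: Callable[[], str]) -> dict:
    """Hypothetical QA stage: skip the critic rather than halt the task."""
    model = resolve_critic_or_skip(resolve)
    if model is None:
        return {"halted": False, "skipped": True, "critic_model": None}
    # A real stage would call the critic on `draft` here.
    return {"halted": False, "skipped": False, "critic_model": model}
```

The point of the pattern is that a missing setting costs one QA pass, not the whole task.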
Even after fixing the settings, the pipeline still struggled because the Prefect subprocess doesn’t run the FastAPI lifespan where SettingsService lives (PR #1932). We had to add a fallback path to resolve the critic model via SiteConfig when self.settings is None (af7e09a). It was a classic case of “it works in the API, but not in the worker.”
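A rough sketch of that fallback, with plain mappings standing in for SettingsService and SiteConfig. The class name, method name, and the `.get()` interface are assumptions made for illustration; only the `pipeline_critic_model` key and the `self.settings is None` condition come from the change described above.

```python
from typing import Mapping, Optional

CRITIC_KEY = "pipeline_critic_model"


class CriticStage:
    """Hypothetical stage that can run in the API process or in a worker."""

    def __init__(
        self,
        settings: Optional[Mapping[str, str]],
        site_config: Mapping[str, str],
    ) -> None:
        # In the API process the lifespan hook provides settings;
        # in a worker subprocess it never runs, so this stays None.
        self.settings = settings
        self.site_config = site_config

    def resolve_critic_model(self) -> str:
        # Prefer the settings service when it exists, otherwise fall back.
        source = self.settings if self.settings is not None else self.site_config
        model = source.get(CRITIC_KEY, "")
        if not model:
            raise RuntimeError(f"{CRITIC_KEY} is not configured")
        return model
```

In a worker, `CriticStage(None, site_config).resolve_critic_model()` reads from the site config instead of failing on a missing settings object.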
On the observability side, we caught a routing bug in brain/alert_sync (PR #1934). We had hardcoded datasourceUid: "prometheus" for every rule, so our SQL-driven alerts were being sent to Prometheus, which can’t execute SQL. As a result, every 60s eval cycle failed with “data source not found” (02e5355). Adding datasource_type to _hash_rule changed the hash of every affected rule, which invalidated the stale hashes and let the brain sync cycle auto-recover the routing to local-brain-db.
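To illustrate why adding a field to the hash forces a resync, here is a small sketch. The rule fields (`name`, `expr`), the `"sql"` type value, and the helper names are assumptions; the real `_hash_rule` may hash different fields in a different way.

```python
import hashlib
import json


def hash_rule(rule: dict) -> str:
    """Hash the fields that should trigger a re-sync when they change."""
    fields = {
        "name": rule["name"],
        "expr": rule["expr"],
        # Including the datasource type means a routing change alters the hash.
        "datasource_type": rule.get("datasource_type", "prometheus"),
    }
    canonical = json.dumps(fields, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def datasource_uid(rule: dict) -> str:
    """Route SQL rules to the database datasource instead of Prometheus."""
    if rule.get("datasource_type") == "sql":
        return "local-brain-db"
    return "prometheus"


def needs_sync(rule: dict, stored_hash: str) -> bool:
    """A rule is re-pushed whenever its current hash differs from the stored one."""
    return hash_rule(rule) != stored_hash
```

Because the stored hashes were computed without the new field's real value, every SQL rule compares as changed on the next cycle and gets rewritten with the correct datasource.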
We cleaned up a few more regressions before cutting release 0.87.1 (PR #1937):
- Fixed a frontend bug where missing posts were returning HTTP 200 instead of 404 because generateMetadata was committing the status too early (PR #1925).
- Registered embeddings_collapse and embeddings_orphan_prune in load_all() after realizing they’d been left off the explicit import list in handlers/__init__.py (PR #1933).
These fixes don’t add new features, but they close the gap between “it works on my machine” and a resilient autonomous system. With the QA pipeline no longer halting on missing settings, we can start trusting the critic to do its job.
Auto-compiled by Poindexter from today’s commits and PRs. See the work: github.com/Glad-Labs/poindexter.



