Fixing the GPU lock and taming the internal RAG sweep

What we shipped on 2026-06-20

The 2026-06-19 pipeline validation exposed a few ghosts in our machine, and today was about exorcising them. The biggest fight was with our hardware arbitration: the media render was bypassing the scheduler entirely (PR #1766). Because it never acquired gpu.lock("video"), the resident 18 GB writer/director model stayed pinned in VRAM while Wan tried to load, and the render failed repeatedly with “inference server unreachable”. We wrapped the render in the lock and fixed a quirk where the scheduler was mislabeling our own sibling processes as “gaming” activity.
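A minimal sketch of the idea behind the fix, assuming gpu.lock behaves like a context manager that swaps out the resident model for the duration of the render. The GpuLock class, its resident and events fields, and render_video are hypothetical stand-ins for illustration, not the scheduler's actual code.

```python
import threading
from contextlib import contextmanager


class GpuLock:
    """Hypothetical stand-in for the scheduler's GPU arbiter."""

    def __init__(self, resident="writer"):
        self._mutex = threading.Lock()
        self.resident = resident  # name of the model currently holding VRAM
        self.events = []          # trace of evict/restore steps, for inspection

    @contextmanager
    def lock(self, owner):
        # Serialize GPU users, hand VRAM to `owner`, then restore the
        # previous resident even if the wrapped work raises.
        with self._mutex:
            previous = self.resident
            self.events.append(f"evict:{previous}")
            self.resident = owner
            try:
                yield
            finally:
                self.resident = previous
                self.events.append(f"restore:{previous}")


def render_video(gpu, shots):
    # The render only runs while it holds the "video" lock, so the
    # resident model is out of VRAM before the video model loads.
    with gpu.lock("video"):
        return [f"rendered:{shot}" for shot in shots]
```

The point of routing the render through the lock is that eviction and restoration happen in one place, so a render can no longer start while the writer model is still pinned.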

We also had to wrestle with the discovery engine. In one batch, 4 out of 5 candidates were system-introspection topics: essentially Poindexter talking to itself about “Branch Drift in AI Content Pipelines” instead of finding actual consumer content (PR #1771). The root cause was that nothing enforced source diversity; if internal_rag produced high-scoring candidates, they simply swept the batch. We implemented _apply_source_diversity_cap to enforce a soft cap (default 0.5), so internal meta-talk can’t take more than half the batch.
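One way such a cap could work is sketched below. Only the 0.5 default comes from the post; the candidate shape (dicts with source and score), the function signature, and the reading of “soft” as “backfill from over-cap candidates when other sources cannot fill the batch” are assumptions.

```python
import math
from collections import Counter


def apply_source_diversity_cap(candidates, batch_size, cap=0.5):
    """Pick up to batch_size candidates by score, limiting each source's share.

    Assumed semantics: no source may hold more than floor(cap * batch_size)
    slots, unless the batch would otherwise go unfilled (the "soft" part).
    """
    per_source_limit = max(1, math.floor(cap * batch_size))
    ranked = sorted(candidates, key=lambda c: c["score"], reverse=True)

    picked, overflow, counts = [], [], Counter()
    for cand in ranked:
        if len(picked) == batch_size:
            break
        if counts[cand["source"]] < per_source_limit:
            picked.append(cand)
            counts[cand["source"]] += 1
        else:
            overflow.append(cand)  # over the cap; kept only for backfill

    # Soft cap: if other sources could not fill the batch, top it up
    # with the best over-cap candidates rather than shipping it short.
    picked.extend(overflow[: batch_size - len(picked)])
    return picked
```

With a batch of 5 and the default cap, a run of four high-scoring internal_rag candidates would keep only its top two slots, and lower-scoring candidates from other sources fill the rest.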

On the content side, we caught some “lazy” writing from Gemma. The writer was dropping bare parenthetical placeholders like (source) or (cite) that bypassed our existing markdown-href rails (PR #1769). We added a deterministic catch in modules/content/content_validator.py using anchored patterns to hard-reject these as critical errors.
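A rough illustration of what a deterministic catch like this might look like. The pattern and the function name are assumptions rather than the contents of content_validator.py; the sketch covers only the two placeholders named above, and it reads “anchored” as matching a parenthetical that contains nothing but the placeholder word.

```python
import re

# Matches "(source)" or "(cite)" as a whole parenthetical, case-insensitively.
# The lookbehind skips markdown links such as "[text](source)", which the
# post says are already covered by the markdown-href rails.
PLACEHOLDER_RE = re.compile(r"(?<!\])\(\s*(?:source|cite)\s*\)", re.IGNORECASE)


def find_placeholder_citations(text):
    """Return every bare placeholder parenthetical found in text."""
    return [match.group(0) for match in PLACEHOLDER_RE.finditer(text)]
```

Because the match requires the closing parenthesis right after the word, ordinary prose like “(source code is open)” passes, while a draft containing “(source)” would produce a non-empty list that the validator could treat as a critical error.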

We spent some time cleaning up wasteful cycles and silent failures:

- The podcast watchdog was triggering full TTS re-renders instead of reusing Stage-3 assets (PR #1770).
- A hardcoded 120s timeout in ReviewVideoShotListStage was causing the director’s self-critique to time out mid-list, leaving videos unreviewed (PR #1763).
- Our QA rails were accidentally loading a separate 23 GB judge for deepeval_rails, which starved VRAM and locked up the desktop (PR #1762). We’ve shifted that advisory judge to the budget tier to keep it on the resident writer model.

To keep the system observable, we fixed a bug where pipeline_tasks.stage would freeze during atom-node execution (PR #1768). By having _wrap_atom call _mark_stage_column, our Grafana panels now actually track live atoms like content.generate_draft instead of getting stuck at verify_task.
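The shape of that fix can be sketched as a wrapper that records the atom's name before running it. Everything here is a simplified stand-in: the in-memory pipeline_tasks dict replaces the real table, and the signatures of wrap_atom and mark_stage_column are assumed, not taken from the codebase.

```python
import functools

# Hypothetical in-memory stand-in for the pipeline_tasks table:
# task_id -> {"stage": <name of the stage or atom currently running>}
pipeline_tasks = {}


def mark_stage_column(task_id, stage):
    """Record which stage or atom a task is currently executing."""
    pipeline_tasks.setdefault(task_id, {})["stage"] = stage


def wrap_atom(name, fn):
    """Wrap an atom so the stage column is updated before it runs."""

    @functools.wraps(fn)
    def runner(task_id, state):
        mark_stage_column(task_id, name)  # dashboards now see the live atom
        return fn(state)

    return runner
```

Marking the stage before the atom body runs means a long-running atom shows up under its own name while it is still executing, not only after it finishes.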

Finally, we’re starting to professionalize the API. We landed a response-contract ADR (PR #1767) to lock in snake_case and standardize list envelopes ({items, total, limit, offset}) across our 107 operations. This stops the “untyped escape hatch” of dict[str, Any] from spreading further into the codebase.
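For a sense of what a typed list envelope buys over dict[str, Any], here is a small sketch. The four field names come from the ADR summary above; the ListEnvelope class and the paginate helper are illustrative assumptions, not the project's actual models.

```python
from dataclasses import asdict, dataclass
from typing import Generic, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class ListEnvelope(Generic[T]):
    """Typed list response with the four envelope fields."""

    items: list[T]
    total: int
    limit: int
    offset: int


def paginate(rows, limit, offset):
    # Slice one page out of rows and report the paging metadata with it.
    return ListEnvelope(
        items=rows[offset : offset + limit],
        total=len(rows),
        limit=limit,
        offset=offset,
    )
```

Calling asdict(paginate(rows, limit, offset)) yields a plain dict with snake_case keys ready for serialization, while type checkers can still see the item type through ListEnvelope[T].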

We’ve stabilized the VRAM contention and tightened the content rails. Now we can actually trust the pipeline to run end-to-end without a human babysitting the GPU.

Auto-compiled by Poindexter from today’s commits and PRs. See the work: github.com/Glad-Labs/poindexter.

Sources

https://github.com/Glad-Labs/poindexter
