Project
Clownfish
A 100-point health inspection for your PostgreSQL database — and it checks the AI's work before it shows you a thing.
- Open Source
- Python
- postgres
- health-check
- ai
- Status: beta
Clownfish is a standalone health-assessment tool for PostgreSQL. It collects metrics over a time window, scores your database across ~100 checks in ten categories (memory, query performance, vacuum, security, replication, configuration, storage, locking, backup, connections), and produces a graded report card with prioritized, actionable fixes. The CLI is pg-healthcheck; a lightweight web UI handles setup, live collection, and charts.
The part worth studying is the AI layer, and specifically what it refuses to do. Deterministic heuristics own the facts: the ~100 checks are encoded DBA expertise, free to run and fully reproducible. An LLM owns the judgment and the narrative. Between them sits a two-part judge. First, a free deterministic check verifies every value the model cites against your real telemetry, so a fabricated number gets caught for zero tokens. Second, a multi-model vote handles the calls a rule can’t make. Clownfish is self-contained, runs offline, and is licensed under Apache-2.0.
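The two-part judge can be illustrated with a minimal sketch, assuming findings arrive as structured objects that list the metric values they cite. The names Finding, is_grounded, passes_vote, and judge_finding are hypothetical and are not taken from the Clownfish codebase. The judges are stand-in callables where real model calls would go.

```python
from dataclasses import dataclass, field
from typing import Callable, Mapping, Sequence


@dataclass(frozen=True)
class Finding:
    """Hypothetical structured finding: a claim plus the metric values it cites."""
    claim: str
    cited: Mapping[str, str] = field(default_factory=dict)


# A judge is any callable that returns True when it accepts the finding.
# In a real tool each judge would wrap a call to a different model (assumption).
Judge = Callable[[Finding], bool]


def is_grounded(finding: Finding, telemetry: Mapping[str, str]) -> bool:
    """Part 1 (deterministic, zero tokens): every cited value must exist in
    the telemetry and match it exactly, by key, not by substring search."""
    return all(
        name in telemetry and telemetry[name] == value
        for name, value in finding.cited.items()
    )


def passes_vote(finding: Finding, judges: Sequence[Judge]) -> bool:
    """Part 2 (costs tokens): accept only on a strict majority of judges."""
    approvals = sum(1 for judge in judges if judge(finding))
    return approvals * 2 > len(judges)


def judge_finding(
    finding: Finding,
    telemetry: Mapping[str, str],
    judges: Sequence[Judge],
) -> bool:
    """Run the cheap check first; only grounded findings reach the vote."""
    if not is_grounded(finding, telemetry):
        return False  # rejected before any model is called
    return passes_vote(finding, judges)
```

Because the grounding check runs first, a finding that cites shared_buffers as 8GB when the telemetry says 128MB is rejected without spending a single model call. Only findings whose numbers check out go on to the more expensive vote.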
Every piece of Goldfish content tagged project: clownfish shows up below — the blogs that explain the anti-hallucination judge and the heuristics-vs-LLM division of labor, and the build-along tutorials that take it from a vibe-coded prototype to something you would run in production.
From this project
Tutorials
- Tutorial 3 — Memory & Cost: Stop Re-Diagnosing (and Re-Paying for) the Same Problems
  Add cross-run memory and incremental skipping so the tool stops re-diagnosing the same issues and an unchanged re-run costs a fraction of the first.
- Tutorial 2 — Build an LLM-as-Judge: Catching Hallucinations Before They Ship
  Build the LLM-as-judge verification layer: a cheap deterministic grounding check plus a multi-model consensus vote, including the substring-grounding bug in its natural habitat.
- Tutorial 1 — Foundations: Structured Output + a Multi-Provider LLM Harness
  Replace free-form LLM prose with structured Finding objects over a multi-provider harness, the checkable spine the rest of the build depends on.
- Tutorial 0 — We Vibe-Coded a Database Doctor. Does It Actually Work?
  We vibe-coded a 100-point Postgres health check. Does it actually work? The quickstart, the two engines, and an honest gut-check of where the naive version fails.
Blogs
- Stop Re-Asking the LLM the Same Question
  Incremental distillation: hash the inputs, skip unchanged checks, and cut ~64% of LLM calls on an unchanged re-run, with an honest note on what is not gated yet.
- Teaching an Audit Tool to Remember
  Cross-audit lineage and supersession: immutable history, and the what-changed-since-last-time report a returning user actually wants.
- Cheap Checks Before Expensive Ones: The Two-Part Judge Pattern
  The two-part judge pattern in ~40 lines: a deterministic filter first, multi-model consensus second. It generalizes to RAG, extraction, and code review.
- I Asked an LLM to Diagnose My Database. Then I Asked Another LLM if It Was Lying.
  The origin story of the judge: the model invented a shared_buffers value. The fix was not a bigger model; it was cheap grounding plus a second-opinion jury.