How Diagest Powers CLV.gg's Three-Year Historical Betting Database

A live feed tells you what a price is right now. Three years of clean history tells you whether that price is actually an edge. The hard part is the word clean.

Reid Spachman
TL;DR
  • CLV.gg's real-time detection is anchored by a historical database spanning three-plus years of betting prices: opening lines, every move, closing prices, and settled results across many books and markets.
  • At sub-minute granularity across that many books, the raw history runs to billions of price points, and a raw feed at that scale is unusable without heavy processing.
  • Diagest, the ixprt data-for-AI pipeline, is what makes it usable: ingest, clean, deduplicate, time-align, and attach provenance, then serve it in a form models and queries can actually read.
  • The cleaned history feeds an in-house model that reasons over how these markets have behaved over years, not just the current tick.
  • What the database enables: backtesting strategies against real prices, grading closing line value against real closes, and calibrating the live consensus so today's fair-price estimate is grounded in history.

A live price tells you what a market thinks right now. It does not tell you whether that price is an edge.

To know that, you need history: what this market has done before, how its lines move from open to close, how often a price like the one in front of you held up. CLV.gg, the live betting-intelligence platform we build at ixprt, runs its real-time detection on top of exactly that, a historical database spanning more than three years of betting prices. The live feed finds the candidate. The history is what tells you whether the candidate is real.

The catch is in one word. The history is only useful if it is clean, and at this scale clean is an enormous amount of work. That work is what Diagest does.

What the history actually is

The database is not a table of final scores. It is the full life of every market we track, recorded as it happened.

For each market that means the opening line, every intermediate move as books adjusted, the closing price, and the settled result, across many books and many markets, over three-plus years. Recorded at sub-minute granularity, that is not thousands of rows or millions. It runs to billions of individual price points. Every one of them is a small fact: at this instant, this book offered this price on this outcome. Stacked together over years, they describe how these markets actually behave, which is a far stronger basis for judging a live price than any single snapshot.

That same volume is what makes the raw data unusable as it arrives.

Why raw history is unusable

A pile of billions of price points sounds like an asset. In raw form it is closer to a liability.

The data arrives from many sources, each with its own format, its own identifiers for the same team or market, its own clock. Feeds drop and reconnect, leaving gaps. The same price tick arrives twice and has to be recognized as one event, not two. One source stamps a market one way and another stamps it differently, and unless they are reconciled, a query counts one market as two. At a few thousand records you can eyeball these problems. At billions, they compound into a dataset that quietly lies to anyone who trusts it.
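Two of these failure modes are easy to show in miniature. The sketch below is illustrative only: the alias table, the function names, and the one-minute gap threshold are assumptions made for the example, not Diagest's actual rules.

```python
from datetime import datetime, timedelta

# Hypothetical alias table: every source's label for a team mapped to one canonical ID.
TEAM_ALIASES = {
    "NY Knicks": "nyk",
    "New York Knicks": "nyk",
    "NYK": "nyk",
}

def canonical_team(label: str) -> str:
    """Map a source-specific label to its canonical ID; fail loudly if it is unmapped."""
    try:
        return TEAM_ALIASES[label.strip()]
    except KeyError:
        raise ValueError(f"unmapped team label: {label!r}")

def find_gaps(timestamps, max_gap=timedelta(minutes=1)):
    """Return (start, end) pairs where consecutive ticks are more than max_gap apart."""
    ordered = sorted(timestamps)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > max_gap]
```

Raising on an unmapped label is a deliberate choice in this sketch: a silent fallback is how one market ends up counted as two.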

So before the history can answer a single question, it has to be made trustworthy. That is an ingest-and-cleaning problem, and it is precisely the problem Diagest exists to solve.

What Diagest does

Diagest is the ixprt data-for-AI pipeline. Applied to the betting history, it does the unglamorous work that makes everything downstream possible.

It ingests the raw price feeds from every source. It cleans the noise and the malformed records. It deduplicates, so a price tick that arrived three times counts once. It aligns everything onto a single consistent timeline, so a move on one book and a move on another can be compared at the same instant. It reconciles identifiers, so one market is one market everywhere. And it attaches provenance, so every record carries where it came from and when. The output is written in a form a model or a query can read directly, instead of a tangle that needs re-parsing every time someone asks a question.
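As a rough illustration of the deduplicate, time-align, and provenance steps, here is a minimal sketch. The `Tick` fields, the one-second bucket width, and the rule of keeping the earliest-received copy are all assumptions for the example; the source does not describe Diagest's internal schema.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

@dataclass(frozen=True)
class Tick:
    book: str
    market_id: str         # assumed already reconciled to a canonical ID
    outcome: str
    price: float           # decimal odds
    ts: datetime           # event time, timezone-aware
    source: str            # provenance: which feed delivered the record
    received_at: datetime  # provenance: when the record arrived

def to_utc_bucket(ts, width=timedelta(seconds=1)):
    """Convert to UTC and floor to a fixed-width bucket so all sources share one timeline."""
    epoch = ts.astimezone(timezone.utc).timestamp()
    step = width.total_seconds()
    return datetime.fromtimestamp(epoch - epoch % step, tz=timezone.utc)

def dedupe(ticks):
    """Keep the earliest-received copy of each (book, market, outcome, price, bucket)."""
    seen = {}
    for t in sorted(ticks, key=lambda t: t.received_at):
        key = (t.book, t.market_id, t.outcome, t.price, to_utc_bucket(t.ts))
        seen.setdefault(key, t)  # later copies of the same event are ignored
    return sorted(seen.values(), key=lambda t: t.ts)
```

Because the provenance fields sit outside the dedup key, a tick delivered by three feeds collapses to one record that still says where it came from and when.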

ixprt runs the same pipeline, with the same discipline, across every data source it touches. The betting history is one corpus among several; what is specific here is the scale and the speed the downstream product demands.

What the clean history enables

Once the history is trustworthy, it stops being storage and starts working for the live product. Three uses matter most.

The first is backtesting. A strategy is only worth running if it would have worked against the prices that were actually available at the time, not against a smoothed or hindsight-adjusted version. A clean, time-aligned record lets a strategy be tested against the real market as it stood.
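The core of an honest backtest is an "as-of" lookup: at each decision time, use only the last price that had actually been posted. The sketch below assumes a sorted list of `(timestamp, price)` pairs with numeric timestamps and flat one-unit stakes; the function names are illustrative, not part of CLV.gg.

```python
from bisect import bisect_right

def price_as_of(history, t):
    """Return the latest price posted at or before time t, or None if none existed yet.

    history: list of (timestamp, decimal_odds) pairs sorted by timestamp.
    """
    times = [ts for ts, _ in history]
    i = bisect_right(times, t)
    return history[i - 1][1] if i else None

def backtest(history, signal_times, won):
    """Flat one-unit stake at each signal, priced as of that moment (no lookahead)."""
    pnl = 0.0
    for t in signal_times:
        price = price_as_of(history, t)
        if price is None:
            continue  # no price was available yet, so no bet could have been placed
        pnl += (price - 1.0) if won else -1.0
    return pnl
```

For a history of `[(0, 2.10), (60, 2.00), (120, 1.90)]`, a signal at time 90 is priced at 2.00, the price that stood then, not the 1.90 that came later.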

The second is grading closing line value. CLV is measured against where a market closed, so grading it honestly requires the real historical closes, devigged and aligned. The database is what makes that grade trustworthy rather than approximate.
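To make the grading concrete, here is a minimal sketch. It assumes decimal odds and a simple multiplicative devig (normalizing implied probabilities so they sum to one); that is one common method and not necessarily the one CLV.gg uses.

```python
def devig(decimal_odds):
    """Strip the bookmaker margin by normalizing implied probabilities to sum to 1."""
    implied = [1.0 / o for o in decimal_odds]
    total = sum(implied)
    return [p / total for p in implied]

def clv(bet_odds, closing_odds, index):
    """Expected value of the bet measured against the devigged closing price.

    bet_odds: decimal odds the bet was placed at.
    closing_odds: closing decimal odds for every outcome in the market.
    index: position of the outcome that was bet.
    """
    fair_prob = devig(closing_odds)[index]
    return bet_odds * fair_prob - 1.0
```

With a two-way market closing at 1.91 on each side, the devigged probability is 0.5 per side, so a bet taken at 2.10 grades at roughly +5% CLV and a bet taken at 1.91 grades at about -4.5%.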

The third is calibration. CLV.gg's live true-price estimate is sharper when it is informed by how these markets have actually moved over years. The cleaned history feeds an in-house model that reasons over that record, so the current read on a market is grounded in its real behavior instead of a single live snapshot.
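One simple way to check calibration against history is a reliability table: group past fair-price estimates by predicted probability and compare each group with how often the outcome actually happened. This is a generic sketch with assumed inputs, not a description of the in-house model.

```python
from collections import defaultdict

def reliability_table(predictions, outcomes, n_bins=10):
    """Bucket predicted probabilities and compare them with realized frequencies.

    predictions: probabilities in [0, 1]; outcomes: 1 if the event happened, else 0.
    Returns {bin_index: (count, mean_predicted, realized_rate)}.
    """
    acc = defaultdict(lambda: [0, 0.0, 0])  # count, sum of predictions, hits
    for p, y in zip(predictions, outcomes):
        b = min(int(p * n_bins), n_bins - 1)  # a prediction of exactly 1.0 goes in the top bin
        acc[b][0] += 1
        acc[b][1] += p
        acc[b][2] += y
    return {b: (n, s / n, hits / n) for b, (n, s, hits) in sorted(acc.items())}
```

A well-calibrated estimate shows mean predicted probability close to the realized rate in every bin; a persistent difference in one direction is the kind of bias that years of history can expose and a single snapshot cannot.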

The shape of the whole thing

Put together, the two systems form a clean division of labor. Diagest turns three-plus years of messy price history, billions of points in all, into a record you can trust. CLV.gg uses that record to ground its live, sub-second reads on the market. One makes the data honest; the other turns it into a decision.

It is the same loop ixprt runs everywhere: clean the data, build structure on top, ship a product that depends on it. If you are sitting on a large, messy, valuable dataset and burning weeks making it usable before it can do anything, that gap between raw and usable is exactly what Diagest is built to close.

Frequently asked

What is the historical database behind CLV.gg?

It is three-plus years of betting market history: opening lines, every intermediate price move, closing prices, and settled outcomes across many books and markets. Stored at sub-minute granularity, it amounts to billions of individual price points, and it is the reference CLV.gg uses to ground its live true-price estimates.

What does Diagest do for that data?

Diagest is the ixprt data-for-AI pipeline. It ingests the raw price feeds, cleans the noise, removes duplicates, aligns everything onto a consistent timeline, and attaches provenance, then writes the result to formats a model or a query can read. Without that step the raw history is too messy and too large to use.

Why does historical betting data need so much processing?

Raw feeds arrive from many sources in different formats, with gaps, duplicates, clock skew, and inconsistent identifiers. At billions of records, those problems compound. Cleaning, deduplication, and time-alignment turn a pile of conflicting snapshots into a single trustworthy record of what each market actually did.

What does the historical database let CLV.gg do?

Three main things: backtest a strategy against the real prices that were available at the time, grade closing line value against real historical closes, and calibrate the live consensus so the current true-price estimate reflects how these markets have actually behaved over years rather than a single live snapshot.

How do Diagest and CLV.gg relate?

Both are built at ixprt. CLV.gg is the live betting-intelligence product; Diagest is the data pipeline underneath it that keeps the historical layer clean and queryable. CLV.gg is one example of Diagest doing its job: raw data in, clean and queryable data out, a real product on top.

Founder at ixprt. Building Diagest, AssetModel, and DailyWallStreet. Based in New York.

Want to skip the work?

Diagest is the data-for-AI pipeline we are building at ixprt, handing AI agents parsed, deduped, embedded source data so the agents stop reinventing ingest.
