LIVE
PHASE 06 PENDING

Market Data That Keeps Its Shape

A compression codec for delayed-feed market archives — compact, exact, queryable · ZPE-FT · PyPI zpe-ft v0.1.1 · github.com/Zer0pa/ZPE-FT

Market archives store rows well. They rarely retain the price pattern. Finding a six-month-old chart shape today means rebuilding from scratch, not retrieving.

ZPE-FT compresses delayed and public feeds 5.9–10.9× smaller than raw, replays price fields at RMSE = 0.0, and runs OHLCV pattern queries up to 62.9× faster than Parquet+zstd through DuckDB. The public-corpus result is real. The enterprise benchmark still waits on Phase 06 inputs and FT-C004 labels.

[Figure: ZPE-FT mechanics diagram of the delayed-feed market time-series codec.]
Scope: delayed/public archive. Price fields replay exactly; Phase 06 enterprise inputs and FT-C004 labels remain pending.
01 · THE GAP · STORED, NOT KNOWN

A market archive stores what happened. It does not remember what price did.

02 · MARKETS · ADJACENT FORECASTS
Market data · '30 · $52.1B
Fintech analytics · '30 · $45.6B
Capital markets software · '30 · $31.4B
Time-series database · '30 · $5.7B
Financial data infrastructure · '31 · $78.9B
Capital-markets data and analytics forecasts. Every tool in the markets listed above still pays the storage and rebuild cost that ZPE-FT removes from the file itself.
03 · VALUE
$78.9B
Financial data infrastructure keeps growing. The storage and search bill on delayed-feed history is the line item nobody has solved.
04 · INSIGHT

Encode the pattern. The archive knows its shape.

05.1 · CURRENT TECH · STORED AND REBUILT

Delayed market data lives in raw CSV, Parquet+zstd, or vendor stores. Cheap to write, fast to scan. But the file holds bytes, not patterns. Asking what a price did means rebuilding the answer, not retrieving it.

05.2 · OUR TECH · ENCODE THE PATTERN

ZPE-FT encodes pattern structure into the archive itself. Price fields replay at RMSE 0.0. OHLCV pattern queries run up to 62.9× faster than Parquet+zstd through DuckDB. SPY 10-year: 5.94× smaller. Binance BTC aggTrades: 10.90×. Kaggle SPY full history: 7.31×. Public, delayed-feed corpora only.

05.3 · BENCHMARKS · DELAYED-FEED PUBLIC CORPORA
SPY 10y · 5.94× vs raw
BTC tick · 10.90× vs raw
Kaggle SPY · 7.31× vs raw
Price RMSE · 0.0 · reported fields
Status: three public corpora stand · Phase 06 enterprise benchmark and FT-C004 labels pending.
06 · MEASUREMENT · PHASE3 PUBLIC BENCHMARKS

Three public corpora stand. Phase 06 still needs its inputs.

06.1 · COMPARATIVE PERFORMANCE · DELAYED-FEED VS RAW
SPY 10y · 5.94× smaller
BTC aggTrades · 10.90×
Kaggle SPY · 7.31×
Phase 06 · pending
Yahoo SPY 10y, Binance BTCUSDT aggTrades, Kaggle SPY full history — all delayed feed. Reported price fields replay exactly. BTC tick data wins on size but not on query speed; no latency claim is made there. Phase 06 inputs and FT-C004 truth labels remain unresolved.
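To make the "× vs raw" figures concrete, here is a minimal sketch of the arithmetic behind a size ratio. It is not part of the ZPE-FT package: the helper names `compression_ratio` and `space_saved` are hypothetical, and it assumes the reported ratio is simply raw bytes divided by compressed bytes, which this page does not spell out.

```python
def compression_ratio(raw_bytes: int, compressed_bytes: int) -> float:
    """Return how many times smaller the compressed archive is than raw.

    Assumption: ratio = raw size / compressed size.
    """
    if raw_bytes <= 0 or compressed_bytes <= 0:
        raise ValueError("sizes must be positive")
    return raw_bytes / compressed_bytes


def space_saved(ratio: float) -> float:
    """Convert an N-times-smaller ratio into the fraction of raw bytes saved."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return 1.0 - 1.0 / ratio


# Ratios reported on this page for the three public corpora.
REPORTED = {"SPY 10y": 5.94, "BTC aggTrades": 10.90, "Kaggle SPY": 7.31}

if __name__ == "__main__":
    for name, ratio in REPORTED.items():
        saved = space_saved(ratio)
        print(f"{name}: {ratio:.2f}x smaller, {saved:.1%} of raw bytes saved")
```

Read this way, a 5.94× ratio leaves about one sixth of the raw bytes on disk, and 10.90× leaves about one eleventh.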
07 · KEY METRICS · DELAYED-FEED CORPORA
07.1 · SPY 10y
5.94×
vs raw · Yahoo Finance daily
07.2 · BTC TICK
10.90×
vs raw · Binance public aggTrades
07.3 · KAGGLE SPY
7.31×
vs raw · Kaggle full history
07.4 · PROXY RMSE
0.0 ticks
price fields · public proxy corpus
07.5 · SOVEREIGN
null
Enterprise metric pending · Phase 06 inputs open
08 · FIDELITY · PRICE FIELDS VS VOLUME

Price fields replay exactly; zero error decides.

08.1 · WHAT EXACT REPLAY MEANS · PUBLIC PROXY SCOPE

The 62.9× figure is the p95 query latency win on Yahoo SPY OHLCV versus Parquet+zstd through DuckDB. BTC aggTrades is size-positive at latency parity or slower — no latency win is claimed on tick data. Price-field RMSE = 0.0 holds on reported fields across all three public corpora. Deterministic replay is declared on public inputs with committed benchmark artifacts in the repo. Anyone with the corpora can rerun the numbers and get the same bytes. Phase 06 enterprise inputs remain missing. FT-C004 retrieval truth labels remain unresolved.
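The three measurements named above (price-field RMSE, a p95 latency ratio, and byte-identical reruns) can be sketched with the standard library alone. This is an illustrative sketch, not the benchmark harness in the repo. The function names are hypothetical, the nearest-rank percentile is an assumed convention (the page does not say how p95 is computed), and the sample values are made up for demonstration.

```python
import hashlib
import math
from typing import Sequence


def rmse(source: Sequence[float], replayed: Sequence[float]) -> float:
    """Root-mean-square error between source and replayed price fields."""
    if len(source) != len(replayed) or not source:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((s - r) ** 2 for s, r in zip(source, replayed))
    return math.sqrt(squared / len(source))


def p95(samples: Sequence[float]) -> float:
    """Nearest-rank 95th percentile (assumed convention, not confirmed)."""
    if not samples:
        raise ValueError("need at least one latency sample")
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n), 1-based
    return ordered[rank - 1]


def p95_speedup(baseline_ms: Sequence[float], candidate_ms: Sequence[float]) -> float:
    """How many times lower the candidate's p95 latency is than the baseline's."""
    return p95(baseline_ms) / p95(candidate_ms)


def digest(payload: bytes) -> str:
    """SHA-256 of an artifact; equal digests imply byte-identical reruns."""
    return hashlib.sha256(payload).hexdigest()


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    closes = [100.25, 101.50, 99.75]
    print(rmse(closes, list(closes)))       # exact replay gives zero error
    baseline = [float(i) for i in range(1, 101)]
    candidate = [i / 10 for i in range(1, 101)]
    print(p95_speedup(baseline, candidate))
    print(digest(b"run-1") == digest(b"run-1"))
```

An RMSE of exactly 0.0 is only possible when every replayed value equals its source value, which is why it serves as the exact-replay check on reported price fields.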

08.2 · HONEST BLOCKER

Public-corpus benchmarks are not the enterprise benchmark. Phase 06 still needs 33 missing input series and unresolved FT-C004 truth labels. ZPE-FT is not a real-time feed, not a trading system, and makes no lossless volume claim. PyPI ships v0.1.1; v0.1.2 is pending.

09

Five futures from one delayed-feed archive.

09.1 · THE AMBITION

The bet is not to beat the warehouse. The bet is that a market archive can stay compact, exact, and queryable at the same time — and that when delayed-feed history behaves that way, the warehouse stops being the only place a fintech team is allowed to ask questions.

09.2 · WHAT WORKS NOW

Today, on three public corpora: 5.94–10.90× compression, RMSE 0.0 on price fields, 62.9× OHLCV query win.

09.3 · WHAT'S STILL OPEN

Still open: Phase 06 enterprise inputs, FT-C004 retrieval labels, the private-data benchmark, and PyPI v0.1.2.

09.4 · ARCHIVES · NEAR-TERM (12–24 MO)
More history fits in the same budget
A data team that cuts delayed-feed storage by six to eleven times can keep ten years of tick history where they used to keep one. The retention conversation shifts from “what do we drop” to “what do we still ask of it.”
09.5 · QUERIES · NEAR-TERM (12–24 MO)
Pattern search runs on the archive
An analyst hunting a price pattern from three years ago does not stage a fresh DuckDB rebuild first. The query goes against the compressed file. Backtest setup and exploratory research move from hours of preparation toward a single command.
09.6 · FIDELITY · MID-TERM (24–48 MO)
Every price still matches exactly
A compliance reviewer who asks whether the archived close equals the source close gets a zero-difference answer on reported price fields. Compaction stops carrying the usual quiet trust tax, which makes long-horizon market archives easier to defend.
09.7 · TRUTH · MID-TERM (24–48 MO)
Retrieval claims wait for labels
The pending FT-C004 label set is the gate that decides whether “we found this pattern” is allowed to graduate into a product feature. Buyers see retrieval evaluated against a fixed reference, not against the vendor’s own examples.
09.8 · PARADIGM · PARADIGM (48 MO+)
Market archives become query-native
If exact replay and low-latency search stay coupled once enterprise data joins the picture, delayed-feed history stops being cold storage that warehouses must rebuild from. It becomes the searchable layer that fintech analytics sits on top of.