LIVE
RETRIEVAL OPEN

Encoding Speech's Shape And Feeling

Prosodic feature-store codec — pitch, energy, duration, voiced mask · ZPE-Prosody · PyPI zpe-prosody v0.1.1 · github.com/Zer0pa/ZPE-Prosody

A voice carries more than words. Pitch rises, stress lands, and rhythm marks how speech moves through time.

ZPE-Prosody captures that shape as a deterministic ZPRS/v1 stream — F0, energy, duration, and the voiced/unvoiced mask — at 13.0× mean compression and 0.64% voiced-F0 RMSE on 100 LibriSpeech test-clean utterances. It stores acoustic prosody cues, not emotion, intent, semantic meaning, or a speaker-state diagnosis. Encoder only: retrieval misses target; transfer is paused.

Figure: ZPE-Prosody diagram of the ZPRS prosody stream mechanics.
Scope: encoder stream for F0, energy, duration, and voiced mask. Retrieval misses target; transfer remains paused.
01 · THE GAP · COMPUTED, NOT KEPT

Speech systems compute prosody again and again. The shape of the voice is rarely kept.

02 · MARKETS · ADJACENT FORECASTS
Speech and language processing '30 · $26.8B
Text-to-speech market '31 · $7.9B
Text-to-speech software '30 · $7.3B
Voice analytics '30 · est. $3.1B
Speech AI / feature-store tooling '30 · est. $1.8B
Adjacent forecasts only · ZPE-Prosody is a bounded prosodic encoder; retrieval and transfer are not claimed.
03 · VALUE
$7.3B
TTS software market forecast for 2030; the prosodic feature store sits beneath it, with the retrieval gap stated.
04 · INSIGHT

Speech carries feeling. Its shape can now be held.

05.1 · CURRENT TECH · COMPUTED AND DISCARDED

Mainstream TTS and voice-analytics stacks compute pitch, energy and timing every time they need them, then throw the contours away or stash them as undocumented bytes. No published fidelity figure, no public limit, no shared archive format.

05.2 · OUR TECH · THE SHAPE, HELD

ZPE-Prosody encodes the four prosodic primitives (F0, energy, duration, voiced mask) as a deterministic ZPRS/v1 stream at 13.0× mean compression and 0.64% voiced-F0 RMSE on real LibriSpeech utterances, with mean encode latency of 2.67 ms. Four primitive checks pass. Retrieval and transfer are deliberately excluded from the product, and their numbers are reported alongside the passing ones.
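As a reading aid, the four primitives can be pictured as four parallel per-frame tracks. The sketch below is a hypothetical in-memory container, not the ZPRS/v1 wire format; the class name `ProsodyTracks`, its field names, and the units (Hz, ms) are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ProsodyTracks:
    """Hypothetical view of the four primitives; NOT the ZPRS/v1 layout."""

    f0_hz: List[float]        # pitch per frame; ignored where voiced is False
    energy: List[float]       # energy per frame
    duration_ms: List[float]  # duration per frame
    voiced: List[bool]        # voiced/unvoiced mask per frame

    def __post_init__(self) -> None:
        # All four tracks must describe the same frames.
        lengths = {
            len(self.f0_hz),
            len(self.energy),
            len(self.duration_ms),
            len(self.voiced),
        }
        if len(lengths) != 1:
            raise ValueError("all four tracks must have the same number of frames")

    @property
    def num_frames(self) -> int:
        return len(self.voiced)

    def voiced_f0(self) -> List[float]:
        """Return F0 values for voiced frames only."""
        return [f for f, v in zip(self.f0_hz, self.voiced) if v]

    def total_duration_ms(self) -> float:
        """Sum of per-frame durations."""
        return sum(self.duration_ms)
```

Keeping the mask as its own track is what lets a fidelity figure be restricted to voiced frames, where pitch is actually defined.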

05.3 · BENCHMARKS · LIBRISPEECH TEST-CLEAN
Compression · 13.0×
F0 RMSE · 0.64%
Primitive · 4/4 · PASS
Retrieval · MISS · p@5 0.31
Encoder 13.0× · PASS
Fidelity 0.64% · PASS
Retrieval 0.31 · MISS
Scope: 100 LibriSpeech test-clean utterances. PRO-C006 retrieval MISS; PRO-C005 transfer PAUSED_EXTERNAL.
06 · MEASUREMENT · PRO CHECK SUITE

The encoder passes four checks. Retrieval and transfer do not.

06.1 · COMPARATIVE PERFORMANCE · LIBRISPEECH CONTOUR COMPRESSION
ZPE-Prosody · 13.0× compression
gzip · ~2.2× raw
PRO-C006 · p@5 0.31 · MISS
PRO-C004 · PASS
100 LibriSpeech test-clean utterances. The four primitive encoder checks pass. Retrieval misses at p@5 0.31 vs 0.80; OOD p@5 0.1707. Transfer is paused; no commercial-safe substitute proven in-lane.
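The two compression figures above can be turned into a storage saving with one line of arithmetic. The sketch below assumes both ratios are measured against the same raw size, and reads "gzip ~2.2× raw" as gzip shrinking the raw contours by about 2.2×; the function name `fraction_saved` is hypothetical.

```python
def fraction_saved(new_ratio: float, baseline_ratio: float = 1.0) -> float:
    """Share of baseline storage removed when moving to a codec with new_ratio.

    Both ratios are taken against the same raw size, so
    size_new / size_baseline = baseline_ratio / new_ratio.
    """
    if new_ratio <= 0 or baseline_ratio <= 0:
        raise ValueError("compression ratios must be positive")
    return 1.0 - baseline_ratio / new_ratio


ZPRS_RATIO = 13.0  # mean compression quoted on this page
GZIP_RATIO = 2.2   # approximate gzip figure quoted on this page

print(f"saved vs raw:  {fraction_saved(ZPRS_RATIO):.1%}")
print(f"saved vs gzip: {fraction_saved(ZPRS_RATIO, GZIP_RATIO):.1%}")
print(f"gain over gzip: {ZPRS_RATIO / GZIP_RATIO:.1f}x")
```

With these inputs the saving works out to about 92% against raw float32 and about 83% against gzip, a gain of roughly 5.9× over gzip. The result is sensitive to the baseline, so a different gzip ratio on a different corpus will move the percentage.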
07 · KEY METRICS · LIBRISPEECH TEST-CLEAN
07.1 · F0 RMSE
0.64%
Voiced frames · LibriSpeech 100 utterances
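One plausible way to compute a percentage F0 RMSE over voiced frames is sketched below. The page does not spell out the exact formula, so the definition used here (root mean square of the per-frame relative error, times 100) is an assumption, and `voiced_f0_rmse_percent` is a hypothetical name.

```python
import math
from typing import Sequence


def voiced_f0_rmse_percent(
    ref_hz: Sequence[float],
    dec_hz: Sequence[float],
    voiced: Sequence[bool],
) -> float:
    """Relative F0 RMSE in percent, restricted to voiced frames.

    Assumed definition: 100 * sqrt(mean(((dec - ref) / ref) ** 2)).
    """
    if not (len(ref_hz) == len(dec_hz) == len(voiced)):
        raise ValueError("all inputs must have the same number of frames")

    squared = []
    for ref, dec, is_voiced in zip(ref_hz, dec_hz, voiced):
        if not is_voiced:
            continue  # pitch is undefined on unvoiced frames
        if ref <= 0:
            raise ValueError("voiced frames need a positive reference F0")
        squared.append(((dec - ref) / ref) ** 2)

    if not squared:
        raise ValueError("no voiced frames to score")
    return 100.0 * math.sqrt(sum(squared) / len(squared))


# The unvoiced middle frame is skipped, however far off its decoded value is.
error = voiced_f0_rmse_percent([100.0, 0.0, 200.0], [101.0, 50.0, 198.0], [True, False, True])
print(f"{error:.2f}%")
```

Masking before averaging matters: scoring unvoiced frames would mix placeholder pitch values into the error.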
07.2 · COMPRESSION
13.0×
Mean vs raw float32 · ZPRS/v1 stream
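A compression figure "vs raw float32" can be measured as sketched below: pack the contour as 4-byte floats, divide by the encoded size, then average per utterance. Whether the page's mean is a mean of per-utterance ratios or a ratio of totals is not stated; the sketch assumes the former. The helper names and the synthetic contour are illustrative only, and the gzip step shows how a gzip baseline could be measured, not how the page's figure was obtained.

```python
import gzip
import math
import struct
from typing import Sequence, Tuple


def raw_float32_bytes(values: Sequence[float]) -> bytes:
    """Pack values as little-endian float32, the assumed raw baseline."""
    return struct.pack(f"<{len(values)}f", *values)


def compression_ratio(raw_size: int, encoded_size: int) -> float:
    """Raw bytes divided by encoded bytes."""
    if encoded_size <= 0:
        raise ValueError("encoded size must be positive")
    return raw_size / encoded_size


def mean_compression(sizes: Sequence[Tuple[int, int]]) -> float:
    """Mean of per-utterance ratios from (raw_size, encoded_size) pairs."""
    if not sizes:
        raise ValueError("need at least one utterance")
    return sum(compression_ratio(r, e) for r, e in sizes) / len(sizes)


# Synthetic pitch-like contour, used only to exercise the baseline.
contour = [120.0 + 20.0 * math.sin(i / 15.0) for i in range(1000)]
raw = raw_float32_bytes(contour)
gz = gzip.compress(raw, mtime=0)  # mtime=0 keeps the gzip header reproducible

print("raw bytes:", len(raw))
print("gzip ratio on this synthetic contour:", round(compression_ratio(len(raw), len(gz)), 2))
```

The gzip ratio on a synthetic sine wave says nothing about real speech; it only demonstrates the measurement.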
07.3 · PRIMITIVE CHECKS
4 / 4 · PASS
PRO-C001..C004 only · retrieval open
07.4 · CORPUS
100 utt
LibriSpeech test-clean · OpenSLR
07.5 · RETRIEVAL TARGET
0.31 p@5
PRO-C006 MISS · vs 0.80 threshold
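For readers unfamiliar with the metric, p@5 is precision at rank 5: the share of the top five retrieved items that are relevant, averaged over queries. The sketch below uses the standard definition; how PRO-C006 decides relevance is not described on this page, so the relevance sets and item ids here are placeholders.

```python
from typing import Hashable, Sequence, Set, Tuple


def precision_at_k(ranked_ids: Sequence[Hashable], relevant_ids: Set[Hashable], k: int = 5) -> float:
    """Fraction of the top-k ranked items that are relevant.

    Divides by k even when fewer than k items are returned, which is the
    usual convention: missing results count as misses.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for item in ranked_ids[:k] if item in relevant_ids)
    return hits / k


def mean_precision_at_k(
    queries: Sequence[Tuple[Sequence[Hashable], Set[Hashable]]], k: int = 5
) -> float:
    """Average precision@k over (ranked_ids, relevant_ids) pairs."""
    if not queries:
        raise ValueError("need at least one query")
    return sum(precision_at_k(ranked, rel, k) for ranked, rel in queries) / len(queries)


# Two of the top five are relevant, so this query scores 0.4.
score = precision_at_k(["a", "b", "c", "d", "e", "f"], {"a", "d", "z"})
print(score)
```

Read this way, a p@5 of 0.31 means that on average fewer than two of the five returned items are relevant, against a threshold of four.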
08 · ENCODER BOUNDS · WHAT HOLDS, WHAT MISSES

The encoder holds speech's shape. Retrieval does not yet follow.

08.1 · WHAT ROUND-TRIPS EXACTLY · ZPRS/V1 PRIMITIVE

On 100 LibriSpeech test-clean utterances the encoder records 13.0× mean compression at 0.64% voiced-F0 RMSE with duration RMSE of 0.000 ms, across 5/5 hash-identical encoder runs. The same input bytes produce the same ZPRS/v1 stream every time, on every host. PRO-C001..C004 PASS on primitive encoder checks; they do not override the retrieval and transfer gates. Retrieval (PRO-C006) misses target at p@5 0.31 vs 0.80; OOD p@5 0.1707. Transfer (PRO-C005) is PAUSED_EXTERNAL. The page reports both, not one.
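A hash-identical check of the kind described above can be sketched as follows: encode the same input several times and compare digests. The page does not say which hash is used, so SHA-256 is an assumption, and `toy_encode` is a stand-in encoder for illustration, not the ZPRS/v1 encoder or the zpe-prosody API.

```python
import hashlib
import struct
from typing import Callable


def stream_digest(stream: bytes) -> str:
    """SHA-256 hex digest of an encoded stream (hash choice is an assumption)."""
    return hashlib.sha256(stream).hexdigest()


def is_hash_identical(encode: Callable[[bytes], bytes], payload: bytes, runs: int = 5) -> bool:
    """True when every run of encode(payload) yields the same digest."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    digests = {stream_digest(encode(payload)) for _ in range(runs)}
    return len(digests) == 1


def toy_encode(payload: bytes) -> bytes:
    """Stand-in encoder, NOT ZPRS/v1: a length prefix followed by the payload."""
    return struct.pack("<I", len(payload)) + payload


print(is_hash_identical(toy_encode, b"example contour bytes", runs=5))
```

A single-process check like this covers repeat runs on one machine. Confirming identical output across hosts would mean comparing digests recorded on each host.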

08.2 · HONEST BLOCKER

MISS on PRO-C006 retrieval, p@5 0.31 vs 0.80; OOD p@5 0.1707. PRO-C005 transfer PAUSED_EXTERNAL; no commercial-safe substitute proven in-lane. Status packet on PR #50 branch-public; PyPI stale at v0.1.1. No transfer learning, retrieval product, or TTS-ready system is claimed.

09

A voice carries a fidelity receipt.

09.1 · THE AMBITION

The product is a bounded ZPRS/v1 feature store for the shape of speech — F0, energy, duration, voiced mask — that a TTS team, a call-centre analytics owner or a linguistics lab can store, ship and re-read with a stated fidelity per recording. Retrieval and transfer arrive later, on their own terms.

09.2 · WHAT WORKS NOW

The prosodic encoder ships fidelity per frame plus a public compression figure.

09.3 · WHAT'S STILL OPEN

Retrieval misses target at p@5 0.31 vs 0.80. Transfer is paused on an external dependency.

09.4 · FEATURE STORES · NEAR-TERM (12–24 MO)
TTS teams stop drowning in contour bytes
A TTS platform keeping pitch and energy contours for thousands of speaker voices and styles cuts feature-store storage by roughly 87% against its current gzip baseline. The same archive holds many more voices on the same disk.
09.5 · FIDELITY · NEAR-TERM (12–24 MO)
Voice pipelines inherit a pitch receipt
A voice-cloning engineer who round-trips a speaker through the codec sees the F0 error per utterance — 0.64% on LibriSpeech — before the model ever ingests the contour. Pitch drift becomes a number on a dashboard, not a complaint from a listener.
09.6 · CALL CENTRES · MID-TERM (24–48 MO)
Analytics vendors archive prosody, not just transcripts
A call-centre analytics platform that already stores transcripts can store the prosody beside them at a tractable cost. Emotion-AI and sentiment systems get to work from the actual shape of how a customer spoke, not a downstream summary of it.
09.7 · LINGUISTICS · MID-TERM (24–48 MO)
Prosody corpora become comparable
A linguistics lab studying stress and intonation across dialects can compress a multi-year recording corpus into a portable feature store with a stated pitch error. A peer at another institution can reproduce the analysis on the same bytes, not on a re-derived contour.
09.8 · DISCLOSURE · PARADIGM (48 MO+)
Speech feature codecs get fidelity terms
A market in which prosodic codecs publish compression, F0 RMSE, and the retrieval limit side by side changes how buyers procure speech tooling. A TTS vendor talks to a regulator and a customer with the same numbers, in the same units, against the same corpus.