github.com/Zer0pa/ZPE-Prosody · RESEARCH-READY

LIVE

RETRIEVAL OPEN

Encoding Speech's Shape And Feeling

Prosodic feature-store codec — pitch, energy, duration, voiced mask · ZPE-Prosody · PyPI zpe-prosody v0.1.1 · github.com/Zer0pa/ZPE-Prosody

A voice carries more than words. Pitch rises, stress lands, and rhythm marks how speech moves through time.

ZPE-Prosody captures that shape as a deterministic ZPRS/v1 stream — F0, energy, duration, and the voiced/unvoiced mask — at 13.0× mean compression and 0.64% voiced-F0 RMSE on 100 LibriSpeech test-clean utterances. It stores acoustic prosody cues, not emotion, intent, semantic meaning, or a speaker-state diagnosis. Encoder only: retrieval misses target; transfer is paused.

[Figure: ZPRS/v1 prosody stream mechanics diagram.] **Scope:** encoder stream for F0, energy, duration, and voiced mask. Retrieval misses target; transfer remains paused.

01 · THE GAP · COMPUTED, NOT KEPT

Speech systems compute prosody again and again. The shape of the voice is rarely kept.

02 · MARKETS · ADJACENT FORECASTS

Speech and language processing · '30 · $26.8B

Text-to-speech market · '31 · $7.9B

Text-to-speech software · '30 · $7.3B

Voice analytics · '30 · est. $3.1B

Speech AI / feature-store tooling · '30 · est. $1.8B

Adjacent forecasts only · ZPE-Prosody is a bounded prosodic encoder; retrieval and transfer are not claimed.

03 · VALUE

$7.3B

Forecast text-to-speech software market by 2030. ZPE-Prosody addresses the prosodic feature store beneath it, with the retrieval gap stated.

04 · INSIGHT

Speech carries feeling. Its shape can now be held.

05.1 · CURRENT TECH · COMPUTED AND DISCARDED

Mainstream TTS and voice-analytics stacks compute pitch, energy and timing every time they need them, then throw the contours away or stash them as undocumented bytes. No published fidelity figure, no public limit, no shared archive format.

05.2 · OUR TECH · THE SHAPE, HELD

ZPE-Prosody encodes the four prosodic primitives (F0, energy, duration, voiced mask) as a deterministic ZPRS/v1 stream at 13.0× mean compression and 0.64% voiced-F0 RMSE on real LibriSpeech utterances, with mean encode latency of 2.67 ms. Four primitive checks pass. Retrieval and transfer are deliberately excluded from the product, and their numbers are reported alongside.

05.3 · BENCHMARKS · LIBRISPEECH TEST-CLEAN

Compression · 13.0×

F0 RMSE · 0.64%

Primitive · 4/4 · PASS

Retrieval · MISS · p@5 0.31

Encoder · 13.0× · PASS

Fidelity · 0.64% · PASS

Retrieval · 0.31 · MISS

Scope: 100 LibriSpeech test-clean utterances. PRO-C006 retrieval MISS; PRO-C005 transfer PAUSED_EXTERNAL.

06 · MEASUREMENT · PRO CHECK SUITE

The encoder passes four checks. Retrieval and transfer do not.

06.1 · COMPARATIVE PERFORMANCE · LIBRISPEECH CONTOUR COMPRESSION

ZPE-Prosody · 13.0× compression

gzip · ~2.2× raw

PRO-C006 p@5 · 0.31 MISS

PRO-C004 · PASS

Scope: 100 LibriSpeech test-clean utterances. The four primitive encoder checks pass. Retrieval misses at p@5 0.31 against a 0.80 threshold; out-of-distribution (OOD) p@5 is 0.1707. Transfer is paused; no commercial-safe substitute has been proven in-lane.
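As a rough illustration of how a gzip baseline like the one above could be measured, the sketch below packs a contour as raw float32 and compares its size before and after gzip. The contour is synthetic, the helper names are hypothetical, and the convention of marking unvoiced frames with 0.0 is an assumption of this sketch; none of it is taken from the ZPE-Prosody code, so the printed ratio will not match the ~2.2× figure.

```python
import gzip
import math
import struct


def raw_float32_bytes(contour):
    """Pack a contour as little-endian float32 (the raw baseline assumed here)."""
    return struct.pack(f"<{len(contour)}f", *contour)


def gzip_ratio(contour):
    """Return raw size divided by gzip-compressed size for one contour."""
    raw = raw_float32_bytes(contour)
    packed = gzip.compress(raw, compresslevel=9)
    return len(raw) / len(packed)


# Synthetic F0 contour in Hz; 0.0 marks unvoiced frames (an assumption of this sketch).
f0 = (
    [0.0] * 200
    + [120.0 + 20.0 * math.sin(i / 15.0) for i in range(400)]
    + [0.0] * 200
)
ratio = gzip_ratio(f0)
print(f"gzip ratio on the synthetic contour: {ratio:.2f}x")
```

Running the same measurement over real extracted contours, rather than this toy signal, would be needed to reproduce a baseline comparable to the one reported.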

07 · KEY METRICS · LIBRISPEECH TEST-CLEAN

07.1 · F0 RMSE

0.64%

Voiced frames · LibriSpeech 100 utterances
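The page does not spell out how the percentage is normalised, so the following is only one plausible reading: a relative RMSE taken over voiced frames, with each frame's error divided by its reference F0. The function name and the tiny example contours are hypothetical.

```python
import math


def voiced_f0_rmse_percent(ref_f0, dec_f0, voiced):
    """Relative RMSE (in percent) of decoded F0, scored on voiced frames only."""
    if not (len(ref_f0) == len(dec_f0) == len(voiced)):
        raise ValueError("contours and mask must have the same length")
    # Skip unvoiced frames and any frame whose reference F0 is not positive.
    rel_sq = [
        ((d - r) / r) ** 2
        for r, d, v in zip(ref_f0, dec_f0, voiced)
        if v and r > 0.0
    ]
    if not rel_sq:
        raise ValueError("no voiced frames to score")
    return 100.0 * math.sqrt(sum(rel_sq) / len(rel_sq))


# Made-up contours: two voiced frames, each off by 1% of the reference.
ref = [0.0, 100.0, 200.0, 0.0]
dec = [0.0, 101.0, 198.0, 0.0]
mask = [False, True, True, False]
print(round(voiced_f0_rmse_percent(ref, dec, mask), 3))  # → 1.0
```

Restricting the score to voiced frames matters because F0 is undefined where the voice is off; including those frames would either divide by zero or dilute the error.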

07.2 · COMPRESSION

13.0×

Mean vs raw float32 · ZPRS/v1 stream
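To make the arithmetic behind a mean compression figure concrete, here is a small sketch. It assumes "mean" is the average of per-utterance ratios, which the page does not confirm, and the byte counts in the example are invented purely so the mean lands on 13.0.

```python
def compression_ratio(raw_bytes, encoded_bytes):
    """Raw size divided by encoded size for one utterance."""
    if encoded_bytes <= 0:
        raise ValueError("encoded size must be positive")
    return raw_bytes / encoded_bytes


def mean_ratio(size_pairs):
    """Average of per-utterance ratios (assumed meaning of 'mean compression')."""
    ratios = [compression_ratio(raw, enc) for raw, enc in size_pairs]
    return sum(ratios) / len(ratios)


def storage_saving(ratio):
    """Fraction of raw storage removed at a given compression ratio."""
    return 1.0 - 1.0 / ratio


# Invented (raw_bytes, encoded_bytes) pairs for two utterances.
pairs = [(4000, 400), (8000, 500)]
print(mean_ratio(pairs))                # → 13.0
print(f"{storage_saving(13.0):.1%}")    # → 92.3%
```

At 13.0× against raw float32, the encoded stream occupies about 7.7% of the raw size, which is where the 92.3% saving comes from.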

07.3 · PRIMITIVE CHECKS

4 / 4 · PASS

PRO-C001..C004 only · retrieval open

07.4 · CORPUS

100 utt

LibriSpeech test-clean · OpenSLR

07.5 · RETRIEVAL TARGET

0.31 p@5

PRO-C006 MISS · vs 0.80 threshold
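For readers unfamiliar with the metric, precision at 5 (p@5) is the share of the top five retrieved items that are relevant, averaged over queries. The sketch below shows that calculation and a comparison against the 0.80 threshold; the query data and function names are hypothetical and say nothing about how PRO-C006 is actually implemented.

```python
def precision_at_k(ranked_ids, relevant_ids, k=5):
    """Share of the top-k ranked items that appear in the relevant set."""
    top = ranked_ids[:k]
    hits = sum(1 for item in top if item in relevant_ids)
    return hits / k


def mean_precision_at_k(queries, k=5):
    """Average precision@k over (ranked_ids, relevant_ids) pairs."""
    scores = [precision_at_k(ranked, relevant, k) for ranked, relevant in queries]
    return sum(scores) / len(scores)


# Hypothetical retrieval results: 2 of 5 relevant, then 1 of 5 relevant.
queries = [
    (["a", "b", "c", "d", "e"], {"a", "c"}),
    (["f", "g", "h", "i", "j"], {"j"}),
]
score = mean_precision_at_k(queries)
THRESHOLD = 0.80  # the pass threshold quoted on this page
status = "PASS" if score >= THRESHOLD else "MISS"
print(f"p@5 {score:.2f} vs {THRESHOLD:.2f}: {status}")
```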

08 · ENCODER BOUNDS · WHAT HOLDS, WHAT MISSES

The encoder holds speech's shape. Retrieval does not yet follow.

08.1 · WHAT ROUND-TRIPS EXACTLY · ZPRS/V1 PRIMITIVE

On 100 LibriSpeech test-clean utterances the encoder records 13.0× mean compression at 0.64% voiced-F0 RMSE with duration RMSE of 0.000 ms, across 5/5 hash-identical encoder runs. The same input bytes produce the same ZPRS/v1 stream every time, on every host. PRO-C001..C004 PASS on primitive encoder checks; they do not override the retrieval and transfer gates. Retrieval (PRO-C006) misses target at p@5 0.31 vs 0.80; OOD p@5 0.1707. Transfer (PRO-C005) is PAUSED_EXTERNAL. The page reports both, not one.
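A hash-identical check of the kind described above can be sketched as follows: encode the same input several times and confirm every run yields the same SHA-256 digest. The `toy_encode` function is a stand-in written for this example, not the ZPRS/v1 encoder, and its 0.5 Hz quantisation and byte layout are assumptions.

```python
import hashlib
import struct


def toy_encode(f0, voiced):
    """Stand-in encoder (NOT ZPRS/v1): quantise F0 to 0.5 Hz steps, pack with the mask."""
    out = bytearray()
    for hz, is_voiced in zip(f0, voiced):
        step = round(hz * 2) if is_voiced else 0
        # uint16 quantised pitch followed by a one-byte voiced flag
        out += struct.pack("<HB", step, 1 if is_voiced else 0)
    return bytes(out)


def hash_identical(encode, args, runs=5):
    """True when every run of encode(*args) produces the same SHA-256 digest."""
    digests = {hashlib.sha256(encode(*args)).hexdigest() for _ in range(runs)}
    return len(digests) == 1


f0 = [0.0, 118.5, 121.0, 0.0]
voiced = [False, True, True, False]
print(hash_identical(toy_encode, (f0, voiced)))  # → True
```

Repeating the runs on a single machine only shows run-to-run stability; comparing digests produced on different hosts would be the way to test the cross-host claim.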

08.2 · HONEST BLOCKER

MISS on PRO-C006 retrieval, p@5 0.31 vs 0.80; OOD p@5 0.1707. PRO-C005 transfer PAUSED_EXTERNAL; no commercial-safe substitute proven in-lane. Status packet on PR #50 branch-public; PyPI stale at v0.1.1. No transfer learning, retrieval product, or TTS-ready system is claimed.

09

A voice carries a fidelity receipt.

09.1 · THE AMBITION

The product is a bounded ZPRS/v1 feature store for the shape of speech — F0, energy, duration, voiced mask — that a TTS team, a call-centre analytics owner or a linguistics lab can store, ship and re-read with a stated fidelity per recording. Retrieval and transfer arrive later, on their own terms.

09.2 · WHAT WORKS NOW

The prosodic encoder ships fidelity per frame plus a public compression figure.

09.3 · WHAT'S STILL OPEN

Retrieval misses target at p@5 0.31 vs 0.80. Transfer is paused on an external dependency.

09.4 · FEATURE STORES · NEAR-TERM (12–24 MO)

TTS teams stop drowning in contour bytes

A TTS platform keeping pitch and energy contours for thousands of speaker voices and styles cuts feature-store storage by roughly 87% against its current gzip baseline. The same archive holds many more voices on the same disk.

09.5 · FIDELITY · NEAR-TERM (12–24 MO)

Voice pipelines inherit a pitch receipt

A voice-cloning engineer who round-trips a speaker through the codec sees the F0 error per utterance — 0.64% on LibriSpeech — before the model ever ingests the contour. Pitch drift becomes a number on a dashboard, not a complaint from a listener.

09.6 · CALL CENTRES · MID-TERM (24–48 MO)

Analytics vendors archive prosody, not just transcripts

A call-centre analytics platform that already stores transcripts can store the prosody beside them at a tractable cost. Emotion-AI and sentiment systems get to work from the actual shape of how a customer spoke, not a downstream summary of it.

09.7 · LINGUISTICS · MID-TERM (24–48 MO)

Prosody corpora become comparable

A linguistics lab studying stress and intonation across dialects can compress a multi-year recording corpus into a portable feature store with a stated pitch error. A peer at another institution can reproduce the analysis on the same bytes, not on a re-derived contour.

09.8 · DISCLOSURE · PARADIGM (48 MO+)

Speech feature codecs get fidelity terms

A market in which prosodic codecs publish compression, F0 RMSE, and the retrieval limit side by side changes how buyers procure speech tooling. A TTS vendor talks to a regulator and a customer with the same numbers, in the same units, against the same corpus.

01 // WHAT THIS IS

02 // CODEC MECHANICS

03 // KEY METRICS

04 // REPO IDENTITY

05 // READINESS

06 // WHAT WE PROVE

07 // WHAT WE DON'T CLAIM

08 // VERIFICATION STATUS

09 // PROOF ANCHORS

10 // REPO SHAPE