Mini RFC: Audio API v1 (PCM-first + explicit low-level API)

Status: Draft (for approval)

Related issues: #1, #2, #3, #4, #5, #6, #7, #8, #10
Already in progress: #9 via PR #11

1) Problem

The current audio surface mixes low-level Opus-oriented operations with user-facing PCM workflows. This makes API naming ambiguous, leaves the CLI scope unclear, and under-specifies recovery and testing behavior.

2) Goals

Provide a PCM-first high-level API for common workflows.
Keep a clear explicit low-level Opus API for advanced users.
Define deterministic defaults and capability introspection.
Add runtime stats and predictable recovery semantics.
Preserve backward compatibility with a deprecation window.

3) Non-goals

No protocol redesign.
No breaking changes in one release.
No broad refactor outside audio/CLI/test/doc scope.

4) Decisions

D1. API split: explicit low-level vs high-level

Low-level (Opus, explicit naming)

start_audio_rx_opus(callback, *, jitter_depth=5)
stop_audio_rx_opus()
start_audio_tx_opus()
push_audio_tx_opus(opus_bytes)
stop_audio_tx_opus()

High-level (PCM)

start_audio_rx_pcm(callback, *, sample_rate=48000, channels=1, frame_ms=20)
stop_audio_rx_pcm()
start_audio_tx_pcm(*, sample_rate=48000, channels=1, frame_ms=20)
push_audio_tx_pcm(pcm_bytes)
stop_audio_tx_pcm()
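
The sample_rate, channels and frame_ms defaults above together determine how large one PCM frame is. The following is a minimal sketch of that arithmetic, assuming 16-bit samples (D1 does not fix the sample width; PCM16 is the format mentioned elsewhere in this document). The helper name pcm_frame_bytes is hypothetical and not part of the proposed API.

```python
def pcm_frame_bytes(sample_rate: int = 48000, channels: int = 1,
                    frame_ms: int = 20, bytes_per_sample: int = 2) -> int:
    """Return the size in bytes of one PCM frame.

    bytes_per_sample=2 assumes 16-bit PCM; this is an assumption of the sketch.
    """
    samples_per_channel, remainder = divmod(sample_rate * frame_ms, 1000)
    if remainder:
        raise ValueError("frame_ms does not yield a whole number of samples")
    return samples_per_channel * channels * bytes_per_sample


print(pcm_frame_bytes())            # 1920 (48 kHz, mono, 20 ms)
print(pcm_frame_bytes(channels=2))  # 3840 (48 kHz, stereo, 20 ms)
```

Whether push_audio_tx_pcm(pcm_bytes) requires exactly one frame per call is not stated in this RFC, so the sketch only computes the size.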

D2. Backward compatibility + deprecation

Keep current methods (start_audio_rx, start_audio_tx, push_audio_tx, etc.) as aliases to low-level behavior.
Emit DeprecationWarning with replacement hints.
Deprecation window: two minor releases.
Remove ambiguous aliases at next major.
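
One way the alias-plus-warning behavior could look is sketched below. The decorator name deprecated_alias and the Radio class are hypothetical stand-ins; only the method names and the use of DeprecationWarning come from this RFC.

```python
import functools
import warnings


def deprecated_alias(replacement: str):
    """Wrap a method so each call emits a DeprecationWarning naming the replacement."""
    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            warnings.warn(
                f"{func.__name__}() is deprecated; use {replacement}() instead",
                DeprecationWarning,
                stacklevel=2,  # point the warning at the caller, not the wrapper
            )
            return func(*args, **kwargs)
        return wrapper
    return decorate


class Radio:  # hypothetical stand-in for the real client class
    def start_audio_tx_opus(self):
        return "tx-opus-started"

    @deprecated_alias("start_audio_tx_opus")
    def start_audio_tx(self):
        # The old name keeps low-level (Opus) behavior, as D2 requires.
        return self.start_audio_tx_opus()
```

Python hides DeprecationWarning by default outside of code run directly in __main__, so the test suite would likely need to enable it explicitly (for example with warnings.simplefilter) to verify the hints.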

D3. Internal transcoder layer

Add internal PCM<->Opus transcoder abstraction.
Use opuslib when available (pip install rigplane[audio]).
If missing, high-level PCM APIs raise actionable error: Audio codec backend unavailable; install rigplane[audio].
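
The backend check could be sketched as follows. The exception class and function name are hypothetical; the RFC fixes only the message text. This version checks that the module is discoverable, and a real implementation may also need to confirm that the native Opus library loads.

```python
import importlib.util


class AudioBackendUnavailableError(RuntimeError):
    """Hypothetical error type for a missing codec backend."""


def require_codec_backend(module_name: str = "opuslib") -> None:
    """Raise an actionable error if the codec backend module cannot be found."""
    if importlib.util.find_spec(module_name) is None:
        raise AudioBackendUnavailableError(
            "Audio codec backend unavailable; install rigplane[audio]."
        )
```

High-level PCM entry points such as start_audio_rx_pcm would call this before starting a stream, so the failure surfaces immediately rather than on the first frame.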

D4. Capabilities model + deterministic defaults

Add AudioCaps model and API:

- get_audio_caps() returns:
  - supported codecs
  - supported sample rates
  - supported channel counts
  - default codec/rate/channels

CLI:

- rigplane audio caps [--json]

Default selection (deterministic):

1. Prefer Opus mono 48k
2. Else Opus stereo 48k
3. Else best PCM mode
4. If no valid combo -> clear validation error
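
The selection order above can be sketched as a pure function. AudioMode, the codec labels "opus" and "pcm", and select_default_mode are hypothetical names. The RFC does not define "best PCM mode"; this sketch assumes the highest sample rate wins, with fewer channels as the tie-break.

```python
from typing import Iterable, NamedTuple


class AudioMode(NamedTuple):
    codec: str        # "opus" or "pcm" (hypothetical labels)
    channels: int
    sample_rate: int


def select_default_mode(supported: Iterable[AudioMode]) -> AudioMode:
    """Pick a default mode using the D4 preference order."""
    modes = set(supported)
    # Steps 1 and 2: Opus mono 48k, then Opus stereo 48k.
    for preferred in (AudioMode("opus", 1, 48000), AudioMode("opus", 2, 48000)):
        if preferred in modes:
            return preferred
    # Step 3: assumed ranking for "best PCM mode" (not defined by the RFC).
    pcm = [m for m in modes if m.codec == "pcm"]
    if pcm:
        return max(pcm, key=lambda m: (m.sample_rate, -m.channels))
    # Step 4: no valid combination.
    raise ValueError("no supported audio codec/rate/channel combination")
```

Because the result depends only on the set of supported modes and not on their order, the same radio capabilities always produce the same default.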

D5. Runtime stats contract

Add get_audio_stats() shape (JSON-friendly):

- packet_loss_pct (0..100)
- jitter_ms
- underruns
- overruns
- est_latency_ms
- rx_packets, tx_packets

Stats availability: during active stream and for a short terminal window after stop.
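
A possible shape for the stats payload is sketched below. The field names come from the list above; the class name AudioStats, the float/int types, and the zero defaults are assumptions of this sketch.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class AudioStats:
    packet_loss_pct: float = 0.0  # percentage in the range 0..100
    jitter_ms: float = 0.0
    underruns: int = 0
    overruns: int = 0
    est_latency_ms: float = 0.0
    rx_packets: int = 0
    tx_packets: int = 0

    def __post_init__(self) -> None:
        # Enforce the documented range so unit drift is caught early.
        if not 0.0 <= self.packet_loss_pct <= 100.0:
            raise ValueError("packet_loss_pct must be within 0..100")

    def to_json(self) -> str:
        """Serialize to a JSON object with stable key order."""
        return json.dumps(asdict(self), sort_keys=True)
```

A frozen dataclass gives callers an immutable snapshot, which fits the requirement that stats stay readable for a short window after the stream stops.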

D6. Recovery model

Optional auto-recover after reconnect:

- config: auto_recover=True, recover_max_attempts=5
- state events: recovering, recovered, failed
- single active stream invariant (no duplicate RX/TX tasks)
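
The event sequence could be driven by a bounded retry loop like the one below. This is a sketch under stated assumptions: recover_stream, the restart and on_event callables, the delay_s parameter, and the use of ConnectionError as the retryable failure are all hypothetical. It does not enforce the single-active-stream invariant, which would need task bookkeeping in the caller.

```python
import asyncio
from typing import Awaitable, Callable


async def recover_stream(
    restart: Callable[[], Awaitable[None]],
    on_event: Callable[[str], None],
    *,
    recover_max_attempts: int = 5,
    delay_s: float = 0.0,
) -> bool:
    """Try to restart a stream, emitting recovering/recovered/failed events."""
    on_event("recovering")
    for _ in range(recover_max_attempts):
        try:
            await restart()
        except ConnectionError:
            # Hypothetical fixed pause between attempts.
            await asyncio.sleep(delay_s)
            continue
        on_event("recovered")
        return True
    on_event("failed")
    return False
```

Each recovery run emits exactly one terminal event (recovered or failed), which keeps the behavior easy to assert in reconnect tests.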

D7. CLI scope

Add rigplane audio command group:

- audio rx --out rx.wav --seconds 10
- audio tx --in tx.wav
- audio loopback --seconds 10
- audio caps

Common flags: --sample-rate, --channels, --json, --stats
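
The command group above maps naturally onto argparse subparsers. The sketch below is illustrative only: the default values, which flags are required, and the choice to give caps only --json (as shown in D4) are assumptions, not decisions of this RFC.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the proposed `rigplane audio` command group."""
    # Flags shared by the streaming subcommands.
    common = argparse.ArgumentParser(add_help=False)
    common.add_argument("--sample-rate", type=int, default=48000)
    common.add_argument("--channels", type=int, default=1)
    common.add_argument("--json", action="store_true")
    common.add_argument("--stats", action="store_true")

    parser = argparse.ArgumentParser(prog="rigplane")
    groups = parser.add_subparsers(dest="group", required=True)
    audio = groups.add_parser("audio")
    commands = audio.add_subparsers(dest="command", required=True)

    rx = commands.add_parser("rx", parents=[common])
    rx.add_argument("--out", required=True)
    rx.add_argument("--seconds", type=float, default=10)

    tx = commands.add_parser("tx", parents=[common])
    # "in" is a Python keyword, so store the value under another name.
    tx.add_argument("--in", dest="in_path", required=True)

    loopback = commands.add_parser("loopback", parents=[common])
    loopback.add_argument("--seconds", type=float, default=10)

    caps = commands.add_parser("caps")
    caps.add_argument("--json", action="store_true")
    return parser
```

For example, build_parser().parse_args(["audio", "rx", "--out", "rx.wav", "--seconds", "10"]) yields a namespace with command set to "rx" and out set to "rx.wav".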

5) Delivery plan (by issue dependency)

Phase A (foundation)

#1 Transcoder layer
#10 Naming + deprecation map

Phase B (public APIs)

#2 RX high-level PCM API
#3 TX high-level PCM API
#8 Capability introspection + defaults

Phase C (CLI + observability)

#4 CLI audio subcommands
#6 Runtime stats API + --stats

Phase D (resilience + CI)

#7 Auto-recover behavior
#5 E2E + CI stabilization

6) Test strategy

Unit tests for transcoder adapters and validation.
Integration tests for RX/TX/loopback (normal + reconnect).
CLI smoke tests for all new subcommands.
CI split: fast smoke on each PR; heavier integration profile on schedule/manual.

7) Risks

Opus backend differences across platforms.
Reconnect race conditions in async tasks.
Stats drift if metric units are not fixed in docs/tests.

Mitigation: strict typed errors, state machine invariants, and a unit-tested metric contract with fixtures.

8) Acceptance mapping

#1/#2/#3: high-level PCM APIs operational and tested
#4: CLI audio workflow commands work end-to-end
#5: CI smoke/integration split stable
#6: get_audio_stats() + CLI stats output
#7: reconnect recovery behavior deterministic
#8: audio caps + safe defaults
#10: naming map + deprecation warnings + migration notes

9) Open questions for maintainer approval

Deprecation window length: exactly 2 minor releases OK?
Keep current ambiguous names as low-level aliases or add hard warnings immediately?
Should --stats print periodic stream stats (live) or end-of-run summary by default?
Is opuslib optional dependency acceptable as the default high-level backend?

10) DSP pipeline + PCM tap gate on Opus-native radios (issue #762)

Behavior: the web audio broadcaster's DSP pipeline (noise gate, limiter, etc.) and the PCM tap registry (used by the FFT / waterfall scope and audio analyzers) both operate on decoded PCM16. When the radio's native audio codec is Opus (IC-705 and any future Opus-only model), the broadcaster passes the Opus frame through without decoding, so DSP and taps do not run.

Why we don't decode + re-encode on the hot path: an Opus re-encode would introduce quality loss on every frame. IC-705 users have not reported needing DSP or the scope through the web UI, so the gate is documented rather than closed.

Observability: the broadcaster emits a one-shot WARNING log entry when it detects an active DSP pipeline on an Opus-native codec. The check runs in set_dsp_pipeline() and in _refresh_codec_state() (in case the codec flips mid-stream), and the warning fires at whichever happens first.
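
The one-shot behavior can be modeled with a latch that both call sites share. This is a sketch, not the broadcaster's code: the class name, the logger name, and the message text are hypothetical, and whether the real latch ever resets is not stated here.

```python
import logging

logger = logging.getLogger("audio.broadcaster")  # hypothetical logger name


class OpusDspGateWarning:
    """Logs a warning once when a DSP pipeline is active on an Opus-native codec."""

    def __init__(self) -> None:
        self._warned = False

    def check(self, *, dsp_active: bool, codec_is_opus: bool) -> bool:
        """Return True if this call emitted the warning."""
        if self._warned or not (dsp_active and codec_is_opus):
            return False
        self._warned = True
        logger.warning(
            "DSP pipeline and PCM taps are bypassed: radio codec is Opus-native"
        )
        return True
```

Calling check() from both the pipeline setter and the codec refresh path means the first site to observe the combination logs it and the other stays silent.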

Upgrade path if demand arrives: issue #762 §"Option A": decode Opus once in _relay_loop, run DSP and feed taps on the PCM buffer, then re-encode before fan-out. _audio_transcoder.PcmOpusTranscoder already exists and can be reused. Quality loss is negligible for AM/FM ham audio but non-zero on SSB, so put this path behind a config toggle if implemented.

11) LAN MAIN/SUB audio routing (epic #787)

Wire format. On dual-RX radios the LAN audio stream is stereo PCM16 with L=MAIN and R=SUB whenever a 2-channel codec is negotiated. _DEFAULT_CODEC_PREFERENCE in types.py leads with PCM_2CH_16BIT (0x10); single-RX firmware downgrades to mono during handshake, and the broadcaster's _refresh_codec_state reads the negotiated codec back so downstream logic tracks reality rather than the requested codec.
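
To make the L=MAIN / R=SUB layout concrete, the sketch below splits a stereo PCM16 buffer into two mono buffers. It assumes the usual interleaved layout (one left sample followed by one right sample per frame), which this section does not state explicitly; split_main_sub is a hypothetical helper name.

```python
from typing import Tuple


def split_main_sub(stereo_pcm16: bytes) -> Tuple[bytes, bytes]:
    """Split interleaved stereo PCM16 into (MAIN, SUB) mono buffers."""
    if len(stereo_pcm16) % 4:
        raise ValueError("stereo PCM16 buffer must be a multiple of 4 bytes")
    main = bytearray()
    sub = bytearray()
    for i in range(0, len(stereo_pcm16), 4):
        main += stereo_pcm16[i:i + 2]      # left sample = MAIN
        sub += stereo_pcm16[i + 2:i + 4]   # right sample = SUB
    return bytes(main), bytes(sub)
```

The function copies raw sample bytes without interpreting them, so it works regardless of byte order.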

Phones L/R Mix is always OFF. CI-V 0x1A 05 00 72 is a boolean toggle — 0x00 = Mix OFF (separated stereo), 0x01 = Mix ON (summed to both channels). The backend keeps it locked at 0x00 via two paths:

AudioHandler._handle_audio_config — every audio_config WS message emits 0x00, independent of the split_stereo payload.
AudioBroadcaster._apply_phones_mix_off — fires once on every relay start (guarded on receiver_count >= 2) so the LAN stream begins in separated-stereo state regardless of the radio's prior menu state. Errors are swallowed so start-up continues on radios where the command is unsupported.

If Mix were ON the radio would pre-sum MAIN + SUB before transmission and the frontend graph could no longer isolate a single receiver — focus=main and focus=sub would both play the summed signal.

focus and split_stereo live on the frontend. The WebAudio graph in frontend/src/lib/audio/rx-player.ts routes the stereo pair through ChannelSplitter(2) → GainNode×2 → StereoPanner×2 → destination:

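| `focus` | `split_stereo` | L gain | R gain | L pan | R pan |
|---|---|---|---|---|---|
| main | false | 1.0 | 0.0 | 0 | 0 |
| main | true | 1.0 | 0.0 | −1 | +1 |
| sub | false | 0.0 | 1.0 | 0 | 0 |
| sub | true | 0.0 | 1.0 | −1 | +1 |
| both | false | 1.0 | 1.0 | 0 | 0 |
| both | true | 1.0 | 1.0 | −1 | +1 |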

Panning depends only on split_stereo (see rx-player.ts::_applyGraphState, lines 247-257) — when on, mainPanner hard-pans to −1 and subPanner to +1 regardless of focus. For single-receiver focus (focus=main or focus=sub) the opposing gain is already 0 so the pan on the silenced channel is inaudible, but the setting still applies to the active channel — e.g. focus=main + split_stereo=true routes MAIN to the left ear only.

Per-channel dB sliders (mainGainDb / subGainDb) multiply into the L/R gains respectively.
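
The table and the two rules above reduce to a small pure function. The following is a Python model of that mapping for illustration, not the TypeScript in rx-player.ts. The names GraphState, graph_state and db_to_linear are hypothetical, and the dB-to-linear conversion 10^(dB/20) is the standard amplitude formula, assumed here rather than taken from the source.

```python
from typing import NamedTuple


class GraphState(NamedTuple):
    main_gain: float  # L channel (MAIN)
    sub_gain: float   # R channel (SUB)
    main_pan: float
    sub_pan: float


def db_to_linear(db: float) -> float:
    """Convert a dB slider value to a linear amplitude factor."""
    return 10.0 ** (db / 20.0)


def graph_state(focus: str, split_stereo: bool,
                main_gain_db: float = 0.0, sub_gain_db: float = 0.0) -> GraphState:
    """Compute gains and pans for a focus / split_stereo combination."""
    if focus not in ("main", "sub", "both"):
        raise ValueError(f"unknown focus: {focus!r}")
    # Gains depend only on focus.
    main_gain = 1.0 if focus in ("main", "both") else 0.0
    sub_gain = 1.0 if focus in ("sub", "both") else 0.0
    # Pans depend only on split_stereo.
    main_pan, sub_pan = (-1.0, 1.0) if split_stereo else (0.0, 0.0)
    return GraphState(main_gain * db_to_linear(main_gain_db),
                      sub_gain * db_to_linear(sub_gain_db),
                      main_pan, sub_pan)
```

For example, graph_state("main", True) gives gains (1.0, 0.0) and pans (−1, +1), matching the second row of the table.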

Why the WS message still exists. audio_config is retained as a bidirectional echo channel so the client can persist its focus + split_stereo choice and the backend confirms via applied: true. The CI-V round-trip it used to drive (the broken _PHONES_LR_MIX dict pre-#788) is gone — the message now only raises the broadcaster's _codec_stale flag so a mid-stream codec/channel change triggers a fresh _refresh_codec_state pass (issue #766).

Historical context. Revisions before #788 sent 0x02 / 0x03 on this sub-command for focus=sub / focus=both, values the radio silently ignored per the CI-V reference (only {0x00, 0x01} are valid). #788 briefly tied the byte to split_stereo; #792 corrected that to the lock-at-0x00 contract above. The dead-code mapping is documented here rather than in a deleted source comment so future contributors don't rediscover the same trap.
