Podcast & Broadcast Audio: LUFS, Dynamics & the Voice Chain
Studio Guide 02 · Cloud Atelier · Updated April 2026
Spoken word is its own discipline. The microphone choices are different, the loudness targets are regulated, and the processing chain is built around intelligibility rather than musical beauty. This guide treats podcasting and broadcast voice work as the engineering problem it actually is.
HOW WE RESEARCH · WHAT WE DO NOT CLAIM
Cloud Atelier does not run a test lab. We have not personally A/B tested every microphone, interface or monitor cited in this guide. The physics in this article (RT60, self-noise, polar patterns, latency, LUFS) come from published acoustics literature and standards. The product-specific specifications come from current manufacturer datasheets. Models are mentioned because their published spec satisfies a stated criterion — not because we declared them “best.” Where you see a product below, you will also see the source of the spec we cited and a link to an independent reviewer (Sound on Sound) where you can verify our reading against working engineers.
1. Loudness: LUFS, true peak, and why −16 LUFS exists
Streaming services and broadcasters do not measure loudness in dBFS peak. They measure it in LUFS (Loudness Units relative to Full Scale), a metric standardised in ITU-R BS.1770. The algorithm filters audio through a frequency-weighted curve that approximates human hearing, then averages the resulting power over time. The result correlates with how loud a program actually sounds — not how high its peak sample reaches.
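To make the measurement concrete, here is a minimal Python sketch of the core calculation for a single mono channel at 48 kHz: two cascaded biquad filters (the frequency weighting), then mean power converted to loudness units. It assumes the 48 kHz filter coefficients tabulated in ITU-R BS.1770 and deliberately leaves out the gating and multichannel weighting that a compliant meter adds, so treat it as an illustration, not a meter. The function names are illustrative.

```python
import math

# K-weighting biquads for 48 kHz audio, as tabulated in ITU-R BS.1770.
# Each tuple is (b0, b1, b2, a1, a2); other sample rates need other values.
SHELF = (1.53512485958697, -2.69169618940638, 1.19839281085285,
         -1.69065929318241, 0.73248077421585)
HIGHPASS = (1.0, -2.0, 1.0, -1.99004745483398, 0.99007225036621)


def biquad(samples, coeffs):
    """Run one direct-form-I biquad over a list of samples."""
    b0, b1, b2, a1, a2 = coeffs
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out


def ungated_loudness(samples):
    """Mono K-weighted loudness in LUFS. Assumption: no gating is applied."""
    weighted = biquad(biquad(samples, SHELF), HIGHPASS)
    mean_square = sum(v * v for v in weighted) / len(weighted)
    return -0.691 + 10.0 * math.log10(mean_square)


# One second of a full-scale 997 Hz sine, the reference tone in the standard.
tone = [math.sin(2 * math.pi * 997 * n / 48000) for n in range(48000)]
reference_loudness = ungated_loudness(tone)
```

The standard specifies that this reference tone reads −3.01 LKFS on a single front channel, which makes it a convenient sanity check for any implementation.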
| Platform | Loudness target | True-peak ceiling | Notes |
|---|---|---|---|
| Apple Podcasts | −16 LUFS | −1 dBTP | Mono −19, stereo −16 |
| Spotify (podcasts) | −14 to −16 LUFS | −1 dBTP | Normalised on playback |
| YouTube | −14 LUFS | −1 dBTP | Playback normalisation, not a strict delivery spec |
| EBU R128 (broadcast) | −23 LUFS | −1 dBTP | EU radio & TV |
| ATSC A/85 (US TV) | −24 LKFS | −2 dBTP | CALM Act compliance |
The two-decibel difference between Apple Podcasts (−16) and the top of Spotify's range (−14) is meaningful. A podcast mixed at −14 LUFS plays as mixed on Spotify and gets normalised down 2 dB on Apple, with no quality loss. A podcast delivered at the −23 LUFS broadcast level satisfies EBU R128 but feels weak on Spotify, where listeners reach for the volume knob and the perceived production value drops. −16 LUFS integrated with a −1 dBTP true-peak ceiling is the safe podcast target in 2026: it matches Apple exactly and sits inside Spotify's range.
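The arithmetic behind those offsets is a single subtraction. The sketch below tabulates it for the targets in the table above. It assumes every platform normalises fully in both directions, which is a simplification: whether a given service applies positive gain to quiet material is not something this code knows. The names are illustrative.

```python
# Integrated-loudness targets copied from the table above (LUFS).
TARGETS_LUFS = {
    "Apple Podcasts": -16.0,
    "Spotify (podcasts)": -14.0,  # upper end of the quoted range
    "YouTube": -14.0,
    "EBU R128": -23.0,
}


def playback_offset_db(mix_lufs, target_lufs):
    """Gain a normaliser would apply to move the mix onto the target."""
    return target_lufs - mix_lufs


def offsets_for(mix_lufs):
    """Offset per platform, assuming full normalisation up and down."""
    return {name: playback_offset_db(mix_lufs, target)
            for name, target in TARGETS_LUFS.items()}


# Offsets for a mix delivered at the recommended -16 LUFS.
offsets = offsets_for(-16.0)
```

A negative offset means the platform turns the program down; a positive one means it would have to turn it up to reach its target.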
True peak is not the same as sample peak. When a D/A converter reconstructs the analog waveform, the signal between samples can rise above the highest stored sample value. These intersample peaks can exceed the sample peak by 3 dB or more. Lossy encoders (AAC, MP3) alter the waveform as well and can push peaks higher still. A −1 dBTP ceiling leaves headroom for both effects, so the encoded file does not distort on consumer playback even though the WAV looks clean.
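A small experiment shows the effect. The sketch below builds a sine at one quarter of the sample rate, phased so that no sample lands on a crest, scales it until the samples just touch full scale, and then estimates the reconstructed peak by 4x windowed-sinc interpolation. The interpolator is a simplified stand-in for the oversampling a true-peak meter performs, not the filter any standard prescribes, and all names are illustrative.

```python
import math


def sinc(x):
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def reconstructed_peak(samples, factor=4, half_width=32):
    """Estimate the peak of the reconstructed waveform.

    Interpolates `factor` points per sample with a Hann-windowed sinc
    kernel, skipping the edges where the kernel would run off the signal.
    """
    peak = 0.0
    for n in range(half_width, len(samples) - half_width):
        for step in range(factor):
            t = n + step / factor
            acc = 0.0
            for k in range(n - half_width + 1, n + half_width + 1):
                d = t - k
                window = 0.5 * (1.0 + math.cos(math.pi * d / half_width))
                acc += samples[k] * sinc(d) * window
            peak = max(peak, abs(acc))
    return peak


def to_db(value):
    return 20.0 * math.log10(value)


# Every sample sits 45 degrees away from a crest of the underlying sine.
raw = [math.sin(math.pi / 2 * n + math.pi / 4) for n in range(256)]
scale = 1.0 / max(abs(v) for v in raw)
samples = [v * scale for v in raw]

sample_peak_db = to_db(max(abs(v) for v in samples))
true_peak_db = to_db(reconstructed_peak(samples))
```

The samples read 0 dBFS, yet the waveform they describe crests about 3 dB higher, which is exactly the case a sample-peak meter misses.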
2. Why dynamic microphones dominate spoken word
Walk into any radio station, podcast network, or commercial voiceover booth and the microphone you see is almost certainly dynamic: Shure SM7B, Electro-Voice RE20, Heil PR40, Rode PodMic. Condensers do appear in higher-end voice booths (Neumann U87, Sennheiser MKH 416), but only where the booth is treated heavily enough that the room contributes almost nothing. In any other room, the dynamic wins, and the reason is acoustic.
Spoken word is recorded close, typically 5–15 cm from the capsule. At that distance the proximity effect of a cardioid dynamic adds 4–8 dB at 100–200 Hz, producing the warm chest tone listeners associate with broadcast. Close working also lifts the voice well above the room: the dynamic's lower sensitivity keeps the talker on the capsule, and its tight polar pattern rejects the reflections that would otherwise smear consonants. Shure's published data for the SM7B shows a particularly tight cardioid pattern with consistent off-axis colouration, which is why it sounds natural even when a host turns their head mid-sentence.
USB-XLR hybrids and the workflow case
For a single host, USB dynamic microphones (Shure MV7+, Rode PodMic USB) record straight to a laptop without an interface. Their internal preamps deliver enough clean gain to drive the dynamic capsule without an inline booster such as a Cloudlifter. For multi-host shows the case for XLR returns, because every host needs an isolated track and a preamp with matched gain (plus phantom power if an inline booster or a condenser is in the chain), which means an interface with at least host count + 1 inputs.
3. The standard voice processing chain in series
A professionally produced voice track passes through a fixed sequence of processors. The order matters because each stage modifies what the next stage sees. Reordering the chain changes the result fundamentally.
Figure: the canonical voice chain (gate → high-pass EQ → compressor → de-esser → tone EQ → limiter). Clean first, shape second, control last.
Notice the EQ appears twice. The first instance is a high-pass filter only — it removes everything below 80–100 Hz before the compressor sees it, so room rumble does not pump the compressor on every plosive. The second instance shapes tone after dynamics control, because compressing a tone-shaped signal re-couples the gain reduction to whatever you boosted.
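The first EQ stage is simple enough to sketch. The Python below builds a second-order high-pass from the widely used RBJ "Audio EQ Cookbook" biquad formulas and measures how much a 30 Hz rumble tone and a 1 kHz voice-band tone are attenuated. The 90 Hz cutoff, the Butterworth Q and the function names are assumptions chosen for illustration, not settings prescribed by this guide.

```python
import math


def highpass_coeffs(cutoff_hz, sample_rate, q=1 / math.sqrt(2)):
    """Second-order high-pass biquad (RBJ cookbook form), scaled so a0 = 1."""
    w0 = 2 * math.pi * cutoff_hz / sample_rate
    alpha = math.sin(w0) / (2 * q)
    cos_w0 = math.cos(w0)
    a0 = 1 + alpha
    b = ((1 + cos_w0) / 2 / a0, -(1 + cos_w0) / a0, (1 + cos_w0) / 2 / a0)
    a = (-2 * cos_w0 / a0, (1 - alpha) / a0)
    return b, a


def run_biquad(samples, b, a):
    """Direct-form-I biquad; `a` holds a1 and a2 only."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[0] * y1 - a[1] * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out


def tone_gain_db(freq_hz, cutoff_hz=90.0, sample_rate=48000):
    """Steady-state gain of the filter for a sine at freq_hz, in dB."""
    b, a = highpass_coeffs(cutoff_hz, sample_rate)
    tone = [math.sin(2 * math.pi * freq_hz * n / sample_rate)
            for n in range(sample_rate)]
    filtered = run_biquad(tone, b, a)
    # Measure over the second half so the start-up transient has died away.
    half = sample_rate // 2
    rms_in = math.sqrt(sum(v * v for v in tone[half:]) / half)
    rms_out = math.sqrt(sum(v * v for v in filtered[half:]) / half)
    return 20 * math.log10(rms_out / rms_in)


rumble_db = tone_gain_db(30)
voice_db = tone_gain_db(1000)
```

With these assumed settings the rumble tone is cut heavily while the 1 kHz tone passes essentially untouched, so the compressor that follows reacts to the voice instead of the room.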
4. Gate & expander: room rejection in software
A noise gate attenuates signal below a threshold. For a host whose dynamic mic captures a room ambience of −55 dBFS during silences and a speaking voice peaking at −6 dBFS, a gate set to −45 dBFS with 2–5 ms attack and 100–200 ms release silences breaths and HVAC between sentences without truncating consonants. Set the threshold by listening: just above the noise floor, just below the quietest intentional whisper.
Expanders are gentler. Where a gate goes from full level to silence at the threshold, an expander reduces level proportionally: a 2:1 downward expander at −40 dBFS lets −50 dBFS material come through at −60 dBFS, smoothing the transition. For dialogue where breaths are part of the delivery, expansion preserves life. For panel discussion with crosstalk, gating wins.
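The difference between the two is easiest to see as a static level map. The following is a minimal sketch of both curves in dB terms. It ignores attack and release entirely, and the −120 dBFS gate floor is an arbitrary stand-in for silence.

```python
def downward_expander_db(level_db, threshold_db, ratio):
    """Below threshold, every dB of shortfall becomes `ratio` dB of shortfall."""
    if level_db >= threshold_db:
        return level_db
    return threshold_db - (threshold_db - level_db) * ratio


def gate_db(level_db, threshold_db, floor_db=-120.0):
    """Hard gate: anything under the threshold drops straight to the floor."""
    return level_db if level_db >= threshold_db else floor_db


# The expander example from the text: 2:1 at -40 dBFS, material at -50 dBFS.
expanded = downward_expander_db(-50.0, -40.0, 2.0)

# The gate example: threshold -45 dBFS, room ambience at -55 dBFS.
gated = gate_db(-55.0, -45.0)
```

Material 10 dB under the expander threshold leaves 20 dB under it, while the gate sends the room ambience to the floor and passes the speaking voice unchanged.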
5. Compressor: ratio, threshold, attack, release
A compressor reduces dynamic range by attenuating signal above a threshold by a chosen ratio. For voice, the canonical settings are:
- Threshold: set so the compressor engages on average speech, not just shouts. Aim for 4–8 dB of gain reduction on the loudest syllables.
- Ratio: 2:1 to 4:1. Higher ratios start to sound squashed.
- Attack: 5–20 ms. Faster attack catches consonants but kills transient clarity. Slower attack preserves clarity but lets peaks through.
- Release: 50–150 ms for natural-sounding speech. Faster release pumps; slower release ducks.
- Knee: soft knee (3–10 dB) for transparent voice work; hard knee for drum-style aggression that does not belong on dialogue.
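The threshold, ratio and knee settings above combine into one static transfer curve. The sketch below uses a common textbook formulation with a quadratic soft knee, plus the usual one-pole coefficient for turning an attack or release time into a smoothing factor. It is not the curve of any particular plugin, and the example threshold of −20 dBFS is an assumption.

```python
import math


def compressor_out_db(level_db, threshold_db, ratio, knee_db=0.0):
    """Static compressor curve with an optional quadratic soft knee."""
    over = level_db - threshold_db
    if knee_db > 0 and abs(over) * 2 <= knee_db:
        # Inside the knee: blend smoothly from 1:1 into the full ratio.
        return level_db + (1 / ratio - 1) * (over + knee_db / 2) ** 2 / (2 * knee_db)
    if over <= 0:
        return level_db
    return threshold_db + over / ratio


def gain_reduction_db(level_db, threshold_db, ratio, knee_db=0.0):
    return level_db - compressor_out_db(level_db, threshold_db, ratio, knee_db)


def smoothing_coeff(time_ms, sample_rate=48000):
    """One-pole coefficient for an attack or release time constant."""
    return math.exp(-1.0 / (time_ms / 1000.0 * sample_rate))


# A syllable 9 dB over an assumed -20 dBFS threshold at 3:1, hard knee.
loud_syllable_reduction = gain_reduction_db(-11.0, -20.0, 3.0)
```

At 3:1, a syllable 9 dB over the threshold comes out 3 dB over it, so the meter shows 6 dB of gain reduction, inside the 4–8 dB range suggested above.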
Two compressors in series at gentler settings (2 dB and 3 dB of reduction respectively, total 5 dB) sound more transparent than a single compressor doing 5 dB. This is called serial compression and is the broadcast default for hosts whose dynamic range exceeds 30 dB between mumble and shout.
6. EQ: the broadcast voice curve
The voice EQ curve is genre-stable enough to write down. After the high-pass at 80–100 Hz, you typically apply:
- Low-mid cut at 200–400 Hz, −2 to −4 dB with a wide Q (1.0–1.4): removes the boxy quality of small rooms.
- Presence boost at 3–5 kHz, +1 to +3 dB: adds intelligibility on consumer earbuds and laptop speakers.
- Air shelf above 10 kHz, +1 to +2 dB: opens the top end on condenser tracks; usually skipped on dynamic mics.
Every voice differs — a male host with chesty resonance might need 3 dB cut at 250 Hz; a female host with sibilant brightness might need none. The curve above is a starting point, not a recipe.
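To see what such a starting curve does across the spectrum, the two bell bands can be modelled as peaking biquads and their response evaluated directly. This sketch uses the RBJ "Audio EQ Cookbook" peaking formula. The centre frequencies, the presence-band Q and the omission of the air shelf are assumptions made for brevity.

```python
import cmath
import math


def peaking_coeffs(freq_hz, gain_db, q, sample_rate=48000):
    """Peaking-EQ biquad (RBJ cookbook form); returns un-normalised (b, a)."""
    amp = 10 ** (gain_db / 40)
    w0 = 2 * math.pi * freq_hz / sample_rate
    alpha = math.sin(w0) / (2 * q)
    b = (1 + alpha * amp, -2 * math.cos(w0), 1 - alpha * amp)
    a = (1 + alpha / amp, -2 * math.cos(w0), 1 - alpha / amp)
    return b, a


def response_db(b, a, freq_hz, sample_rate=48000):
    """Magnitude response of one biquad at freq_hz, in dB."""
    z1 = cmath.exp(-2j * math.pi * freq_hz / sample_rate)  # z^-1
    num = b[0] + b[1] * z1 + b[2] * z1 * z1
    den = a[0] + a[1] * z1 + a[2] * z1 * z1
    return 20 * math.log10(abs(num / den))


# Starting-point bands; exact centres and the presence Q are assumptions.
BANDS = [
    peaking_coeffs(300, -3.0, 1.2),  # low-mid cut
    peaking_coeffs(4000, 2.0, 1.0),  # presence boost
]


def curve_db(freq_hz):
    """Combined response of all bands at freq_hz."""
    return sum(response_db(b, a, freq_hz) for b, a in BANDS)


low_mid_db = curve_db(300)
presence_db = curve_db(4000)
```

Evaluating `curve_db` at a handful of frequencies is a quick way to check that a cut and a boost are not fighting each other before committing the settings to a template.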
7. De-esser: 5–9 kHz, sibilance physics
Sibilance is the energy concentration produced by hissing consonants such as “s,” “sh,” “z,” and “ch,” typically peaking between 5 and 9 kHz. A broadband compressor cannot tame that band without pulling down, and so dulling, the entire voice on every sibilant moment. A de-esser is a frequency-selective compressor: it triggers only when energy in the chosen band crosses a threshold and attenuates only that band.
Find the offending frequency by sweeping a narrow EQ boost between 5 and 9 kHz until the sibilance becomes painful, then set the de-esser at that centre. Threshold should engage 3–6 dB of attenuation on the harshest “s” only. Heavier de-essing introduces a lisp.
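The frequency-selective behaviour can be sketched in a few lines. The version below is a simplified split-band design: a one-pole filter divides the signal at an assumed 5 kHz, a peak follower watches only the upper band, and gain reduction (capped at 6 dB) is applied to that band alone before the two halves are summed again. A production de-esser would usually key from a narrower band centred on the swept frequency. Every parameter value here is an assumption.

```python
import math


def deess(samples, sample_rate=48000, split_hz=5000.0, threshold_db=-20.0,
          ratio=4.0, max_reduction_db=6.0, release_ms=50.0):
    """Split-band de-esser sketch: only the band above split_hz is turned down."""
    lp = 1.0 - math.exp(-2.0 * math.pi * split_hz / sample_rate)
    release = math.exp(-1.0 / (release_ms / 1000.0 * sample_rate))
    low = env = 0.0
    out = []
    for x in samples:
        low += lp * (x - low)                # one-pole low-pass
        high = x - low                       # complementary high band
        env = max(abs(high), env * release)  # peak follower, instant attack
        level_db = 20.0 * math.log10(env + 1e-12)
        over = max(level_db - threshold_db, 0.0)
        reduction_db = min(over * (1.0 - 1.0 / ratio), max_reduction_db)
        gain = 10.0 ** (-reduction_db / 20.0)
        out.append(low + gain * high)
    return out


def rms(values):
    return math.sqrt(sum(v * v for v in values) / len(values))


# Test tones: a 7 kHz "ess" and a 200 Hz voiced tone, both at -6 dBFS peak.
ess = [0.5 * math.sin(2 * math.pi * 7000 * n / 48000) for n in range(4800)]
voiced = [0.5 * math.sin(2 * math.pi * 200 * n / 48000) for n in range(4800)]
ess_out = deess(ess)
voiced_out = deess(voiced)
```

The 7 kHz tone comes out quieter while the 200 Hz tone passes through unchanged, which is the property a broadband compressor cannot offer.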
8. Limiter: the last line of defense
A limiter is a compressor with an effectively infinite ratio (∞:1) and a very fast attack. Its job is to ensure that nothing exceeds the ceiling, whatever reaches it. For podcast delivery, a brick-wall limiter with a −1 dBTP ceiling, catching 1–2 dB on the loudest peaks, prevents intersample clipping in lossy encoders.
Loudness compliance is a two-step process. First, gain-stage the limiter input so that the integrated LUFS reading sits at the target (e.g. −16). Second, the limiter catches the final 1–2 dB of true-peak excursion. Do not push the limiter to do more than that; if you need 6 dB of limiting to hit target loudness, your compression earlier in the chain is undersized.
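The two steps reduce to a small budget calculation. The sketch below takes a measured integrated loudness and true peak, works out the gain needed to reach the target, and reports how much the limiter would then have to remove. It treats the limiter as not changing integrated loudness, which is only approximately true, so in practice you re-measure after rendering. The 2 dB warning threshold mirrors the guidance above; the function name is illustrative.

```python
def delivery_plan(measured_lufs, measured_true_peak_db, target_lufs=-16.0,
                  ceiling_dbtp=-1.0, max_limiting_db=2.0):
    """Gain needed to hit the target, and how hard the limiter would work."""
    gain_db = target_lufs - measured_lufs
    peak_after_gain = measured_true_peak_db + gain_db
    limiting_db = max(0.0, peak_after_gain - ceiling_dbtp)
    return {
        "gain_db": gain_db,
        "limiting_db": limiting_db,
        "needs_more_compression": limiting_db > max_limiting_db,
    }


# A well-controlled mix: -20 LUFS with peaks at -4 dBTP.
controlled = delivery_plan(-20.0, -4.0)

# An under-compressed mix: -24 LUFS with peaks at -3 dBTP.
spiky = delivery_plan(-24.0, -3.0)
```

The first mix needs 4 dB of gain and 1 dB of limiting. The second would need 6 dB of limiting, which is the signal to go back and add compression earlier in the chain.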
9. Multi-host recording: double-enders, sample lock, drift
Remote interviews are typically recorded as double-enders: each host records their own microphone locally to disk while talking to the others over a real-time call (Zoom, Riverside, SquadCast). Each local file is uploaded to the editor, who aligns them on a multitrack timeline.
Two technical pitfalls. Sample-rate mismatch: if one host records at 44.1 kHz and another at 48 kHz, a file interpreted at the wrong rate runs about 8.8 percent fast or slow (48/44.1 ≈ 1.088). That is audibly out of sync within the first second and nearly 0.9 s adrift after ten seconds. Lock all hosts to the same rate. Clock drift: even at the same sample rate, two laptop crystal oscillators drift slightly, perhaps 50 ms over an hour. A waveform-aligning tool (Riverside’s magic edit, Descript, or manual alignment with a hand-clap reference at the start of every session) corrects this.
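Both figures are quick to verify. The sketch below computes the timing error that accumulates when a file recorded at one rate is interpreted at another, and converts an observed drift into parts per million so it can be compared against an oscillator tolerance. Function names are illustrative.

```python
def rate_mismatch_offset_s(elapsed_s, recorded_hz, assumed_hz):
    """Timing error when a file recorded at one rate is played as another."""
    return elapsed_s * (recorded_hz / assumed_hz - 1.0)


def drift_ppm(offset_ms, duration_s):
    """Clock drift expressed in parts per million."""
    return offset_ms / 1000.0 / duration_s * 1e6


# A 48 kHz recording interpreted as 44.1 kHz, ten seconds in.
offset_after_10s = rate_mismatch_offset_s(10.0, 48000, 44100)

# 50 ms of drift across a one-hour session.
hourly_drift = drift_ppm(50.0, 3600.0)
```

The rate mismatch produces roughly 0.88 s of error after ten seconds, while 50 ms per hour corresponds to about 14 ppm: small enough to miss at the start of a session and large enough to smear crosstalk by the end.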
SUMMARY
Spoken-word audio is solved engineering. Use a dynamic microphone for any room you have not paid an acoustician to fix. Pass it through gate, high-pass, compressor, de-esser, tone EQ, limiter in that order. Target −16 LUFS integrated and −1 dBTP for podcasts. Lock sample rates across hosts. The problems amateurs have are almost always violations of one of those rules — not microphone choice.
EQUIPMENT THAT MEETS THE CRITERIA · PODCAST & BROADCAST
Models below are grouped by the physical criterion they satisfy. We list the spec source (manufacturer datasheet) and a link to an independent reviewer (Sound on Sound) so you can verify our reading against working engineers. We did not personally A/B test these models.
Criterion: End-address dynamic with built-in pop / wind protection, low handling noise
For long-form spoken word in a room you cannot acoustically treat (a closet, a corner, a hotel room). Dynamic capsule rejects HVAC and computer fan noise by ~10 dB vs. a condenser at the same gain.
Criterion: USB+XLR hybrid, plug-and-play for solo hosts who may later add an interface
Lets you start with a USB cable into a laptop, then later move to a multi-mic XLR rig without replacing the microphone. Sample rate at the microphone is not the limiting factor for spoken word.
Criterion: Inline gain booster, when the dynamic + interface combination cannot reach broadcast level
+25 dB of clean gain ahead of the interface preamp. Particularly relevant when running an SM7B on an interface published below 60 dB of gain.
Criterion: Multi-input USB interface for two-host or three-host XLR setups
Two front-panel mic inputs are not enough for three-host shows; the interface below ships with four mic preamps and discrete monitor outputs.