Podcast & Broadcast Audio: LUFS, Dynamics & the Voice Chain

Studio Guide 02 · Cloud Atelier · Updated April 2026 · ~12 min read

Spoken word is its own discipline. The microphone choices are different, the loudness targets are regulated, and the processing chain is built around intelligibility rather than musical beauty. This guide treats podcasting and broadcast voice work as the engineering problem it actually is.

HOW WE RESEARCH · WHAT WE DO NOT CLAIM

Cloud Atelier does not run a test lab. We have not personally A/B tested every microphone, interface or monitor cited in this guide. The physics in this article (RT60, self-noise, polar patterns, latency, LUFS) come from published acoustics literature and standards. The product-specific specifications come from current manufacturer datasheets. Models are mentioned because their published spec satisfies a stated criterion — not because we declared them “best.” Where you see a product below, you will also see the source of the spec we cited and a link to an independent reviewer (Sound on Sound) where you can verify our reading against working engineers.

1. Loudness: LUFS, true peak, and why −16 LUFS exists

Streaming services and broadcasters do not measure loudness in dBFS peak. They measure it in LUFS (Loudness Units relative to Full Scale), a metric standardised in ITU-R BS.1770. The algorithm filters audio through a frequency-weighted curve that approximates human hearing, then averages the resulting power over time. The result correlates with how loud a program actually sounds — not how high its peak sample reaches.
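The measurement described above can be sketched in a few lines. This is a simplified illustration, not a compliant meter: it handles mono audio at 48 kHz only and omits the gating that BS.1770 applies for integrated loudness. The biquad coefficients are the 48 kHz K-weighting values published in the standard; the function names are our own.

```python
import math

# K-weighting filter coefficients (b0, b1, b2, a1, a2) from ITU-R BS.1770.
# They are valid for 48 kHz audio only.
SHELF = (1.53512485958697, -2.69169618940638, 1.19839281085285,
         -1.69065929318241, 0.73248077421585)
HIGHPASS = (1.0, -2.0, 1.0, -1.99004745483398, 0.99007225036621)

def biquad(samples, coeffs):
    """Direct-form I biquad with a0 normalised to 1."""
    b0, b1, b2, a1, a2 = coeffs
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1, y2, y1 = x1, x, y1, y
        out.append(y)
    return out

def ungated_loudness_lufs(samples):
    """Mono K-weighted loudness without gating (a simplification)."""
    weighted = biquad(biquad(samples, SHELF), HIGHPASS)
    mean_square = sum(v * v for v in weighted) / len(weighted)
    return -0.691 + 10.0 * math.log10(mean_square)

def sine(freq_hz, amplitude, seconds, rate=48000):
    return [amplitude * math.sin(2 * math.pi * freq_hz * n / rate)
            for n in range(int(seconds * rate))]

full_scale = ungated_loudness_lufs(sine(1000.0, 1.0, 2.0))
half_scale = ungated_loudness_lufs(sine(1000.0, 0.5, 2.0))
print(round(full_scale, 1), round(half_scale, 1))
```

A full-scale 1 kHz sine should read close to −3 LUFS, and halving the amplitude should lower the reading by about 6 LU. For real episodes, use one of the dedicated meters rather than a sketch like this.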

Platform | Loudness target | True-peak ceiling | Notes
Apple Podcasts | −16 LUFS | −1 dBTP | Mono −19, stereo −16
Spotify (podcasts) | −14 to −16 LUFS | −1 dBTP | Normalised on playback
YouTube | −14 LUFS | −1 dBTP | Normalisation, not strict
EBU R128 (broadcast) | −23 LUFS | −1 dBTP | EU radio & TV
ATSC A/85 (US TV) | −24 LKFS | −2 dBTP | CALM Act compliance

The two-decibel difference between Apple Podcasts (−16) and Spotify (−14) is meaningful. A podcast mixed at −14 LUFS plays at level on Spotify and is normalised down 2 dB on Apple, with no quality loss. A podcast mixed at the −23 LUFS broadcast target complies with EBU R128 but sits 7 to 9 dB below the podcast targets, so wherever playback is not normalised upward it feels weak: listeners reach for the volume knob and the perceived production value drops. −16 LUFS integrated with a −1 dBTP true-peak ceiling is the safe target for both podcast platforms in 2026.
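The arithmetic behind playback normalisation is a simple subtraction. The sketch below assumes a player that normalises in both directions, which is not guaranteed on every platform; the function name and the dictionary of targets are illustrative only.

```python
def playback_gain_db(mix_lufs, platform_target_lufs):
    """Gain a loudness-normalising player would apply to reach its target."""
    return platform_target_lufs - mix_lufs

# Targets are illustrative values taken from the platform table in this guide.
TARGETS = {"Apple Podcasts": -16.0, "Spotify": -14.0, "EBU R128": -23.0}

for platform, target in TARGETS.items():
    gain = playback_gain_db(-16.0, target)
    print(f"{platform}: {gain:+.1f} dB applied to a -16 LUFS mix")
```

A negative result is harmless attenuation. A positive result means the player has to add gain, and a player that declines to do so leaves the episode quieter than its neighbours.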

True peak is not the same as sample peak. When a D/A converter reconstructs the analog waveform from a digital signal, the waveform between samples (the intersample peak) can exceed the highest digital sample by 3 dB or more in worst cases. A −1 dBTP ceiling leaves headroom for those peaks and for lossy encoders (AAC, MP3), which can otherwise distort on consumer playback even though the WAV file looks clean.
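A textbook worst case makes the gap between sample peak and true peak concrete. The sketch below samples a sine at one quarter of the sample rate with a 45 degree phase offset, so no sample ever lands on the crest of the waveform. The true peak is taken analytically from the known sine amplitude; a real true-peak meter estimates it by oversampling instead.

```python
import math

def db(ratio):
    return 20.0 * math.log10(ratio)

# Sine at fs/4 with a 45 degree phase offset: every sample sits at
# +/-0.7071 of the crest, so the samples never show the real peak.
amplitude = 1.0
samples = [amplitude * math.sin(math.pi / 2 * n + math.pi / 4) for n in range(64)]

sample_peak = max(abs(s) for s in samples)
true_peak = amplitude  # crest of the underlying continuous sine

print(f"sample peak: {db(sample_peak):.2f} dBFS")               # -3.01 dBFS
print(f"true peak:   {db(true_peak):.2f} dBFS")                 # 0.00 dBFS
print(f"hidden overshoot: {db(true_peak / sample_peak):.2f} dB")  # 3.01 dB
```

A sample-peak meter would report this signal as 3 dB below full scale even though the reconstructed waveform touches it.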

METER YOUR MIX
Meters that implement ITU-R BS.1770, some of them free: Youlean Loudness Meter 2, dpMeter5, NUGEN VisLM SE, Voxengo SPAN Plus. Hit the integrated number across the whole episode, not just the loud parts. Short-term loudness should sit within ±3 LU of integrated for a comfortable listen.

2. Why dynamic microphones dominate spoken word

Walk into any radio station, podcast network, or commercial voiceover booth and the microphone you see is almost certainly dynamic: Shure SM7B, Electro-Voice RE20, Heil PR40, Rode PodMic. Condensers appear in higher-end voice booths (Neumann U87, Sennheiser MKH 416), but only because those booths are heavily treated and very quiet. In any other room, the dynamic wins, and the reason is acoustic.

Spoken word is recorded close, typically 5–15 cm from the capsule. At that distance the proximity effect of a cardioid dynamic adds 4–8 dB at 100–200 Hz, producing the warm chest tone listeners associate with broadcast. At the same time, the dynamic’s lower sensitivity and tight polar pattern, combined with the close placement, keep the room reflections that would otherwise smear consonants well below the voice. Shure publishes a particularly tight cardioid pattern for the SM7B, with consistent off-axis colouration, which is why it sounds natural even when a host turns their head mid-sentence.

USB-XLR hybrids and the workflow case

For a single host, USB dynamic microphones (Shure MV7+, Rode PodMic USB) record straight to a laptop without an interface. Their internal preamps deliver enough clean gain to drive the dynamic capsule without an outboard Cloudlifter. For multi-host shows the case for XLR returns, because every host needs an isolated track and a preamp with matched gain (plus phantom power if an inline booster is used), which means an interface with at least host count + 1 inputs.

3. The standard voice processing chain in series

A professionally produced voice track passes through a fixed sequence of processors. The order matters because each stage modifies what the next stage sees. Reordering the chain changes the result fundamentally.

GATE → EQ (HPF) → COMP → DE-ESS → EQ (TONE) → LIMIT

The canonical voice chain: clean first, shape second, control last.

Notice the EQ appears twice. The first instance is a high-pass filter only — it removes everything below 80–100 Hz before the compressor sees it, so room rumble does not pump the compressor on every plosive. The second instance shapes tone after dynamics control, because compressing a tone-shaped signal re-couples the gain reduction to whatever you boosted.
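The first EQ stage can be sketched as a second-order high-pass filter. The coefficient formulas are the widely used "Audio EQ Cookbook" ones; the 80 Hz cutoff comes from the text, while the Butterworth Q of 0.7071, the 48 kHz rate and all function names are assumptions for illustration.

```python
import math

def highpass_coeffs(cutoff_hz, rate=48000, q=0.7071):
    """Second-order high-pass biquad, normalised so that a0 = 1."""
    w0 = 2 * math.pi * cutoff_hz / rate
    alpha = math.sin(w0) / (2 * q)
    cos_w0 = math.cos(w0)
    a0 = 1 + alpha
    b = ((1 + cos_w0) / 2 / a0, -(1 + cos_w0) / a0, (1 + cos_w0) / 2 / a0)
    a = (-2 * cos_w0 / a0, (1 - alpha) / a0)
    return b, a

def apply_biquad(samples, b, a):
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[0] * y1 - a[1] * y2
        x2, x1, y2, y1 = x1, x, y1, y
        out.append(y)
    return out

def rms(samples):
    return math.sqrt(sum(v * v for v in samples) / len(samples))

def level_change_db(freq_hz, cutoff_hz=80.0, rate=48000):
    """Level change a steady tone sees when passed through the high-pass."""
    tone = [math.sin(2 * math.pi * freq_hz * n / rate) for n in range(rate)]
    b, a = highpass_coeffs(cutoff_hz, rate)
    filtered = apply_biquad(tone, b, a)
    half = rate // 2  # skip the filter's start-up transient
    return 20 * math.log10(rms(filtered[half:]) / rms(tone[half:]))

for freq in (40.0, 1000.0):
    print(f"{freq:6.0f} Hz: {level_change_db(freq):+.1f} dB")
```

Rumble an octave below the cutoff is reduced by roughly 12 dB before it can reach the compressor's detector, while the body of the voice passes essentially unchanged.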

4. Gate & expander: room rejection in software

A noise gate attenuates signal below a threshold. For a host whose dynamic mic captures a room ambience of −55 dBFS during silences and a speaking voice peaking at −6 dBFS, a gate set to −45 dBFS with 2–5 ms attack and 100–200 ms release silences breaths and HVAC between sentences without truncating consonants. Set the threshold by listening: just above the noise floor, just below the quietest intentional whisper.
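A minimal gate along these lines might look like the sketch below. The threshold, attack and release defaults follow the figures in the paragraph above; the 10 ms level detector and the test signal (a tone standing in for both room noise and voice) are assumptions, and production gates add hold time, hysteresis and look-ahead.

```python
import math

RATE = 48000
FLOOR = 10 ** (-55 / 20)  # room ambience at -55 dBFS

def gate(samples, rate=RATE, threshold_db=-45.0, attack_ms=3.0, release_ms=150.0):
    """Gain glides toward 1 above the threshold and toward 0 below it."""
    threshold = 10 ** (threshold_db / 20)
    attack = math.exp(-1.0 / (rate * attack_ms / 1000))
    release = math.exp(-1.0 / (rate * release_ms / 1000))
    detector_decay = math.exp(-1.0 / (rate * 0.010))  # assumed 10 ms detector
    envelope = 0.0
    gain = 0.0
    out = []
    for x in samples:
        envelope = max(abs(x), envelope * detector_decay)
        target = 1.0 if envelope > threshold else 0.0
        coeff = attack if target > gain else release
        gain = coeff * gain + (1 - coeff) * target
        out.append(x * gain)
    return out

def tone(amplitude, seconds, freq_hz=200.0, rate=RATE):
    return [amplitude * math.sin(2 * math.pi * freq_hz * n / rate)
            for n in range(int(seconds * rate))]

def peak(samples):
    return max(abs(s) for s in samples)

# Half a second of room noise, half a second of "voice" at -6 dBFS, then noise.
signal = tone(FLOOR, 0.5) + tone(0.5, 0.5) + tone(FLOOR, 0.5)
gated = gate(signal)

print("before speech:", peak(gated[:RATE // 2]))
print("during speech:", round(peak(gated[RATE * 3 // 4:RATE]), 3))
print("after release:", peak(gated[-2400:]) < FLOOR)
```

The ambience before the first word never opens the gate, the voice passes at full level, and the tail after the last word fades out over the release time instead of cutting off.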

Expanders are gentler. Where a gate goes from full level to silence at the threshold, an expander reduces level proportionally: a 2:1 downward expander at −40 dBFS lets −50 dBFS material come through at −60 dBFS, smoothing the transition. For dialogue where breaths are part of the delivery, expansion preserves life. For panel discussion with crosstalk, gating wins.
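The difference between the two static curves is easy to tabulate. This sketch works purely in decibels and ignores attack and release; the −120 dBFS gate floor is an arbitrary stand-in for silence, and the function names are ours.

```python
def downward_expander_db(level_db, threshold_db=-40.0, ratio=2.0):
    """Below the threshold, every 1 dB of input drop becomes `ratio` dB of output drop."""
    if level_db >= threshold_db:
        return level_db
    return threshold_db + (level_db - threshold_db) * ratio

def hard_gate_db(level_db, threshold_db=-40.0, floor_db=-120.0):
    """Below the threshold the output falls straight to the floor."""
    return level_db if level_db >= threshold_db else floor_db

for level in (-30.0, -40.0, -45.0, -50.0):
    print(level, downward_expander_db(level), hard_gate_db(level))
```

With the 2:1 expander at −40 dBFS, material at −50 dBFS comes out at −60 dBFS, matching the example in the text, while the gate drops the same material to the floor.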

5. Compressor: ratio, threshold, attack, release

A compressor reduces dynamic range by attenuating signal above a threshold by a chosen ratio. For voice, the canonical settings are:

Two compressors in series at gentler settings (2 dB and 3 dB of reduction respectively, total 5 dB) sound more transparent than a single compressor doing 5 dB. This is called serial compression and is the broadcast default for hosts whose dynamic range exceeds 30 dB between mumble and shout.
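The way two gentle stages add up can be sketched with the static compressor curve. The thresholds and ratios below are hypothetical values chosen only to reproduce the 2 dB plus 3 dB split mentioned above; they are not the guide's recommended settings, and the sketch ignores attack and release entirely.

```python
def compress_db(level_db, threshold_db, ratio):
    """Static compressor curve: overshoot above the threshold is divided by the ratio."""
    if level_db <= threshold_db:
        return level_db
    return threshold_db + (level_db - threshold_db) / ratio

peak_in = -12.0  # hypothetical loud syllable, in dBFS

# Stage 1 (hypothetical): 2:1 at -16 dBFS.
after_stage_1 = compress_db(peak_in, threshold_db=-16.0, ratio=2.0)
reduction_1 = peak_in - after_stage_1

# Stage 2 (hypothetical): 4:1 at -18 dBFS, fed by the output of stage 1.
after_stage_2 = compress_db(after_stage_1, threshold_db=-18.0, ratio=4.0)
reduction_2 = after_stage_1 - after_stage_2

print(reduction_1, reduction_2, reduction_1 + reduction_2)
```

Each stage only ever works on a few decibels of overshoot, which is the practical reason the combined result tends to sound less obviously compressed than one stage doing all 5 dB.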

6. EQ: the broadcast voice curve

The voice EQ curve is genre-stable enough to write down. After the high-pass at 80–100 Hz, you typically apply:

Every voice differs — a male host with chesty resonance might need 3 dB cut at 250 Hz; a female host with sibilant brightness might need none. The curve above is a starting point, not a recipe.
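The 3 dB cut at 250 Hz mentioned above can be sketched as a single peaking filter. The coefficient formulas are the common "Audio EQ Cookbook" ones; the Q of 1.0 and the 48 kHz rate are assumptions, and the response is evaluated analytically rather than by filtering audio.

```python
import cmath
import math

def peaking_coeffs(freq_hz, gain_db, q=1.0, rate=48000):
    """Peaking (bell) biquad; returns unnormalised (b, a) coefficient triples."""
    amp = 10 ** (gain_db / 40)
    w0 = 2 * math.pi * freq_hz / rate
    alpha = math.sin(w0) / (2 * q)
    cos_w0 = math.cos(w0)
    b = (1 + alpha * amp, -2 * cos_w0, 1 - alpha * amp)
    a = (1 + alpha / amp, -2 * cos_w0, 1 - alpha / amp)
    return b, a

def response_db(b, a, freq_hz, rate=48000):
    """Magnitude of the filter's frequency response at one frequency."""
    z_inv = cmath.exp(-1j * 2 * math.pi * freq_hz / rate)
    numerator = b[0] + b[1] * z_inv + b[2] * z_inv ** 2
    denominator = a[0] + a[1] * z_inv + a[2] * z_inv ** 2
    return 20 * math.log10(abs(numerator / denominator))

b, a = peaking_coeffs(250.0, gain_db=-3.0)
for freq in (100.0, 250.0, 500.0, 4000.0):
    print(f"{freq:6.0f} Hz: {response_db(b, a, freq):+.2f} dB")
```

The cut is exactly 3 dB at the centre frequency and fades toward 0 dB away from it, so the presence range is left untouched.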

7. De-esser: 5–9 kHz, sibilance physics

Sibilance is the concentration of energy produced by “s,” “sh,” and “t” consonants, typically peaking between 5 and 9 kHz. A broadband compressor cannot control that band without pulling down, and dulling, the entire voice on every sibilant. A de-esser is a frequency-selective compressor: it triggers only when energy in the chosen band crosses a threshold, and it attenuates only that band.

Find the offending frequency by sweeping a narrow EQ boost between 5 and 9 kHz until the sibilance becomes painful, then set the de-esser at that centre. Threshold should engage 3–6 dB of attenuation on the harshest “s” only. Heavier de-essing introduces a lisp.
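One way to build such a processor is a split-band design: isolate the sibilant band with a band-pass filter, measure it, and turn down only that band. The sketch below is a minimal illustration under assumed settings (7 kHz centre, Q of 1.4, −20 dBFS threshold, 4:1 ratio, 5 ms detector) with a 6 dB cap reflecting the upper limit suggested above. Commercial de-essers differ in detection and filter design.

```python
import math

def bandpass_coeffs(center_hz, q, rate):
    """Cookbook band-pass with 0 dB gain at the centre frequency (a0 normalised)."""
    w0 = 2 * math.pi * center_hz / rate
    alpha = math.sin(w0) / (2 * q)
    a0 = 1 + alpha
    return (alpha / a0, 0.0, -alpha / a0), (-2 * math.cos(w0) / a0, (1 - alpha) / a0)

def de_ess(samples, rate=48000, center_hz=7000.0, q=1.4,
           threshold_db=-20.0, ratio=4.0, max_cut_db=6.0):
    (b0, b1, b2), (a1, a2) = bandpass_coeffs(center_hz, q, rate)
    decay = math.exp(-1.0 / (rate * 0.005))  # assumed 5 ms detector release
    x1 = x2 = y1 = y2 = envelope = 0.0
    out = []
    for x in samples:
        # Isolate the sibilant band.
        band = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1, y2, y1 = x1, x, y1, band
        # Measure the band level and derive the cut, capped at max_cut_db.
        envelope = max(abs(band), envelope * decay)
        over = 20 * math.log10(max(envelope, 1e-9)) - threshold_db
        cut_db = min(max_cut_db, over * (1 - 1 / ratio)) if over > 0 else 0.0
        gain = 10 ** (-cut_db / 20)
        # Recombine: everything outside the band passes untouched.
        out.append((x - band) + gain * band)
    return out

def tone(freq_hz, amplitude=0.5, seconds=0.5, rate=48000):
    return [amplitude * math.sin(2 * math.pi * freq_hz * n / rate)
            for n in range(int(seconds * rate))]

def rms(samples):
    return math.sqrt(sum(v * v for v in samples) / len(samples))

ess, vowel = tone(7000.0), tone(200.0)
ess_out, vowel_out = de_ess(ess), de_ess(vowel)
half = len(ess) // 2
print(round(20 * math.log10(rms(ess_out[half:]) / rms(ess[half:])), 1))
print(max(abs(p - q) for p, q in zip(vowel, vowel_out)) < 1e-12)
```

A loud 7 kHz tone is pulled down by about 6 dB, while a 200 Hz tone at the same level passes unchanged because it never raises the band detector above the threshold.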

8. Limiter: the last line of defense

A limiter is a compressor with a ratio of approximately infinity-to-one and very fast attack. Its job is to ensure no sample exceeds the ceiling regardless of what reaches it. For podcast delivery, a brick-wall limiter at −1 dBTP true peak with 1–2 dB of catching gain reduction prevents intersample clipping on lossy encoders.

Loudness compliance is a two-step process. First, gain-stage the limiter input so that the integrated LUFS reading sits at the target (e.g. −16). Second, the limiter catches the final 1–2 dB of true-peak excursion. Do not push the limiter to do more than that; if you need 6 dB of limiting to hit target loudness, your compression earlier in the chain is undersized.
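The two-step check can be written as simple arithmetic on meter readings. This sketch assumes you already have an integrated loudness and a true-peak reading for the unlimited mix; the function name and the example readings are hypothetical, and the 2 dB budget reflects the guidance above.

```python
def delivery_plan(measured_lufs, measured_true_peak_db,
                  target_lufs=-16.0, ceiling_db=-1.0, max_limiting_db=2.0):
    """Return (makeup gain, limiting needed, whether limiting is within budget)."""
    makeup_db = target_lufs - measured_lufs
    peak_after_gain = measured_true_peak_db + makeup_db
    limiting_db = max(0.0, peak_after_gain - ceiling_db)
    return makeup_db, limiting_db, limiting_db <= max_limiting_db

# Hypothetical meter readings for two mixes.
print(delivery_plan(-21.0, -4.5))  # (5.0, 1.5, True)
print(delivery_plan(-24.0, -3.0))  # (8.0, 6.0, False)
```

The second mix would need 6 dB of limiting to reach the target, which is the signal to go back and add compression earlier in the chain. Because limiting itself lowers the integrated reading slightly, expect to re-measure after the first pass.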

9. Multi-host recording: double-enders, sample lock, drift

Remote interviews are typically recorded as double-enders: each host records their own microphone locally to disk while talking to the others over a real-time call (Zoom, Riverside, SquadCast). Each local file is uploaded to the editor, who aligns them on a multitrack timeline.

Two technical pitfalls. Sample-rate mismatch: if one host records at 44.1 kHz and another at 48 kHz, and the 44.1 kHz file is placed on a 48 kHz timeline without resampling, it plays about 8.8 percent fast. That is roughly 88 ms of slip per second, audibly out of sync almost immediately. Lock all hosts to the same rate. Clock drift: even at the same sample rate, two laptop crystal oscillators drift slightly, perhaps 50 ms over an hour. A waveform-aligning tool (Riverside’s magic edit, Descript, or manual alignment with a hand-clap reference at the start of every session) corrects this.
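Both pitfalls reduce to short calculations. The sketch below assumes the mismatched file is played at the wrong rate without resampling, and expresses clock drift in parts per million; the function names are ours.

```python
def rate_mismatch_slip_ms(playback_seconds, recorded_hz=44100, assumed_hz=48000):
    """Slip accumulated when a file recorded at one rate is played back as another."""
    speed_ratio = assumed_hz / recorded_hz
    return playback_seconds * (speed_ratio - 1) * 1000

def clock_drift_ppm(offset_ms, duration_seconds):
    """Clock mismatch, in parts per million, implied by an observed offset."""
    return offset_ms / 1000 / duration_seconds * 1e6

print(f"slip after 1 s:  {rate_mismatch_slip_ms(1.0):.1f} ms")
print(f"slip after 10 s: {rate_mismatch_slip_ms(10.0):.1f} ms")
print(f"50 ms per hour:  {clock_drift_ppm(50.0, 3600.0):.1f} ppm")
```

A rate mismatch is off by tens of milliseconds within the first second, whereas ordinary clock drift of around 14 ppm only becomes audible over long sessions. That difference is why the first is fixed by settings and the second by alignment tools.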

SUMMARY

Spoken-word audio is solved engineering. Use a dynamic microphone for any room you have not paid an acoustician to fix. Pass it through gate, high-pass, compressor, de-esser, tone EQ, limiter in that order. Target −16 LUFS integrated and −1 dBTP for podcasts. Lock sample rates across hosts. The problems amateurs have are almost always violations of one of those rules — not microphone choice.

EQUIPMENT THAT MEETS THE CRITERIA · PODCAST & BROADCAST

Models below are grouped by the physical criterion they satisfy. We list the spec source (manufacturer datasheet) and a link to an independent reviewer (Sound on Sound) so you can verify our reading against working engineers. We did not personally A/B test these models.

Criterion: End-address dynamic with built-in pop / wind protection, low handling noise

For long-form spoken word in a room you cannot acoustically treat (a closet, a corner, a hotel room). Dynamic capsule rejects HVAC and computer fan noise by ~10 dB vs. a condenser at the same gain.

Criterion: USB+XLR hybrid, plug-and-play for solo hosts who may later add an interface

Lets you start with a USB cable into a laptop, then later move to a multi-mic XLR rig without replacing the microphone. Sample rate at the microphone is not the limiting factor for spoken word.

Criterion: Inline gain booster, when the dynamic + interface combination cannot reach broadcast level

+25 dB of clean gain ahead of the interface preamp. Particularly relevant when running an SM7B on an interface published below 60 dB of gain.

Criterion: Multi-input USB interface for two-host or three-host XLR setups

Two front-panel mic inputs are not enough for three-host shows; the interface below ships with four mic preamps and discrete monitor outputs.

About this section. Cloud Atelier participates in the Amazon Associates Program and the Reverb affiliate program. We earn a commission if you purchase through these links, at no extra cost to you. We have not personally tested every product listed. Models appear because their published manufacturer specification satisfies a criterion stated above. Specifications are drawn from current manufacturer datasheets and cross-checked against independent industry reviewers (primarily Sound on Sound). Affiliate relationships do not influence which models qualify for a given criterion. If a spec is wrong or out of date, please tell us.