Voice casting makes or breaks an audiobook. A character's voice isn't decoration — it's identity. The moment a listener hears your protagonist speak, they form a complete impression: age, class, background, temperament, even trustworthiness. Get the voice right, and the listener disappears into the story. Get it wrong, and they're constantly pulled out, struggling to reconcile the voice with the character on the page.
In traditional audiobooks with a single narrator, this problem is simplified away — one voice handles everything, and the listener adjusts. In full cast audiobooks, where each character has a distinct voice, the stakes are higher. A miscast voice doesn't just break immersion. It actively undermines character development and relationship dynamics that depend on vocal contrast.
This is the work of voice casting. And unlike traditional production — where you're paying thousands for a casting director, audition sessions, and studio time — you now have tools that let you make these decisions directly, affordably, and with unlimited revisions.
Here's how to choose the right voice for every character in your story.
Why Voice Matters More Than You Think
Before diving into the mechanics, it's worth understanding why voice casting has such outsized impact.
The voice-character bond is instant
Readers form an internal voice for each character as they read. That voice is constructed from scattered clues: dialogue style, word choice, how other characters react to them, their backstory. It's personal — shaped by the reader's own accent patterns and associations.
When that book becomes an audiobook, the narration collapses that ambiguity into a single, concrete voice. The listener is no longer constructing — they're receiving. And in the first few seconds of a character's speech, they're making judgments about who that character is.
These judgments are automatic and deeply rooted. Voice carries information about:
- Age and generation. A gravelly, creaky voice reads as elderly. A high, bright voice reads as young, even if the character is canonically middle-aged. Pitch, timbre, and delivery speed all signal age.
- Social position and class. Accent, formality of speech, pace, and vocabulary are real class markers. A character with a clipped, precise voice and careful diction reads as educated or upper-class. A character with a relaxed, colloquial delivery reads as working-class or casual. Stereotyping? Yes. Universal? Largely, across English-speaking audiences.
- Emotional baseline. A warm, smooth voice signals trustworthiness and safety. A tight, controlled voice signals repression or tension. A loose, rambling voice signals instability or enthusiasm.
- Temperament and personality. A confident character needs a voice that carries authority. A nervous character needs vocal hesitation. A charming character needs vocal warmth. These aren't subtle effects — they're primary sources of characterization in audio.
Miscast voices create cognitive dissonance
When a character's voice doesn't match their function in the story, listeners experience dissonance. A villain with a kind, warm voice is unsettling in the wrong way. A powerful protagonist with a thin, reedy voice creates doubt about whether they can actually carry the story.
The listener doesn't consciously think "that voice is wrong." They just feel something is off. And that feeling accumulates across chapters. By the time the character reaches their emotional climax, if the voice has never felt right, the scene lands with half impact.
The Voice Casting Framework
Successful voice casting starts with understanding your character's function in the story, then matching voice qualities to that function.
The five dimensions of character voice
Every voice carries information across five primary dimensions:
1. Age and vocal maturity
- Young (teen/20s): Lighter, higher pitch, faster pace, less gravel or texture
- Early middle-aged (30s-40s): Fuller, warmer, more grounded
- Mature (50s-60s): Lower pitch, possible gravel or rasp, slower pace
- Elderly (70+): Possible tremor, creak, or reduced projection. Pitch shifts vary by gender — women's voices tend to lower with age while men's may rise slightly.
The temptation is to match vocal age to character age exactly. Resist it. A 45-year-old character can be voiced by someone younger if the delivery conveys experience. A 70-year-old can be voiced younger if the character has exceptional energy or vitality. Focus on the character's internal age — how old they feel and act — not their canonical years.
2. Warmth and approachability
- Warm/approachable: Smooth tone, relaxed pace, open vowels, inviting presence
- Neutral: Professional, clear, neither warm nor cold
- Guarded/cold: Tighter tone, clipped delivery, careful diction, reserved presence
Warmth is about whether the listener instinctively trusts or likes the character. Allies and sympathetic characters benefit from warmth. Antagonists benefit from coldness. Morally gray characters benefit from neutrality or a mix.
3. Energy and vitality
- High energy: Fast pace, active delivery, sharp articulation, vocal enthusiasm
- Grounded energy: Steady pace, confident delivery, present awareness
- Low energy: Slow pace, measured delivery, possible flatness or fatigue
Energy should reflect the character's role and agency. Active protagonists need vocal vitality. Passive characters or those in crisis might benefit from lower energy. Comedic relief characters typically have high energy.
4. Accent and regional identity
Accent serves two functions: it signals geographic or cultural origin, and it creates vocal differentiation. In a full cast audiobook, every character needs to sound distinct. Accents are one tool for that differentiation.
Critical consideration: Accents are powerful but risky. An accent can either authenticate a character or caricature them.
- Use accents for characters where it's integral to their identity or backstory.
- Avoid accents unless you have a voice actor who can execute them authentically.
- Never use accents as a shortcut to characterization. A character from the South doesn't need a Southern accent to be Southern — their word choice and pacing can do that work.
In AI-generated voice casting, accent choices are typically pre-built into voice models. Understand what accents are available, and use them purposefully.
5. Vocal texture and timbre
This is the hardest dimension to articulate, but it's crucial. Vocal texture is the color of a voice — the quality that makes one voice feel rough, another smooth, another bright, another dark.
- Smooth, warm timbre: Associated with comfort, reliability, sexuality, trustworthiness
- Rough, gravelly timbre: Associated with age, experience, danger, intensity
- Bright, light timbre: Associated with youth, optimism, superficiality, humor
- Dark, heavy timbre: Associated with authority, weight, seriousness, potential menace
Texture should reinforce character function. A hero might have bright or warm timbre. A villain might have rough or dark timbre. A love interest might have warm or bright timbre. None of these are rules — they're associations the listener will unconsciously register.
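If you like to keep casting notes in a structured form, the five dimensions can be written down as a small record per character. The sketch below is one possible way to do that in Python. The class and field names (`VoiceProfile`, `Age`, `Warmth`, `Energy`, `Texture`) are illustrative assumptions, not part of any casting tool, and the category labels simply mirror the lists above.

```python
from dataclasses import dataclass
from enum import Enum


class Age(Enum):
    YOUNG = "young"
    EARLY_MIDDLE = "early middle-aged"
    MATURE = "mature"
    ELDERLY = "elderly"


class Warmth(Enum):
    WARM = "warm"
    NEUTRAL = "neutral"
    GUARDED = "guarded"


class Energy(Enum):
    HIGH = "high"
    GROUNDED = "grounded"
    LOW = "low"


class Texture(Enum):
    SMOOTH_WARM = "smooth, warm"
    ROUGH_GRAVELLY = "rough, gravelly"
    BRIGHT_LIGHT = "bright, light"
    DARK_HEAVY = "dark, heavy"


@dataclass(frozen=True)
class VoiceProfile:
    """One character's target voice across the five dimensions (hypothetical schema)."""
    character: str
    age: Age
    warmth: Warmth
    energy: Energy
    accent: str  # free text; which accents exist depends on the voices available to you
    texture: Texture

    def brief(self) -> str:
        """Render a one-line casting brief for this character."""
        return (
            f"{self.character}: {self.age.value}, {self.warmth.value}, "
            f"{self.energy.value} energy, {self.accent} accent, "
            f"{self.texture.value} timbre"
        )


protagonist = VoiceProfile(
    character="Protagonist",
    age=Age.YOUNG,
    warmth=Warmth.WARM,
    energy=Energy.HIGH,
    accent="neutral",
    texture=Texture.BRIGHT_LIGHT,
)
print(protagonist.brief())
# Protagonist: young, warm, high energy, neutral accent, bright, light timbre
```

A record like this is only a note-taking aid. The labels are coarse, and the real decision still happens when you listen.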
Matching Voices to Character Types
Once you understand the five dimensions, you can apply them to character archetypes. Most characters in fiction fall into recognizable roles:
The Protagonist (POV Character)
This is the character the listener spends the most time with. Their voice needs to be approachable and clear, so the listener doesn't get fatigued listening to 40+ hours of their interiority. Vocal warmth is typically beneficial — it makes the protagonist likable and encourages identification.
- Age: Match their chronological age or internal age. A 25-year-old protagonist might call for high energy, a 45-year-old for grounded energy, and a 70-year-old for wisdom and steadiness.
- Warmth: Warm or warm-neutral. Listeners need to spend time in this character's head.
- Energy: Should reflect their agency in the plot. Active protagonists = high or grounded energy. Passive or reactive protagonists = grounded or lower energy.
- Accent: Only if it's central to their identity. Most protagonists benefit from neutral or lightly regional accents.
- Texture: Clear and present. Avoid overly rough or unusual textures that might become fatiguing over 40 hours.
Example: In Ninth House by Leigh Bardugo, Alex Stern is a 20-something survivor with sharp intelligence and dry humor. Her voice should be young-adult, clear, with enough edge to convey her cynicism. Warm enough that we like her despite her defenses. Not so rough that the reader-listener gets fatigued.
The Romantic Lead (Love Interest / Secondary Protagonist)
This character's voice needs vocal contrast with the protagonist, but also vocal chemistry. The listener should be able to hear the attraction or tension between them.
- Age: Typically a peer of the protagonist, or slightly older or younger, depending on the story. Cast the age the character reads as, and check it against the protagonist's vocal age so the pairing sounds plausible.
- Warmth: Should have contrast. If protagonist is warm, love interest might be cooler or more guarded (tension). If protagonist is neutral or guarded, love interest might be warmer (attraction).
- Energy: Should complement protagonist. If protagonist has high energy, love interest might be grounded. If protagonist is reactive, love interest might be active. Variation creates dynamic dialogue.
- Accent: Can be used for differentiation if appropriate to character. A partner from a different region or background can have a distinct accent.
- Texture: Should be distinctly different from the protagonist. If protagonist is bright, love interest might be warm or dark. If protagonist is smooth, love interest might have texture or grain.
Example: In Outlander, Claire and Jamie need voices with vocal chemistry. Claire's voice (intelligent, English, warm) needs to play against Jamie's Scottish voice in a way that creates tension and connection. The voices should feel like they belong in scenes together.
The Antagonist (Villain / Obstacle)
The antagonist's voice should create the right kind of discomfort. Not cartoonish evil — actual threat.
- Age: Match their role. A young antagonist might feel like an upstart threat. An older antagonist might feel like entrenched power. No rules, but consider how age affects the listener's perception of their menace.
- Warmth: Cold or neutral is typical. A warm, friendly antagonist can be more unsettling than an obviously evil one. The warmth makes them seductive.
- Energy: Should reflect their threat. A passive antagonist (threat through inaction or obstruction) might have lower energy. An active antagonist (direct threat) needs vocal presence.
- Accent: Can be used for differentiation, but avoid accents that feel like coding. An antagonist's menace should come from their character and actions, not their accent.
- Texture: Rough, dark, or unusual textures can reinforce menace. Avoid anything that sounds like caricature.
Example: In A Court of Thorns and Roses, Rhysand is charismatic and dangerous. His voice needs to be compelling and magnetic — listeners should understand his pull — but with an undertone of danger or coldness that hints at his capacity for violence. A voice that's warm but not kind.
The Ally or Sidekick
This character creates vocal variety and often provides comic relief or emotional counterbalance to the protagonist.
- Age: Can be any age relative to protagonist. Age contrast often creates dynamic interplay.
- Warmth: Can vary. Loyal allies often have warmth. Gruff allies can be cooler.
- Energy: Often higher energy than the protagonist (contrast and momentum). Can be comedic.
- Accent: Great candidate for accent variation. A sidekick from a different background can have a distinct accent that reinforces their outsider status or cultural identity.
- Texture: Should be distinctly different from protagonist to create aural variety.
Example: In The Poppy War, Rin's best friend Kitay should have a voice distinct from Rin's. If Rin is intense and high-energy, Kitay might be warmer and steadier — a grounding presence. Or if Rin is sharp and controlled, Kitay might be more open and expressive.
The Ensemble (Multiple Characters with Equal Weight)
In ensemble casts (think The Seven Husbands of Evelyn Hugo or The Thursday Murder Club), every character needs distinct vocal identity because no single narrator carries the load.
- Age: Vary across the cast to create natural vocal differentiation.
- Warmth: Distribute warmth and coolness to reflect relationships and character function.
- Energy: Vary widely. An ensemble needs vocal rhythm and contrast.
- Accent: Can be used more liberally in ensemble. Different voices mean different backgrounds, and accents can authentically signal that.
- Texture: Maximize contrast. If one character is smooth and warm, give another character rough and dark. If one is bright and high-pitched, give another deep and grounded.
Example: In The Seven Husbands, each husband and Evelyn herself need distinct vocal presence. Evelyn might be bright and commanding. Her first husband might be warm and earnest. Her second might be cool and intellectual. The third dangerous and magnetic. By the time you get through seven, each voice should be unmistakably distinct.
How to Choose Voices in Practice
Now the mechanics. Whether you're working with a director or selecting voices yourself in a tool like Midsummerr, here's how to make smart choices.
Method 1: Start with contrast
Don't cast voices in a vacuum. Cast voices in relationship to each other.
- Identify your protagonist. This is your anchor.
- Choose the protagonist's voice first. It should be clear, approachable, and age-appropriate.
- For every other character, ask: What contrast does this character need against the protagonist and other characters?
- Build out from there.
Example: If your protagonist is a young, bright, warm voice, your antagonist should be darker, cooler, or deeper. Your love interest should be warm or cool depending on the dynamic you want. Your sidekick should be energetic and distinct.
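The contrast-first approach can be roughed out in a few lines of Python. This is a minimal sketch under simplifying assumptions: each voice is described by one label per dimension, every dimension counts equally, and "contrast" is just the number of dimensions that differ. The voice names and labels are made up for illustration. A high score only means "sounds different from the anchor", not "right for the role", so treat the ranking as a shortlist to listen to.

```python
from typing import Dict, List

# The five dimensions from the framework, used as dictionary keys.
DIMENSIONS = ("age", "warmth", "energy", "accent", "texture")

Profile = Dict[str, str]


def contrast(a: Profile, b: Profile) -> int:
    """Count the dimensions on which two voices differ (0 to 5)."""
    return sum(1 for d in DIMENSIONS if a[d] != b[d])


def rank_by_contrast(anchor: Profile, candidates: Dict[str, Profile]) -> List[str]:
    """Order candidate voice names from most to least contrast with the anchor."""
    return sorted(
        candidates,
        key=lambda name: contrast(anchor, candidates[name]),
        reverse=True,
    )


# Anchor: a young, bright, warm protagonist voice.
protagonist = {
    "age": "young", "warmth": "warm", "energy": "high",
    "accent": "neutral", "texture": "bright",
}

# Hypothetical candidate voices for the antagonist.
candidates = {
    "Voice A": {"age": "mature", "warmth": "guarded", "energy": "grounded",
                "accent": "neutral", "texture": "dark"},
    "Voice B": {"age": "young", "warmth": "warm", "energy": "grounded",
                "accent": "neutral", "texture": "bright"},
    "Voice C": {"age": "young", "warmth": "neutral", "energy": "high",
                "accent": "neutral", "texture": "smooth"},
}

print(rank_by_contrast(protagonist, candidates))
# ['Voice A', 'Voice C', 'Voice B']
```

Here Voice A differs from the protagonist on four dimensions, Voice C on two, and Voice B on one, so Voice A is the first one worth auditioning against the protagonist in a scene.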
Method 2: The character arc test
Listen to how a character sounds at the beginning of the story, then at the end. Does the voice still fit?
A character who undergoes transformation might benefit from a voice that can shift — warm to cold, high-energy to grounded. If you're locking in a voice, make sure it can accommodate the character's journey.
Example: In a corruption arc, a protagonist starts warm and trustworthy and becomes cold and menacing. Can their voice convey both? Or do you need a different approach?
Method 3: The dialogue scene test
Take a key dialogue scene between two characters and listen to them interact. Do their voices create chemistry, tension, or contrast as intended?
- Does the protagonist's voice feel right in conversation with the love interest?
- Does the antagonist sound threatening when confronting the protagonist?
- Do ensemble members sound like distinct people or like variations on a theme?
If a voice doesn't work in key scenes, reconsider it. The scene-level test catches casting mistakes that character-level analysis might miss.
Method 4: The fatigue test
If your character speaks for an extended passage — especially if it's a scene of interiority or monologue — can you listen for 5-10 minutes without getting fatigued?
Some voices, no matter how well-cast thematically, are tiring to listen to at length. A voice that sounds great for a 2-minute confrontation scene might become grating across a 10-page internal monologue.
If you're in the revisions stage (and with modern tools, you typically are), try alternative voices for characters with heavy speaking parts. Fatigue is a legitimate reason to recast.
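To pick a passage long enough for the fatigue test, you can estimate its running time from the word count. The sketch below assumes a narration pace of 150 words per minute. That figure is an assumption for illustration, not a measured value for any particular voice, so time a sample of your own audio and adjust it. The function names are hypothetical.

```python
def estimated_minutes(text: str, words_per_minute: float = 150.0) -> float:
    """Estimate narration time from a whitespace word count.

    The default pace of 150 words per minute is an assumed value.
    """
    if words_per_minute <= 0:
        raise ValueError("words_per_minute must be positive")
    return len(text.split()) / words_per_minute


def fits_fatigue_test(
    text: str,
    min_minutes: float = 5.0,
    max_minutes: float = 10.0,
    words_per_minute: float = 150.0,
) -> bool:
    """Check whether a passage should run within the 5 to 10 minute test window."""
    minutes = estimated_minutes(text, words_per_minute)
    return min_minutes <= minutes <= max_minutes


# Stand-in passages: 900 words and 300 words.
monologue = " ".join(["word"] * 900)
confrontation = " ".join(["word"] * 300)

print(estimated_minutes(monologue))       # 6.0
print(fits_fatigue_test(monologue))       # True
print(fits_fatigue_test(confrontation))   # False
```

At the assumed pace, a passage needs roughly 750 to 1,500 words to fill the 5 to 10 minute window.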
Common Voice Casting Mistakes
Mistake 1: Over-casting age
A character who is 35 gets a voice that sounds 55 because "authority." Now they read as older and weaker, not more authoritative. Instead, cast them age-appropriate and let their dialogue and actions convey authority.
Mistake 2: Cold equals interesting
Antagonists don't need to sound evil. In fact, a warm, charismatic antagonist is far more compelling and dangerous than an obviously cold one. Don't assume cold = interesting. Complexity is interesting.
Mistake 3: Accents for character
A character from Ireland doesn't need an Irish accent to be Irish. A Southern character doesn't need a Southern accent to be Southern. Accent is geography; character is action and speech. Unless accent is integral to their identity, focus on character first.
Mistake 4: Accent as the only differentiator
The opposite mistake is to lean on accent as the only thing that sets a character apart. An accent helps, but a character also needs actual vocal personality (warmth, energy, texture), not just an accent.
Mistake 5: One voice for everything
Some characters speak in multiple contexts: as a public figure and as a private person, or across emotional states. A single voice that fits them in one context might feel wrong in another. Either choose a voice versatile enough to shift, or accept that the character is cast in one mode.
Mistake 6: Forgetting the listener
Cast for the listener, not for you. Your internal voice for a character might be completely different from what works in audio. Trust the fatigue test and the dialogue scene test. If test listeners hear the character differently than you intended, treat that reaction as the real feedback.
Self-Serve vs. Director-Led: When to Get Help
If you're using Midsummerr's Self-Serve tier, you're making voice choices yourself with unlimited revisions. That's powerful, but it also means you're the one responsible for casting.
When might you want a Director-Led production instead?
- You're unsure about your choices. A director brings professional casting experience. They can listen to your character descriptions and suggest voices with the experience of thousands of audiobooks behind them.
- Your book has a large ensemble. Ensuring 10+ characters all have distinct, appropriate voices is a lift. A director manages that complexity.
- You want consistent quality across a series. If you're producing multiple books in a series, a director ensures voice continuity and character consistency across installments.
- You want expert guidance on accent and inclusion. A director can help navigate accent choices thoughtfully and ensure your casting reflects your story's diversity.
In Self-Serve mode, you get unlimited revisions, so you can iterate toward the right choices. In Director-Led, you get expert guidance from the start, which often means fewer rounds of revision.
Either way, you're not locked in. The goal is a cast that serves your story.
Testing and Iteration
Modern AI audiobook production gives you something traditional production never did: unlimited revision without re-booking sessions or paying per-revision fees.
Take advantage of it.
- Listen to chapter one. Get your main characters in dialogue. Do their voices work together?
- Listen to a key scene. An emotional moment, a confrontation, a moment that defines a character. Do the voices serve that moment?
- Listen to variety. Check how characters sound in different emotional contexts. Does the protagonist's voice hold up during crisis? Does the antagonist sound appropriately menacing in confrontation?
- Iterate. If something doesn't land, recast. You're not paying for it; you're just refining.
- Get fresh ears. If possible, have someone else listen to a scene. Their reaction to a miscast voice will be immediate and honest in a way your own perception might not be.
Voice Casting as Craft
Voice casting in audiobooks is a craft. It's not science, and it's not magic. It's the work of matching vocal qualities to character function in service of the story.
The difference between a good audiobook and a great one is often voice casting. The same text, cast differently, lands differently. A protagonist with the wrong voice becomes harder to root for. An antagonist with the wrong voice becomes less threatening. A love interest with the wrong voice loses romantic chemistry.
Get the voices right, and the listener disappears into the story. They stop thinking about production and just listen. Everything else — the pacing, the music, the story itself — serves the characters.
That's the goal. And it starts with voice casting.
