If you are asking how long it takes to make an audiobook, the honest answer is that it depends far more on scheduling than on writing. The manuscript is already done. What stretches the timeline is everything that happens after: finding a narrator, booking studio time, recording in close to real time, editing, proofing, and then waiting in a retail review queue. For a full-length novel produced the traditional way, that chain usually runs several weeks to a few months.
This guide breaks the timeline down stage by stage so you can estimate your own book, then shows where generated full-cast production removes the slowest steps entirely.
The short answer
A finished hour of audiobook is about 9,300 words — the standard conversion rate used across the industry and by ACX, based on a natural reading pace of roughly 150–160 words per minute. So a 60,000-word novel is around 6.5 finished hours of listening.
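The conversion is simple enough to script for your own word count. This is a minimal sketch in Python, assuming the 9,300 words-per-finished-hour figure quoted above; the function name `finished_hours` is illustrative and not part of any tool.

```python
WORDS_PER_FINISHED_HOUR = 9_300  # conversion rate quoted above (~155 words per minute)


def finished_hours(word_count: int) -> float:
    """Estimate finished listening hours from a manuscript word count."""
    if word_count < 0:
        raise ValueError("word_count must be non-negative")
    return word_count / WORDS_PER_FINISHED_HOUR


for words in (60_000, 80_000, 90_000):
    print(f"{words:,} words ≈ {finished_hours(words):.1f} finished hours")
```

Actual running time varies with narration pace, dialogue density, and pauses, so treat the result as a planning figure rather than a guarantee.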
Producing those 6.5 hours the traditional way takes real calendar time:
| Stage | Traditional human narration | Cinematic full-cast production (Midsummerr) |
|---|---|---|
| Casting / auditions | Days to weeks | Built in — choose from the cast library |
| Recording | ~2 studio hours per finished hour | Generated from the manuscript in a few hours |
| Editing, proofing, mastering | Brings total work to ~6 hours per finished hour | Tweaking in the timeline editor — as deep as you want |
| Agreement → finished audio | 3–6 weeks (often longer) | Days |
| Retail review (platform-dependent) | ~2–3 weeks | Same platform step applies |
The writing is finished before any of this starts. The time you are really asking about is production time — and that is the part the format you choose can change dramatically.
One narrator, full cast, or cinematic?
Timelines depend on what you are producing, so it helps to name the three formats. A single-narrator audiobook is one voice reading everything — the traditional default, and what the human timeline below describes. A full-cast production gives each character a distinct voice. A cinematic (dramatized) adaptation goes a step further: full cast plus music and sound design, produced like an audio drama rather than a read-through.
On the traditional route, each step up that ladder multiplies the schedule: more actors to cast, more sessions to book, more mixing. That is why dramatized audiobooks have historically been reserved for flagship titles. ("Full cast vs single narrator" covers when the format is worth it.) Midsummerr produces the cinematic format directly from the manuscript, which is why its timeline does not grow the way a studio production's does.
Where the time actually goes in traditional production
The reason a done manuscript still takes weeks to reach listeners is that each stage is gated by human availability and near-real-time recording.
Casting and auditions. Before a word is recorded, you have to find the right voice. On a marketplace like ACX — Audible's production and distribution platform — you post the title, wait for auditions, review samples, and negotiate terms. This alone can take days to weeks, and longer for a book that wants a specific accent, age, or dual-narrator pairing.
Recording. This is the hard floor of the traditional timeline. An experienced narrator records at roughly two studio hours for every finished hour of audio, so a 10-hour book is about 20 hours in the booth, and that is before mistakes, retakes, and breaks. Newer narrators can take closer to five hours per finished hour. Recording cannot be sped up much, because narration is performed in real time: a finished hour of audio takes at least an hour to speak.
Editing, proofing, and mastering. Raw narration is not a finished audiobook. Once you add cleanup, proofing against the text (about 1.5 hours per finished hour on its own), pickups, and mastering, total production work reaches roughly six hours per finished hour for an experienced team. For our 6.5-hour novel, that is around 40 hours of skilled work spread across a schedule.
Retail review. After the audio is finished, most retail paths add a review queue. ACX, for example, runs a quality review of about 14 business days, after which a passing title is posted for sale within roughly 10 business days (Apple Books can need a couple more). That is at least two to three weeks after your audio is already done.
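To turn those business-day figures into calendar dates, a rough sketch like the following can help. It assumes a hypothetical submission date, skips weekends only (public holidays are ignored), and treats the 10-business-day listing window as fully used, which is a worst case. The helper `add_business_days` is written here for illustration.

```python
from datetime import date, timedelta


def add_business_days(start: date, days: int) -> date:
    """Return the date `days` business days after `start` (Mon-Fri only, no holidays)."""
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # 0-4 are Monday to Friday
            remaining -= 1
    return current


audio_done = date(2024, 1, 1)  # hypothetical submission date (a Monday)
review_done = add_business_days(audio_done, 14)  # quality review window
on_sale_by = add_business_days(review_done, 10)  # latest listing date if the window is fully used

print("Review complete:", review_done)
print("On sale by:     ", on_sale_by)
print("Calendar days:  ", (on_sale_by - audio_done).days)
```

With the listing window fully used, the wait runs past four calendar weeks; a title listed soon after passing review lands closer to the two-to-three-week range.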
Add it up and the common range from "yes, let's make the audiobook" to "it's on sale" is a few weeks at the absolute fastest and, realistically, one to three months.
A worked example: a 60,000-word novel
Say you have finished a 60,000-word novel and want it in audio.
- Length: ~6.5 finished hours (60,000 ÷ 9,300).
- Recording: ~13 studio hours for an experienced narrator, spread across booked sessions.
- Full production work: ~40 hours including editing, proofing, and mastering.
- Calendar time: 3–6 weeks of production after casting, then ~2–3 weeks of retail review.
None of those hours are your writing. They are the cost of turning a finished book into finished audio one recorded minute at a time. That is the bottleneck full-cast production is built to remove.
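The same arithmetic can be run for any manuscript. The sketch below assumes the ratios used in this guide (9,300 words per finished hour, 2 studio hours and 6 total work hours per finished hour for an experienced team); the names `TraditionalEstimate` and `estimate_traditional` are illustrative.

```python
from dataclasses import dataclass

WORDS_PER_FINISHED_HOUR = 9_300   # conversion rate used in this guide
STUDIO_HOURS_PER_FINISHED = 2.0   # experienced narrator, per the figures above
WORK_HOURS_PER_FINISHED = 6.0     # recording plus editing, proofing, mastering


@dataclass
class TraditionalEstimate:
    finished_hours: float
    studio_hours: float
    total_work_hours: float


def estimate_traditional(word_count: int) -> TraditionalEstimate:
    """Estimate traditional production effort from a manuscript word count."""
    finished = word_count / WORDS_PER_FINISHED_HOUR
    return TraditionalEstimate(
        finished_hours=finished,
        studio_hours=finished * STUDIO_HOURS_PER_FINISHED,
        total_work_hours=finished * WORK_HOURS_PER_FINISHED,
    )


est = estimate_traditional(60_000)
print(f"Finished audio: {est.finished_hours:.1f} h")
print(f"Studio time:    {est.studio_hours:.1f} h")
print(f"Total work:     {est.total_work_hours:.1f} h")
```

The unrounded results (about 6.5, 12.9, and 38.7 hours) land slightly under the rounded figures in the list above. A newer narrator would need a higher studio ratio, closer to 5 hours per finished hour.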
How full-cast production compresses the timeline
Midsummerr's full-cast production changes the timeline because it generates audio rather than recording it in real time. Instead of booking a narrator and capturing 6.5 hours minute by minute, Midsummerr works from the manuscript directly: it identifies characters, assigns distinct voices from a cast library, and generates the narration, music, and sound design as a produced piece rather than a single read-through.
That collapses the two slowest stages at once. There is no casting queue, because you pick voices from the library. There is no recording floor, because the audio is generated rather than performed live. What remains is the part that genuinely deserves human attention — review. You listen back in the timeline editor, adjust pacing, fix a pronunciation, re-cast a character, or tighten a scene, and the changes apply without re-booking anyone.
The practical numbers: generating the cinematic version of a full novel — cast, music, and sound design — takes a few hours, not weeks. The rest of the timeline is yours: tweaking the result in the timeline editor, which takes as long as you want to be meticulous. Some authors ship after a quick review pass; others fine-tune every scene's pacing and pauses, swap a casting choice, or fix a pronunciation. Bottom line: a fully dramatized audiobook in days, with the calendar time spent on your own polish rather than someone else's schedule.
Cost tracks the same shift. Our 60,000-word novel runs about $300 on Self-Serve at $5 per 1,000 words, versus the multi-thousand-dollar range typical of human production. If you want the full breakdown, see what an audiobook actually costs.
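For other word counts, the Self-Serve figure can be estimated the same way. This sketch assumes pricing is strictly linear at the $5 per 1,000 words quoted above, with no minimums, tiers, or rounding rules; actual billing may differ, and `self_serve_cost` is an illustrative name.

```python
from decimal import Decimal

RATE_PER_1000_WORDS = Decimal("5.00")  # Self-Serve rate quoted above, in USD


def self_serve_cost(word_count: int) -> Decimal:
    """Estimate cost at a flat per-1,000-words rate (linear pricing is an assumption)."""
    return (Decimal(word_count) / Decimal(1000)) * RATE_PER_1000_WORDS


print(f"${self_serve_cost(60_000):.2f}")
```

`Decimal` is used instead of floats so currency amounts do not pick up binary rounding noise.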
You can hear what a produced, full-cast result sounds like on any of the listening samples — the same output the timeline above produces in days.
Does faster mean lower quality?
Speed is only worth it if the result holds up, so it is a fair question. The time you save in full-cast production comes from removing scheduling and real-time recording — not from skipping the craft decisions that make an audiobook good.
A skilled human narrator brings interpretive depth, and for some titles a single trusted voice is exactly right. Full-cast production makes a different trade: distinct voices per character, integrated music and sound design, and unlimited editing, produced without the studio schedule. The full production process is the same set of decisions (casting, direction, mixing) compressed into days of reviewing instead of weeks of booking.
FAQ
How long does it take to make an audiobook? A traditional human-narrated audiobook typically takes several weeks to a few months from decision to on-sale: casting and auditions, then production at about six hours of work per finished hour, then a retail review of roughly two to three weeks. Midsummerr's full-cast production compresses that to days by generating the narration instead of recording it live.
How many hours of audio is a typical novel? About 9,300 words equals one finished hour, so a 60,000-word novel is roughly 6.5 hours, an 80,000-word book is about 8.6 hours, and a 90,000-word book is about 9.7 hours.
Why does recording take so long? Narration is performed in real time, so it cannot be recorded faster than it is spoken. An experienced narrator needs about two studio hours per finished hour, and total production work reaches about six hours per finished hour once editing, proofing, and mastering are included.
How long is the retail review after the audio is finished? It depends on the platform. ACX, for example, runs a quality review of about 14 business days, then posts a passing title for sale within roughly 10 business days. That review step applies to whichever platform you distribute through, on top of production time.
Can Midsummerr produce a full audiobook in a day? Generation itself takes a few hours, even for a full-length novel — and that includes the full cast, music, and sound design. The remaining time is tweaking the result, which depends on how meticulous you want to be. In practice, a fully dramatized audiobook is a matter of days.
Takeaways
The manuscript is not the bottleneck — scheduling and real-time recording are. Traditional production spends roughly six hours of skilled work per finished hour and adds weeks of retail review, which is why a done book still takes weeks to months to reach listeners. Cinematic full-cast production generates the whole performance in hours and leaves the tweaking to you — so a fully dramatized audiobook ships in days.
Ready to see your own timeline? Check what your book would cost and how fast it can be produced.




