






Grace Ebert examines the growing use of AI technology in audiobook generation. While cost-effective, can it replicate the human experience of speech? It seems to be getting closer to the “real thing.”
Note: This narration is read by a DeepZen narrator.
“Hi, I’m Kellie. I’m a DeepZen narrator. You may think that I’m a human, but my voice is generated by DeepZen technology. Our tech allows me to showcase seven different emotions.”
This friendly voice rings from a small play button on the homepage of the AI company DeepZen, and it does, in fact, uncannily resemble a voiceover artist’s live recording. The clip proceeds with Kellie performing her septet of impressions, allowing her voice to rise to an exuberant pitch as she declares her happiness before lowering to convey the irritation she’s drawing from the text. Her synthetic interpretation grows more surreal the longer you listen, and this isn’t surprising considering the voice is a clone, produced by replicating recordings of a person speaking.
Bots like Kellie are some of the newest narrators venturing into the publishing landscape, and if the emotional sampling on the site holds up, her interpretation of a text, either fiction or non, wouldn’t necessarily elicit questions from listeners about the nature of the voice emanating from their AirPods.
A long way from their cassette-tape predecessors, audiobooks have seen rapid growth in recent years and are part of a market projected to reach $15 billion by 2027. They certainly buoyed the publishing industry at the onset of the COVID-19 pandemic, when shipping delays were indeterminate, and again now, amid ongoing supply chain backlogs. And beyond audiobooks’ value for publishers and consumers, the medium offers greater accessibility for people who are unable to read from a page, a concept that myriad newspapers and magazines, including The New York Times, The Washington Post, and this publication, have embraced by offering audio versions of their content.
Historically, though, these forms have been costly to produce, largely because they’re a deeply human endeavor. Practiced voiceover artists and actors spend hours recording in the studio, followed by an editing process, with the result costing publishers an average of $5,000 to $10,000 per title. It’s no wonder that companies are gravitating toward AI models that greatly reduce those costs, especially when the services integrate easily into established publishing systems. The Washington Post’s process using Amazon Polly, for example, is as follows: “When an article is ready for publication, the written content management system (CMS) publishes the text article and simultaneously sends the text to the audio CMS, where the article text is processed by Amazon Polly to produce an audio recording of the article. The audio is delivered as an mp3 and published in conjunction with the written portion of the article.” Seamless.
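To make that flow concrete, here is a minimal sketch of what the Polly step in such a pipeline might look like, using the AWS SDK for Python (boto3). The function name and the CMS hand-off around it are illustrative assumptions, not the Post’s actual code; only the synthesize_speech call reflects Polly’s real API.

```python
# Illustrative sketch of a Polly text-to-audio step; not the Post's actual code.
# Assumes AWS credentials are already configured in the environment.
import boto3

def article_to_mp3(article_text: str, out_path: str) -> None:
    """Send article text to Amazon Polly and save the returned mp3."""
    polly = boto3.client("polly")
    # A single synthesize_speech request is capped at a few thousand
    # characters, so a production pipeline would chunk long articles.
    response = polly.synthesize_speech(
        Text=article_text,
        OutputFormat="mp3",
        VoiceId="Joanna",  # one of Polly's stock US English voices
    )
    with open(out_path, "wb") as f:
        f.write(response["AudioStream"].read())

# Hypothetical hand-off from the written CMS to the audio CMS:
article_to_mp3("When an article is ready for publication...", "article.mp3")
```

The appeal for publishers is visible even in a toy like this: the entire narration step collapses into one API call that returns an mp3 ready to publish alongside the text.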
There are plenty of reasons to justify the original price tag for audiobooks in particular, one being that the experience of listening to a novel or memoir narrated by a celebrity or the author herself is unparalleled. Hearing Patti Smith’s gritty voice describe drinking black coffee in the same Greenwich Village cafe enhances the overt coolness of M Train, and having LeVar Burton narrate Astrophysics for Young People in a Hurry creates a profound mix of nostalgia, comfort, and childlike wonder. When a narrator has the power to augment a work the way Smith and Burton do, what happens when a bot like Kellie is producing our listening experience?
As I mentioned, it’s not immediately apparent that Kellie’s performance is manufactured, but what narrations like hers might lack for seasoned listeners is the art. Veteran Washington Post reviewer Katherine Powers writes, “Your own imagination and interpretation have more independence when reading a book yourself than when a narrator’s voice controls the text. Audiobooks could be said to be a species of translation: Although true to the words, they are different in character from the original, the printed page.” For Powers, it’s the oral performance of the text, with its breathy pauses and refined intonations, that provides new insight and nuance for the listener. This notion is rooted in our brain’s wiring, as writer Jane Alison explains in her book on the patterns of narrative: “Neural activity registering sound is about the same whether a word is read silently or aloud; a part of the brain called Broca’s area generates the ‘sound’ of the word internally.” In other words, when our brains process language, we both envision an image and “hear” the text as if it were audible, whether it is or not. If you’re reading the printed version of this article, for example, notice the internal voice that has been running this entire time; likewise, when you hear the word “orange,” you likely also visualize it as type. This makes translation an apt comparison, considering our sensory processing differs when we scan a page ourselves versus when a text is filtered through a voice that isn’t produced in our own brains.

[Image: illustration of Broca’s area. Source: ThoughtCo / Gary Ferster.]
Valuing this kind of performance or transposition is not new. As Rand Faris writes in a piece on spoken word, “Poetry does not have a particular sound. That’s the beauty of it. Its sound is elastic and very personal to the reader—just as a cover of a song, to the original version.” The same is true, of course, of prose. In the case of audiobooks, a narration diverges from Faris’s elasticity and gives solidity to the words printed on the page, transposing each phrase into a newly formed structure for the listener to nestle into. This is apparent in the performances from Smith and Burton, whose distinct interpretations shape the emotional impact of the text.
When Kellie is narrating, we’re hearing an algorithmic logic determine cadence, pause, and ultimately, the effect of the work. Her voice is generated using text-to-speech (TTS) technology to convert the written text into an audible counterpart, and then natural language processing (NLP) to add emotion and expression. “Human language is separated into fragments so that the grammatical structure of sentences and the meaning of words can be analysed and understood in context,” DeepZen shares. “This helps computers read and understand spoken or written text in the same way as humans.” The structure we find ourselves enmeshed in when listening to audio by Kellie or another bot, although rooted in a real person’s performance, remains patently inhuman.
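DeepZen doesn’t publish its pipeline, so the following is only a toy sketch of the fragment-and-analyse idea described above: split text into sentence-level fragments, match each against a small emotion lexicon, and attach a prosody hint a TTS engine could act on. Every name here (EMOTION_WORDS, annotate, the pitch values) is invented for illustration; real systems rely on trained models, not word lists.

```python
# Toy sketch of "fragment, analyse, add expression"; not DeepZen's pipeline.
import re

# Invented mini-lexicon; a production system would use trained emotion models.
EMOTION_WORDS = {
    "happy": {"joy", "delighted", "exuberant", "happiness"},
    "irritated": {"annoyed", "irritation", "frustrated"},
}

def fragments(text: str) -> list[str]:
    """Naively split text into sentence-level fragments."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def label_emotion(fragment: str) -> str:
    """Tag a fragment with the first emotion whose cue words appear in it."""
    words = set(re.findall(r"[a-z']+", fragment.lower()))
    for emotion, cues in EMOTION_WORDS.items():
        if words & cues:
            return emotion
    return "neutral"

def annotate(text: str) -> list[dict]:
    """Pair each fragment with an emotion label and a pitch hint for TTS."""
    hints = {"happy": "+15%", "irritated": "-10%", "neutral": "0%"}
    annotated = []
    for frag in fragments(text):
        emotion = label_emotion(frag)
        annotated.append({"text": frag, "emotion": emotion, "pitch": hints[emotion]})
    return annotated

for entry in annotate("I am delighted to meet you. The delay caused real irritation."):
    print(entry)
```

The gap between this word-list toy and a voice like Kellie’s is the point: her expressiveness comes from models trained on a real narrator’s recordings, which is exactly why the result can feel so uncannily human and yet remain a product of rules.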
It’s understandable that this illusion makes people uncomfortable, especially since there are valid ethical arguments against programming an AI to replicate someone else’s voice, a controversy that surrounded the director’s choice to do so in Roadrunner: A Film About Anthony Bourdain. There are also questions about the future of human creativity when bots are able to write an entire article and co-author book reviews, and the latter concern seems more closely tied to most readers’ rejections of AI-narrated audiobooks, especially when those are (presumably) created with consent and licensing agreements.
Some platforms, like Audible, are reluctant to utilize AI, with its self-publishing branch stating, “Your submitted audiobook must be narrated by a human. TTS recordings are not allowed. Audible listeners choose audiobooks for the performance of the material, as well as the story. To meet that expectation, your audiobook must be recorded by a human.” Because Audible controls a sizable portion of the market, its stance signals clear resistance to the idea that a bot can create a worthy translation of a text.
Here, it’s the human investment, the subtle creative interpretations, and the singular voices that make a particular performance meaningful, a long-held understanding in literature. Famed translator Margaret Jull Costa says about evaluating different iterations of the same work, “I usually use the analogy of the many different Hamlets one has seen over the years. They’re all Hamlet, but the best have invested every word with meaning and with their own self and life experience, too, and some you like more than others.” This applies, too, to audiobooks.

As with any other art form, readers will determine through their dollars and attention when an AI narration is acceptable (non-fiction titles and educational texts tend to be an easier sell). Sometimes we do want the human connection, an impulse evidenced by the number of questions on Apple’s site about what to do when Siri’s infamous voice turns “robotic.” If a narration as jagged and mechanical as the reverted Siri is difficult to listen to even for answers to questions like “how late is the grocery store open?” or “which theater is that movie playing at?”, then sitting through a book-length work we expect to be lyrical and poetic would be arduous.
Ultimately, though, a wholesale rejection of the technology isn’t helpful either, especially when companies like DeepZen offer convincing alternatives to the rigid synthetic narrators of years past. If an AI can help free up publishers’ budgets and make more titles accessible through audiobooks, whether read by a bot or not, that’s a worthy goal.
As for the art, we can return here to the enduring questions posed by Walter Benjamin about mechanical reproduction: “Even the most perfect reproduction of a work of art is lacking in one element: its presence in time and space, its unique existence at the place where it happens to be.” No bot that sounds like Smith or Burton will actually be one of those figures, and listening to their voices is often what drives us to those translations in the first place: hearing a work of art, whether their own or another artist’s, interpreted and translated into something new.




