Teresa Widdowson

KDP Virtual Voice Audiobook: Part 2

Updated: 7 days ago

Cell phone with earbuds and The RH Factor playing

Hello all! Welcome to Part 2 of my KDP Virtual Voice Audiobook blog. This is a continuation of my last blog about how I created audiobook versions of my books using KDP’s (Kindle Direct Publishing’s) Beta Virtual Voice (or VV, as I will call it from here on out). Read Part 1 here. I’m still in the process of proof-listening to The RH Factor and hope to have the audio version available soon. Now, on to the blogging…

I’m going to get into the nitty-gritty of the edits I had to make to pronunciation, voice speed, and pauses so that my books would sound as good as they could using this KDP tool. I won’t list all the words here, but if you’re interested in receiving a list of the phonetic changes I had to make, click HERE. If you’re an author creating your own audiobook with this tool (or any other AI virtual voice tool), perhaps my list will save you some time.

I left off Part 1 by stating that to get the best result you can with the VV, you need to proof-listen to EVERY SINGLE WORD. I know. I can hear you groaning from here. But here’s a perfect example of why it’s important not to just do a quick listen, or to spot-check only the words, paragraphs, or chapters where you suspect problems. I had the following sentence: “I read Ms. Neilmann’s email last night.” Easy, right? But instead of using the past tense of “read”, the VV pronounced it in the present tense (“reed”). So I had to tell the VV to pronounce it like “red”. This happened at least twice in my book. If I hadn’t listened to every single word, I probably would have missed those.

Figuring out how to spell a word phonetically so that the VV would pronounce it correctly was the main challenge. There is no way, at least none that I found, to tell the VV which syllable to stress. I tried adding accents, as in ‘é’, but that did nothing. It’s all in the spelling, so sometimes I had to get very creative. One of the most difficult words to get it to say correctly was accumbens (as in nucleus accumbens). I ended up having to spell it “ackyoombinz”. It took quite a few attempts before I came upon that spelling.

By the way, when you’re making a change using the tools on the right, don’t forget to click the Apply icon. Otherwise, your change will not take effect. I can’t tell you how many times I forgot to do that!

A lot of the words the VV had problems with were words English has borrowed from other languages, like latte (from Italian) and macabre (from French). But other, seemingly ordinary words also gave it problems. Some words were pronounced too perfectly. The ‘T’ at the end of “start” in the word “start-up” was pronounced too distinctly and did not sound natural, so I had to change the pronunciation to “stardup”. Another word it struggled with was “gene”, which it pronounced as “gen” with a soft ‘G’. That surprised me, and I ended up having to tell the VV to say it like “jein”. Also, for some reason, it sometimes pronounced the title “Dr.” as “dry”, so I had to change it to “doctor” everywhere in the book.

One of the funnier moments in my proof-listening occurred when the VV tried to say “yada, yada, yada”. I had to speed up the pronunciation by 50% for it to sound like natural speech. Slang phrases also presented a problem. For the phrase “Wha’ cha doin’?”, I sped it up by 25% to make it sound more like real-life conversation.

Short exclamations and filler words, like “oh”, “whew”, or “uh”, often did not sound natural. I had to slow down the pronunciation and/or respell them phonetically to get closer to what I wanted. Even then, it rarely captured the emotion behind the word.

Because you can’t shorten an existing pause, the reading sometimes sounded unnatural. This was especially obvious in conversations where one character interrupted another. It didn’t sound like an interruption because the pause between the sentences was too long.

In the Kindle and paperback versions of my books, the first three words of every chapter are in all caps. For instance: “IT WAS ALMOST night when she walked into the room and saw him.” This capitalization caused issues because the VV sometimes read those words as acronyms. It pronounced “IT” as “I-T”. Maybe it thought I meant Information Technology? So I had to correct it to be simply “it”. Thankfully, when you change the pronunciation of a word, there is an option to apply the change everywhere that word appears in the book. I did notice, however, that it didn’t always give me that choice.

The opposite was also true: sometimes acronyms were pronounced as words. For example, I used the acronym “ME” (for Medical Examiner), but the VV pronounced it as simply “me”. I thought it was strange that it didn’t recognize it as an acronym, but I corrected it with the phonetic spelling “em-ee”. I could also have simply written it as “m e”. The acronym “IV” was another problem, and it showed up a lot in my book MYND Control. The VV wanted to say it like “ivy” (with the accent on the first syllable). I ended up using the phonetic spelling “eye-Vée”. In that case, the accented ‘é’ and the capital V seemed to help. It still wasn’t perfect, but it was as close as I could get.

Other clarifications had to do with street and highway numbers. We usually pronounce 208th Street as “two-hundred and eighth street”, but the VV said “two-hundred eighth street”. It wasn’t really wrong, but it wasn’t the way I wanted it read. Highway numbers presented a similar problem: Highway 520 became “highway five-hundred and twenty”. So I had to watch out for those spots. Numbers in general were problematic, so I had to pay attention to measurements, dates, times, heights, etc.

The VV has some inflection, but there’s often not enough emotion in the dramatic scenes. In one fight scene, the VV didn’t convey the heightened emotions the situation called for. Even when it reacted to the exclamation points, it still wasn’t enough to get across the true feeling of the scene. There wasn’t much I could do about that, but I did have some luck every now and then. For example, in the sentence “I wanted it to be better than the real thing.”, I wanted the VV to emphasize the word “better”. So I slowed down the pronunciation of only that word by 25% and got more of what I wanted.

When conversations go on for some length, it’s sometimes hard to tell who’s talking. In print, it’s best not to use too many “he saids” and “she saids”, because they break up the flow of the conversation. But in an audio version, at least one made with this VV, all the characters have the same voice, which can make it difficult to tell who’s talking. When I found that happening, I simply changed the pronunciation of a word to clarify who was speaking. Say the sentence was “That’s not what I was going to do!” I changed the pronunciation of the first word, “That”, to “Vega said, that”. Problem solved. Now the listener knows Vega is the one speaking, but I didn’t have to change the wording in my written version.

This works for errors as well. If I find an error in my written text, I can fix it audibly. Let’s say I used the past tense instead of the future, left out a word, or added an extra word. I can fix those errors phonetically. Maybe I wrote “She pulled off his cap.” when I meant “She pulled off her cap.” I can just change the pronunciation of “his” to “her”, and the error is fixed without having to upload a new version! If I added an extra word, I can change its pronunciation to a blank space. Then, instead of reading the word, the VV will say nothing. Instant fix!

Of course, you’ll still want to fix any errors in your written text. But if you’ve already corrected them in the audiobook version, you don’t need to update your audiobook when you upload your corrected ebook. It won’t hurt anything (all your original changes will be saved, and the errors will still be fixed), but it’s unnecessary.

It’s too early to know if the time I spent proof-listening to create audiobook versions of my books was worth the effort. It took me about four days to finish editing MYND Control, a 94,374-word book. Granted, I did not spend every hour of the day proof-listening, but it was a sizeable chunk of time. I have no idea how many people will want to listen to the audiobook version, especially since I created it using an AI voice. Using human narrators, especially if I could have a different person read each of the main characters, would sound much better.

I have a long list of improvements I would like to see KDP make to their Virtual Voice tool. But I appreciate that they gave me the option to create an audiobook free of charge, and I’m sure that as they get feedback from their beta users, the tool will continue to improve. I have heard AI voices used by other companies that sound remarkably lifelike. In fact, it’s almost scary how well they can mimic humans. I’m sure KDP will get there eventually, but I’m not sure it is a service they will continue to offer free to their authors. So, I’m happy to have this opportunity now.

If you have an ebook and would like to create an audiobook version using KDP’s Beta Virtual Voice program, go to this link and request to be a part of the program:

To the world of audiobook readers, I’m glad to be able to offer you my books in this format. I hope you enjoy them!
