“How do I put this delicately?” my colleague from marketing wondered out loud. “Don’t!” I said. “Let’s have it! Feedback is good!” In the end, he did find a nice way of saying it, but the verdict was clear: The AI-generated voiceover for some of our course videos sounded dull and breathless, and it was plagued by background noise.
Luckily, we had used two different voices, one male, one female. If the male one didn’t cut it, perhaps the marketing team would like the female one better? Personally, I agreed it sounded much livelier. I found a clip and played it for them. “Wow, yeah, that’s ten times better! Feels natural, has range, and proper pacing.” I thanked the two guys for their comments.
Now it was my turn to wonder out loud: “Maybe they just had better samples to train her voice model? Or more data? I’ll have to ask the team.” When I later relayed the feedback, I did. “Guys, marketing says the units with the female voiceover are much better. Did we do anything differently in building her AI model?” As it turned out, we did: “Well, I recorded my clips myself,” the lady behind the voice in question said. “Oh! Yowza! That explains everything. Thanks for letting me know.”
In another meeting later that day, I got the same feedback from yet another person: “Yo, that male voiceover is monotonous. Something about it just feels off.” No one, myself included, could initially swear that one of the voices wasn’t AI-generated, yet everyone could still tell the difference: “Yep, this one sounds better.”
AI voice generation is one of the most advanced subbranches of the field, and yet, even here, humans can still distinguish between the real and the artificial. Will it always be this way? Most likely not. But for now, it seems it still behooves us to give our art the A but not forget the I.