Siri Speaks: The Human Behind the Voice of Siri – and How Marketers Can Benefit from Voice Recognition Technology

By Robert Palmer, Chief Innovation Officer, HCB Health

 

We usually see Siri spelled in caps and lower case, as if she’s a real person with a very unusual name instead of an application on your phone. But SIRI is in fact an acronym for Speech Interpretation and Recognition Interface. If you’d like a more intimate relationship with the voice on your phone, take comfort that Siri also means “beautiful woman who leads you to victory” in Norwegian. One of its inventors, a Norwegian named Dag Kittlaus, had once considered Siri as a name for his daughter, which is how the technology got its name; the acronym followed later, once a more official-sounding moniker was deemed necessary. And if you want a sample of Siri’s wry sense of humor, ask her, “What does Siri mean?” Here’s one of several mind-boggling answers you’ll receive: “It’s a riddle wrapped in an enigma, tied with a pretty ribbon of obfuscation.” Perhaps it’s telling that Siri also means “secret” in Swahili; maybe Steve Jobs had a sense of humor after all.

Creating the voice of Siri

The woman who was the original voice of Siri is my high school classmate and friend Susan Bennett. Susan is an accomplished professional voice artist who greets passengers at Delta gates worldwide and has given voice to a long list of brands, including Coca-Cola, McDonald’s, and the Discovery Channel. In 2005, she was doing a lot of phone messaging recordings when a new set of scripts arrived from an IVR (Interactive Voice Response) technology company. The scripts were unlike anything she’d seen before. A typical IVR voice recording is relatively simple – a set of commands and responses is recorded so they can be arranged through a decision tree to drive customer service interactions. But this script was different. “It contained words that were strung together in nonsense sentences, such as ‘cow hoist in the tub hut today,’” Susan told me. There was science behind the nonsensical sentence structures, though; they contained words that could be recombined into other sentences, in every possible sound combination.

She took on the grueling job of recording scripts over a period of four weeks, four hours a day, five days a week. The job was for a company called Nuance, the IVR company that powered Siri. She had no way of knowing that Siri’s technology and her voice would ultimately be sold to Apple, becoming the ubiquitous voice on smartphones around the world.

“Each phrase had to be read with no emotion,” she told me, “but not in a monotone.” Needless to say, it was as tedious as it was challenging and exhausting. “It’s very taxing to read something that doesn’t have a break in tone,” she said.

On October 4th, 2011, the iPhone 4S changed smartphone technology forever, and Siri was introduced to millions of users.  Susan soon found out that she was Siri. “When I first found out that was my voice,” she said, “to be honest it was a little creepy.”

“Siri became a persona,” Susan told me, “but Apple wanted the voice to be anonymous – they wanted people to create their own character. I became famous in a non-famous way.” Susan was the voice of Siri for two years and was called back into the studio for several weeks in 2011 and 2012 to record things like GPS locations. She says she’s “not a technical person,” but since becoming the voice of Siri she has been the original voice of a number of high-tech applications, the GPS app Waze and several other navigation systems among them. In a 2013 Time Magazine interview, Susan was asked why Siri often sounded a little snippy. “There are some people that can just read hour upon hour, and it’s not a problem,” she said. “It became very tedious and was really hard on the vocal cords. That’s one of the reasons why Siri might sometimes sound like she has a bit of an attitude. Those sounds might have been recorded in the last 15 minutes of those four hours.”

How marketers can benefit from voice-activated chatbot technology

When Siri was first released, she tapped the deep-seated human motivation to communicate through speech and made speech-activated interactions not just possible, but popular and intuitive. Making complex information easier to access and more natural to use is the Holy Grail of marketing success. Healthcare marketers, for example, now have the ability to create voice-activated applications on mobile devices that engage patients and healthcare providers in “conversation” rather than imparting information through written words. A mobile application that in many ways mimics the phone’s native assistant – Siri – has the extra appeal of intimacy and personalization. With iOS 10, Apple allowed Siri to interact with more third-party apps, but Siri’s technology does lag behind some of her competitors. Chatbots that use much of Siri’s underlying technology – enhanced with artificial intelligence – allow a mobile or Web application to call up specific information from specific sources, and a user can choose to deepen the interaction through text, video, SMS, or further voice interaction. From a regulatory point of view, IVRs can be scripted to reject certain questions or restrict information for certain subject queries (cursing at a recalcitrant Siri is an effective way to shut down the conversation, for instance).
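To make that regulatory point concrete, here is a minimal Python sketch of the kind of topic gate described above. The restricted topics, trigger phrases, and canned responses are all invented for illustration; a real compliance layer would be built against actual regulatory and medical-legal review requirements.

```python
from typing import Optional

# Illustrative only: topics a healthcare chatbot might be scripted to deflect,
# and the compliant responses it returns instead of answering directly.
RESTRICTED_TOPICS = {
    "off_label":     ["off-label", "unapproved use"],
    "dosing_advice": ["double my dose", "how much should i take"],
    "profanity":     ["darn", "dang"],  # placeholder terms
}

SAFE_RESPONSES = {
    "off_label":     "I can only discuss approved uses. Please talk with your doctor.",
    "dosing_advice": "I can't give dosing advice. Please contact your healthcare provider.",
    "profanity":     "Let's keep this friendly. How else can I help?",
}

def screen_query(utterance: str) -> Optional[str]:
    """Return a canned, compliant reply if the recognized speech touches a
    restricted topic; return None to let the normal answer pipeline proceed."""
    text = utterance.lower()
    for topic, phrases in RESTRICTED_TOPICS.items():
        if any(phrase in text for phrase in phrases):
            return SAFE_RESPONSES[topic]
    return None

print(screen_query("Can I double my dose if I missed one?"))  # compliant deflection
print(screen_query("Where can I find good pizza?"))           # None -> answer normally
```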

Before getting into the technology behind Siri, let’s look at how a new generation of voice-activated chatbots can enhance user engagement. Enterprise chatbot solutions have become far more sophisticated since Siri launched, with improvements in artificial intelligence and voice activation leading the way. The fact is that Siri looks a little dowdy these days compared to competitors such as Amazon’s Alexa, Google Now, and Microsoft’s Cortana on Windows 10. A range of chatbot applications can be integrated with mobile and Web applications, bringing revolutionary change to the ways we communicate, work, perform specific tasks, and even browse the Internet. Text-based chatbots lack human emotion, while voice-activated interactions can enhance and elevate engagement. Our primal instinct is to communicate through voice; in our everyday lives, over 90 percent of human conversation takes place through spoken words.

Chatbots are expected to replace IVR technology across a wide range of interactions that frustrate the user through a dehumanizing regimen of voice controls. How often do we scream “Representative!” into our phone when all we’re trying to do is get a straightforward answer to a simple question? Voice-activated chatbots use artificial intelligence to create a more human experience; the new generation of sophisticated chatbots better understand what you’re asking. They replace static, rule-based systems (decision trees) that dictate a linear dialogue flow with the more intuitive system of give-and-take we know as normal conversation.
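A short sketch makes the contrast concrete. The first function below behaves like the static decision tree of a traditional IVR; the second maps free-form speech to the closest matching intent, so different phrasings of the same question land on the same answer. The menu options, intents, and example phrases are invented for illustration.

```python
# 1) Rule-based IVR: a rigid decision tree keyed on exact menu choices.
DECISION_TREE = {
    "1": "Billing. Press 1 for your balance, 2 to make a payment.",
    "2": "Appointments. Press 1 to schedule, 2 to cancel.",
}

def ivr_step(keypress: str) -> str:
    # Anything outside the scripted menu dead-ends in a retry prompt.
    return DECISION_TREE.get(keypress, "Sorry, I didn't get that. Please try again.")

# 2) Intent-based chatbot: free-form speech is matched to the closest intent,
# so "What do I owe?" and "check my balance" reach the same answer.
INTENT_EXAMPLES = {
    "check_balance":    ["what do i owe", "check my balance", "how much is my bill"],
    "book_appointment": ["schedule a visit", "book an appointment", "see a doctor"],
}

def classify_intent(utterance: str) -> str:
    words = set(utterance.lower().strip("?!. ").split())
    scores = {
        intent: max(len(words & set(example.split())) for example in examples)
        for intent, examples in INTENT_EXAMPLES.items()
    }
    return max(scores, key=scores.get)

print(ivr_step("3"))                      # rigid: unrecognized input fails
print(classify_intent("What do I owe?"))  # flexible: maps to check_balance
```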

Digital with a human touch – “Conversational Commerce”

 Steve Jobs had a mission: have Apple’s devices bridge the chasm between technology and the humanities. Siri was a natural enhancement for the iPhone, bringing “human” voice-driven interaction into the digital age.

Besides making two-way communication more humane, chatbots such as Siri provide immediate verbal feedback, removing the robotic, machine-driven barriers associated with standard rule-based IVR technology. Perhaps most important, verbal chatbots allow the user to multitask, freeing their hands from navigating through an on-screen interface. This is not only important when the user is trying to drive a car or cook a meal; a patient with a mobility-limiting condition can access information – or give information – in a far more satisfying manner. This “conversational commerce” can be more than just convenient and intuitive – it can be fun. Depending on how a chatbot is programmed, it can have a sense of humor; Siri, for example, can tell a joke or two within a given interaction. All of this makes the exchange more closely resemble normal human interaction.

The technology behind the voice

Creating an IVR chatbot experience needn’t be a daunting task. A number of software companies provide useful off-the-shelf, voice-activated chatbot applications. On the simpler end of the spectrum, Nexmo, an API platform from Vonage, can add a basic Interactive Voice Response script to existing applications; instead of navigating through an on-screen interface, the app can recognize voice commands to perform a task. Nexmo’s chat capabilities can also engage customers in real time by integrating with Facebook Messenger and other readily available messaging apps. On the more sophisticated end of the spectrum, Nuance – the company that brought us Siri’s voice technology – offers off-the-shelf software that can add natural, intuitive, self-service conversational capabilities to a variety of Web and mobile applications.
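As a rough illustration of the webhook pattern these platforms rely on, here is a vendor-neutral Python (Flask) sketch of a voice-command handler. The route, the request and response shapes, and the commands are assumptions made for illustration, not Nexmo’s or Nuance’s actual schemas; consult the vendor documentation for the real payload formats.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical voice commands and the spoken replies they trigger.
COMMANDS = {
    "refill": "Okay, I've started a refill request for your prescription.",
    "hours":  "The clinic is open Monday through Friday, eight to five.",
}

@app.route("/voice/answer", methods=["POST"])
def answer_call():
    # Assumption: the IVR platform posts the recognized speech as JSON text.
    utterance = (request.get_json(silent=True) or {}).get("speech", "").lower()
    for keyword, reply in COMMANDS.items():
        if keyword in utterance:
            return jsonify({"actions": [{"action": "talk", "text": reply}]})
    # Nothing matched: re-prompt the caller rather than dead-ending.
    return jsonify({"actions": [{
        "action": "talk",
        "text": "Sorry, I didn't catch that. You can say 'refill' or 'hours'.",
    }]})

if __name__ == "__main__":
    app.run(port=3000)
```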

The science behind Siri was revolutionary when it was introduced to the world in 2011, but it is relatively simple compared with today’s higher-end, AI-driven systems. Algorithms converted the words Susan Bennett had recorded in those seemingly nonsensical “sentences” into conversational responses to users’ questions. “I’m not a technical person,” Susan told me, and she didn’t have to be. She just needed to endure the tedium of feeding thousands of words into an engine programmed to string them together into complete sentences that made sense – with no emotion and only the occasional bout of snippiness.
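A toy Python sketch of that concatenative idea: pre-recorded units are pulled from a library and queued for playback in a new order. The phrases and file paths are hypothetical, and a real engine would also smooth pitch and timing at the joins so the result doesn’t sound choppy.

```python
# Hypothetical library of recorded units (words or short phrases) mapped to
# the audio clips captured in the studio sessions.
RECORDED_UNITS = {
    "the nearest":    "units/the_nearest.wav",
    "pharmacy":       "units/pharmacy.wav",
    "is":             "units/is.wav",
    "two miles away": "units/two_miles_away.wav",
}

def assemble_response(phrase_sequence):
    """Map a planned sequence of phrases to the clips that will be played
    back-to-back to form one spoken sentence."""
    return [RECORDED_UNITS[p] for p in phrase_sequence if p in RECORDED_UNITS]

print(assemble_response(["the nearest", "pharmacy", "is", "two miles away"]))
```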

When you speak to Siri, your words are converted into a data file, which is sent to remote servers. It’s difficult to separate a voice from ambient noise, but Siri does a good job of interpreting accents, dialects, and the small nuances of your voice – this alone is a complicated technology. Natural language processing then steps in to help Siri be as intuitive as possible, and once your query reaches Apple’s servers, multiple flowchart branches search databases for possible answers to your question. Additional algorithms select the most likely response. The software “learns” constantly – over time, there aren’t many questions that haven’t been asked before. Other technologies are layered into the application to help Siri retrieve the most accurate answer. GPS, for instance, almost instantly determines your location, which comes in very handy when you ask Siri, “Where can I find good pizza?” If Siri fails to understand a question, she has a default reaction that is rather inelegantly designed to avoid making her sound stupid: “Would you like to search the web for that?”
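Compressed into a few lines of Python, that answer-selection flow looks roughly like the sketch below: score the recognized query against known question patterns, fold in context such as location, and fall back to the web-search prompt when nothing scores well enough. The known queries, similarity scoring, threshold, and canned answers are all invented for illustration and are not Apple’s actual implementation.

```python
from difflib import SequenceMatcher

# Hypothetical question patterns the assistant already "knows" how to handle.
KNOWN_QUERIES = {
    "find_food": "where can i find good pizza",
    "weather":   "what is the weather today",
}

FALLBACK = "Would you like to search the web for that?"

def best_response(utterance: str, location: str, threshold: float = 0.6) -> str:
    text = utterance.lower()
    intent, score = max(
        ((name, SequenceMatcher(None, text, example).ratio())
         for name, example in KNOWN_QUERIES.items()),
        key=lambda pair: pair[1],
    )
    if score < threshold:
        return FALLBACK                                        # the graceful default
    if intent == "find_food":
        return f"Here are some pizza places near {location}."  # uses GPS context
    return "Today looks sunny."                                # placeholder weather answer

print(best_response("Where can I find good pizza?", "Austin"))
print(best_response("Explain quantum chromodynamics", "Austin"))
```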

Where a marketer can go from here

 We owe a debt of gratitude to Siri – she paved the way for normalizing hands-free voice recognition. Consumers are now conditioned to interact with their smartphones and other devices in a less intimidating and more intuitive way. Marketers now have the ability to use this technology to remove the psychological and emotional barriers of customer interactions with machines that rely on hunt-and-peck technologies that date back to the dawn of the Web. And healthcare’s regulatory barriers can be overcome with the intelligent structuring of how the voice-activated interactions return information. The same technology that can learn how to interpret a question can be programmed to learn how – or how not – to return an answer.

To learn more about Susan Bennett – the woman who gave a voice to Siri – go to her website: www.susancbennett.com.  For the marketer in search of a voice artist who’s up for just about any challenge, Susan is a great choice.