Alexa is always listening, but it isn’t constantly taking notes. It doesn’t send anything to the cloud servers until it hears you say the wake word (Alexa, Echo, or Computer). But listening for wake words is harder than you might think.

Echo hardware isn’t all that smart. Without the Internet, any command or question you ask will fail. This is because your commands are sent to the cloud for interpretation and decision making. Amazon doesn’t want to record every conversation you have in front of the smart speaker, just the commands you give it. For this reason, the company uses a wake word to get the smart speaker’s attention. To do this, Amazon relies on a combination of finely tuned microphones, a short memory buffer, and neural network training.

Finely tuned microphones accurately detect your voice

Amazon Echo Dot 3 with blue LED ring.
The blue LED will always point towards your voice. Amazon

Voice assistant speakers like the Echo and Echo Dot usually have multiple built-in microphones. The Echo Dot, for example, has seven. This array gives the devices several capabilities, from hearing commands spoken from far away to separating background noise from voices.

The latter is especially useful for wake word detection. Using multiple microphones, the Echo can pinpoint your location relative to where it sits and listen in that direction while ignoring the rest of the room.

You can see this in action every time you use a wake word. Stand next to the Echo or Echo Dot and say the wake word. Notice how the ring lights up dark blue, and then a light blue segment circles around and «points» at you. Now move a few steps to the side and say the wake word again. Note that the light blue segment follows you.

Knowing where you are helps the device to better target you and adjust for noise coming from other sources.
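
To make the idea concrete, here is a minimal sketch of one common way a microphone array can estimate direction: measure how much later a sound reaches one microphone than another, then convert that delay into an angle. This is not Amazon’s implementation. The two-microphone setup, the 7 cm spacing, the 16 kHz sample rate, and the function names `best_lag` and `bearing_degrees` are all assumptions chosen for illustration.

```python
import math

SPEED_OF_SOUND = 343.0  # metres per second, approximate at room temperature


def best_lag(a, b, max_lag):
    """Return the lag (in samples) at which signal b best lines up with a.

    A positive result means the sound reached microphone b later than a.
    """
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, sample in enumerate(a):
            j = i + lag
            if 0 <= j < len(b):
                score += sample * b[j]  # cross-correlation at this lag
        if score > best_score:
            best, best_score = lag, score
    return best


def bearing_degrees(lag, sample_rate, mic_spacing):
    """Convert a sample lag into an angle away from straight ahead."""
    delay = lag / sample_rate
    # Clamp so rounding never pushes asin outside its valid range.
    ratio = max(-1.0, min(1.0, delay * SPEED_OF_SOUND / mic_spacing))
    return math.degrees(math.asin(ratio))


# Hypothetical capture: microphone B hears the same pulse two samples later.
pulse = [0.0, 0.0, 1.0, 0.5, -0.5, 0.0, 0.0, 0.0, 0.0, 0.0]
mic_a = pulse
mic_b = [0.0, 0.0] + pulse[:-2]

lag = best_lag(mic_a, mic_b, max_lag=4)
angle = bearing_degrees(lag, sample_rate=16_000, mic_spacing=0.07)
print(f"lag: {lag} samples, bearing: {angle:.1f} degrees")
```

With more microphones arranged in a ring, the same comparison between pairs can narrow the direction down further, which is one plausible way a device could decide where to «point».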

Short memory prevents the speaker from storing too much

Echo devices have plenty of storage space, but they use very little of it. According to Rohit Prasad, Amazon vice president and head scientist for Alexa artificial intelligence, Echos can physically store only a few seconds of audio.

By limiting this capacity, Amazon not only gives you more privacy (less of your voice is stored anywhere), but also prevents the Echo from listening to entire conversations, restricting its attention to listening for the wake word.

Imagine that you have a tape recorder with a three-second tape. Suppose that whenever the tape reaches the end, it loops back to the beginning, again and again. If you started recording a conversation, anything you said four seconds ago would be erased and immediately recorded over. This is essentially what the Amazon Echo does.

It records continuously, but at the same time erases what it recorded moments earlier. This short attention span means that all it can listen for is the word «Alexa» and not much else. However, three seconds is enough for the word to be recorded, checked, and acted on.
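
The looping tape can be sketched as a fixed-size buffer that silently drops its oldest audio whenever new audio arrives. The example below is only an illustration of that idea, not the Echo’s firmware. The class name `RollingAudioBuffer`, the 16 kHz default sample rate, and the use of plain numbers as stand-ins for audio samples are assumptions.

```python
from collections import deque

SAMPLE_RATE = 16_000   # samples per second (assumed)
BUFFER_SECONDS = 3     # the "three-second tape" from the analogy


class RollingAudioBuffer:
    """Keeps only the most recent few seconds of audio samples."""

    def __init__(self, seconds=BUFFER_SECONDS, sample_rate=SAMPLE_RATE):
        # A deque with maxlen discards the oldest items once it is full.
        self._samples = deque(maxlen=seconds * sample_rate)

    def write(self, chunk):
        """Append new samples, overwriting the oldest ones if needed."""
        self._samples.extend(chunk)

    def snapshot(self):
        """Return the audio currently held, oldest sample first."""
        return list(self._samples)


# Tiny demo: 4 samples per second so the contents are easy to read.
tape = RollingAudioBuffer(seconds=3, sample_rate=4)
for second in range(5):
    tape.write([second] * 4)   # one second of "audio" labelled by its second

print(tape.snapshot())
```

After five seconds of writing, only seconds 2, 3, and 4 remain in the buffer. Seconds 0 and 1 are gone without any explicit delete step, just like the start of the looping tape.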

Neural network training helps with pattern matching

Block diagram of the layers of the Amazon algorithm.
A representation of the layers used by the Amazon algorithms. Amazon

Finally, Amazon relies on neural network training to teach the Echo how to pattern match. As with other forms of machine learning, Amazon trains its algorithms by feeding them instance after instance of the word Alexa (or Computer, or Echo, depending on the wake word in question).

The idea is to capture every inflection and accent, as well as context. Amazon wants your Echo to tell the difference between when you’re talking to it, when you’re talking about it, or perhaps when you’re talking to a person named Alexa. The directional microphones help with this as well.

With every word an Echo hears, it passes the audio through layers of algorithms. Each layer is designed to rule out false positives by checking for similar-sounding words or context clues. If the word passes one layer, it moves on to the next. Finally, when the local device decides it has heard the wake word, it starts recording and streaming the audio to Amazon’s cloud servers. Amazon uses four algorithms: one for each wake word (Alexa, Computer, Echo) and one for Alexa Guard, which treats certain sounds, such as glass breaking, as a wake word.
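
The layer-by-layer filtering described above resembles what is often called a cascade: a candidate only reaches the next check if it survived the previous one. The sketch below shows that control flow only. The stage names, the scores, and the thresholds are invented for illustration, and real layers would be trained neural networks, not one-line rules.

```python
def run_cascade(frame, stages):
    """Run each check in order and stop at the first one that rejects.

    Returns (True, None) if every stage accepts the frame, otherwise
    (False, name_of_rejecting_stage).
    """
    for name, accepts in stages:
        if not accepts(frame):
            return False, name   # rejected early; later stages never run
    return True, None


# Hypothetical stages, ordered from cheapest to most demanding.
STAGES = [
    ("loud enough", lambda f: f["energy"] >= 0.2),
    ("sounds like the wake word", lambda f: f["phonetic_score"] >= 0.6),
    ("context fits", lambda f: f["context_score"] >= 0.5),
]

# Hypothetical feature scores for two snippets of audio.
direct_address = {"energy": 0.8, "phonetic_score": 0.9, "context_score": 0.7}
similar_word = {"energy": 0.8, "phonetic_score": 0.4, "context_score": 0.7}

print(run_cascade(direct_address, STAGES))
print(run_cascade(similar_word, STAGES))
```

Only a snippet that clears every stage would trigger recording and streaming. Ordering the stages so that cheap checks run first means most audio is discarded before the more expensive checks are ever needed.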

But even when a match occurs, Amazon still performs more complex checks. Have you noticed that when someone says the word «Alexa» in a TV show or commercial, it usually doesn’t wake your Echo? This is because Amazon also does cloud validation.

Cloud checks rule out some false positives

A man in an Alexa commercial staring at his illuminated Echo toothbrush.
This hilarious Alexa commercial won’t wake up your Echos. Amazon

When companies make commercials that feature Alexa, they can send the audio to Amazon. The company runs the audio through pattern matching algorithms similar to those used to identify the wake word. Once that exact instance is fully cataloged, it is added to a database.

As part of the process of contacting the cloud, your Echo includes information about the wake word it heard, and Amazon checks it against this database. Whenever it finds a match, Amazon instructs your Echo to ignore the wake word, close the stream, and delete any recorded audio.
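
The database lookup can be pictured as a simple membership check: fingerprint the wake word audio, then see whether that fingerprint is already on the list of known commercials. The sketch below is an assumption-heavy illustration. In particular, `toy_fingerprint` uses an exact SHA-256 hash as a stand-in, whereas a real acoustic fingerprint would have to tolerate room noise and speaker distortion. The function names and return values are hypothetical.

```python
import hashlib


def toy_fingerprint(audio_bytes):
    """Stand-in fingerprint: an exact hash of the audio bytes.

    A real acoustic fingerprint must still match when the audio is
    noisy or distorted, which a cryptographic hash cannot do.
    """
    return hashlib.sha256(audio_bytes).hexdigest()


def handle_wake_event(audio_bytes, known_media_fingerprints):
    """Decide what the cloud tells the device to do with this wake event."""
    if toy_fingerprint(audio_bytes) in known_media_fingerprints:
        return "ignore"    # known ad or show: stop streaming, delete audio
    return "process"       # treat as a genuine request


# Hypothetical database built from audio an advertiser submitted in advance.
commercial_clip = b"alexa-as-spoken-in-a-commercial"
known_media = {toy_fingerprint(commercial_clip)}

print(handle_wake_event(commercial_clip, known_media))
print(handle_wake_event(b"alexa-spoken-live-in-a-kitchen", known_media))
```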

In addition, Amazon checks for the same wake word being spoken on many devices simultaneously. Not every company sends its audio to Amazon, which is why the company came up with a backup solution. After checking for a database match, the company compares the wake word’s fingerprint with any other instances arriving at the same time. It’s unlikely that two people saying Alexa at the same moment will sound exactly alike, so if there’s a match, Amazon knows it’s most likely an ad or TV show and ignores the request.
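
One way to picture this backup check is to group incoming wake events by fingerprint and flag any fingerprint that shows up on several different devices within a short window. The sketch below makes several assumptions that the article does not confirm: the two-second window, the threshold of two devices, the event tuple layout, and the name `find_broadcast_fingerprints` are all hypothetical.

```python
from collections import defaultdict


def find_broadcast_fingerprints(events, window_seconds=2.0, min_devices=2):
    """Return fingerprints heard by several devices at nearly the same time.

    Each event is a (timestamp_seconds, device_id, fingerprint) tuple.
    """
    by_fingerprint = defaultdict(list)
    for timestamp, device_id, fingerprint in events:
        by_fingerprint[fingerprint].append((timestamp, device_id))

    flagged = set()
    for fingerprint, arrivals in by_fingerprint.items():
        arrivals.sort()          # order by arrival time
        start = 0
        for end in range(len(arrivals)):
            # Shrink the window until it spans at most window_seconds.
            while arrivals[end][0] - arrivals[start][0] > window_seconds:
                start += 1
            devices = {device for _, device in arrivals[start:end + 1]}
            if len(devices) >= min_devices:
                flagged.add(fingerprint)   # likely an ad or TV show
                break
    return flagged


# Hypothetical traffic: two homes hear the same ad, one person speaks live.
events = [
    (0.0, "kitchen-echo", "fp-ad"),
    (0.4, "office-echo", "fp-ad"),
    (0.5, "den-echo", "fp-user"),
    (30.0, "kitchen-echo", "fp-ad-late"),
]

print(find_broadcast_fingerprints(events))
```

Here only `fp-ad` is flagged, because it is the only fingerprint that arrived from more than one device inside the window. A fingerprint heard once, or heard twice by the same device, passes through as a normal request.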

Despite all the checks, false positives still occur. You can listen to what your Echo has recorded in the Amazon Privacy Center, and you’re likely to find at least one false positive among those recordings. But the technology is constantly improving, and ultimately, Amazon would like it to work flawlessly.

