The natural language understanding capabilities of smart devices are improving all the time. But how? Image: Maria Alberto, Pixabay

How Your Digital Personal Assistant Understands What You Want (And Gets it Done)

One of the most remarkable things I learned during my studies in Computational Linguistics was that we still don’t know how language is processed in the brain. We know that an average human has around 80,000 words in their vocabulary, and that somehow, when we speak, our brains are able to form ideas, crawl through that vocabulary to find the right words, string them together, and express them. When we listen, we must decode an incoming audio signal into words, and search our own mental lexicons in order to extract meaning from them. And all of this at breathtaking speed.

Why is language so damn difficult?

If you’ve ever tried to use an automated chatbot, or the voice interface in your mobile phone or your car, chances are you’ve experienced moments of confusion, or even communication failure. Why? Well, these interactions are facilitated via natural language, and natural languages are just plain hard. That’s because they are:

  • structured — if you perform a Google search for ‘brown gloves’, native speakers know (thanks to the grammatical structure of the utterance) that ‘orange gloves’ would be more relevant than ‘brown purse’, but that’s not so easy for a machine to understand
  • inferential — there’s also meaning in what isn’t said. If I search for ‘formal dress’, we know that I won’t accept results for ‘casual dress’ but I might for ‘formal gowns’. But how should a machine know that?
  • lexically and syntactically ambiguous — words and sentence structures can sometimes be interpreted in multiple ways, which means one search query could match a huge variety of results, only some of which match the user’s intent
  • context based — sometimes the only way to disambiguate a lexically ambiguous word is through the surrounding words; furthermore, the same word can mean different things to different people depending on the ‘context’ that is their life
  • negatable — a small but common pain point for natural language processing. For example, adding a ‘not’ to a sentence, or speaking sarcastically, changes the entire meaning
  • multimedia based — we don’t just communicate through text but also through spoken messages, emojis, hashtags and so on.
A public sign with a grammatical error shows the difficulty of understanding natural language, even for humans.
This example of syntactic ambiguity could momentarily confuse a human; it could completely bamboozle a machine. Image: ViralNova
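To make the ‘structured’ point above concrete, here is a tiny illustrative sketch (my own toy example, not how any real search engine ranks results): a naive bag-of-words overlap gives ‘orange gloves’ and ‘brown purse’ identical scores for the query ‘brown gloves’, because it has no notion of grammatical structure and cannot tell that the head noun matters more than the colour.

```python
# A minimal sketch: naive bag-of-words overlap cannot capture grammatical
# structure. For the query 'brown gloves', it scores 'brown purse' and
# 'orange gloves' identically, even though a human knows that matching the
# head noun 'gloves' matters more than matching the colour.

def token_overlap(query: str, candidate: str) -> int:
    """Count the distinct tokens shared between query and candidate."""
    return len(set(query.lower().split()) & set(candidate.lower().split()))

query = "brown gloves"
print(token_overlap(query, "orange gloves"))  # 1 (shares 'gloves')
print(token_overlap(query, "brown purse"))    # 1 (shares 'brown')
```

Both candidates tie at a score of 1, so this naive approach has no basis for preferring the result a human would consider more relevant.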

What is a digital personal assistant?

Despite having a voice, a ‘personality’, and a semblance of ‘self-awareness’, under the hood a digital personal assistant is simply a software application. Throughout this article I will anthropomorphise a little with phrases like ‘the assistant knows…’ or ‘Siri tries to…’, but this is a stylistic choice and should not imply that the thing you’re interacting with is capable of thinking or knowing anything at all! When I say ‘assistant’, I thus always mean ‘application’.

What is Natural Language Processing? And NLU? And NLG?

Alright, let’s clarify some of the terms introduced above.

  • Tokenizing: splitting the input text into individual words, aka ‘tokens’. (Purpose for ML: most algorithms take their input as a series of individual tokens.)
  • Stemming: stripping the endings from words to leave only the word stem. (Purpose for ML: to reduce computational load by shrinking the vocabulary that needs to be processed, and to improve performance by representing all forms of a word consistently, which also boosts the number of training examples featuring each stem.)
  • Note that stemming may not always result in a grammatical word. For example, plural nouns can be converted to singular by removing the suffix -s, but this won’t work for irregular English nouns. Thus we get: dogs → dog, but countries → countrie, and women → women. Similar problems arise in other languages. For example, in German many plural nouns can be converted to singular by removing -en or -er, but irregular nouns pose problems there as well. Thus we get Frauen → Frau (women → woman), which is correct, but Bücher → Büch (books → book, where the singular should actually be spelled Buch).
  • Lemmatizing: converting each word to its standard dictionary form. Again an example could be reducing plural nouns to singular, but with lemmatizing the result should also be a grammatical word. (Purpose for ML: as above.)
  • Part-of-speech tagging: assigning a grammatical role, such as ‘noun’, ‘verb’, or ‘adjective’, to each word in the sentence. (Purpose for ML: parts of speech can be useful input features for various language tasks.)
  • Named Entity Recognition: assigning labels like ‘person’, ‘place’, ‘organisation’, ‘date/time’ to relevant words in the sentence. (Purpose for ML: as above).
  • Natural Language Generation: generating human-like text. This can be done using automated rules (for simple, restricted, repetitive contexts like generating weather reports from weather data), or else using neural networks that were specifically trained to generate text.
  • Speech Synthesis: aka text-to-speech, the process of generating synthetic voice audio from text. The models are trained much like ASR models, though with the input and output sequences reversed.
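The first few preprocessing steps above can be sketched in a few lines of deliberately crude, hand-rolled Python (real pipelines use trained tools such as spaCy or NLTK; the rules below are my own toy versions). Note how the naive suffix-stripping stemmer reproduces the countries → countrie failure described above, while a dictionary-based lemmatizer gets the irregular forms right:

```python
# A minimal, hand-rolled sketch of tokenizing, stemming, and lemmatizing.
# The rules are deliberately crude, to reproduce the 'countries -> countrie'
# failure discussed in the text.

def tokenize(text: str) -> list[str]:
    """Split input text into lowercase word tokens."""
    return text.lower().split()

def stem(token: str) -> str:
    """Naive suffix stripping: drop a trailing '-s'."""
    return token[:-1] if token.endswith("s") and len(token) > 3 else token

# A lemmatizer consults a dictionary, so irregular forms come out right.
# (This tiny lookup table stands in for a real lexicon.)
LEMMA_TABLE = {"dogs": "dog", "countries": "country", "women": "woman"}

def lemmatize(token: str) -> str:
    """Return the dictionary form of a token, falling back to the stem."""
    return LEMMA_TABLE.get(token, stem(token))

tokens = tokenize("Dogs women countries")
print([stem(t) for t in tokens])       # ['dog', 'women', 'countrie']
print([lemmatize(t) for t in tokens])  # ['dog', 'woman', 'country']
```

The stemmer is fast and needs no lexicon, but only the lemmatizer guarantees grammatical output, which is exactly the trade-off between the two bullet points above.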

How Does Natural Language Understanding Work?

Before we discuss the ‘how’, let’s demonstrate NLU in action. What’s the first word that comes into your head when you read the following?

Google auto-suggestions for an incomplete search show how Google’s language models have learned typical language patterns.
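Auto-suggestion rests on the same idea as any language model: predict likely continuations from patterns seen in training text. Here is a toy bigram version (vastly simpler than what Google actually runs, and trained on an invented three-sentence corpus purely for illustration):

```python
# A toy bigram language model: count which word follows which in a training
# corpus, then suggest the most frequent continuation. Real auto-suggest
# systems use far larger corpora and far richer models.
from collections import Counter, defaultdict

corpus = (
    "the cat sat on the mat . "
    "the cat chased the mouse . "
    "the dog sat on the rug ."
).split()

# Count follower frequencies for each word.
bigram_counts: defaultdict = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigram_counts[prev][nxt] += 1

def suggest(word: str) -> str:
    """Return the word most often seen after `word` in the training corpus."""
    return bigram_counts[word].most_common(1)[0][0]

print(suggest("the"))  # 'cat' (follows 'the' twice; the other words once each)
print(suggest("sat"))  # 'on'
```

Scale this counting idea up from one word of context to long spans of text, and from a three-sentence corpus to a large slice of the web, and you have the intuition behind the suggestions in the screenshot above.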

Conclusion

So that’s it — how your digital personal assistant knows what you want, and gets it done for you. Simple? No. Remarkable? I think so.


Data Scientist. Computational Linguist. Education Lead Women in AI Upper Austria. Sharing interesting resources on AI and our future with it.
