Other parts of this series:
Exploring AI Part two: Unpacking NLP
Natural Language Processing (NLP) is a sub-field of machine learning focused on enabling computers to understand and process human language, bringing them closer to a human-level understanding of it. It's everywhere: it's integrated into our phones and into devices such as Amazon's Echo and Google Assistant. Devices like these are already very capable, even though NLP has barely got started.
In 2012 there was a revolution in Computer Vision, the branch of machine learning that makes sense of image data, including, for instance, the ability to recognise objects and people in photos. Capable systems had been built before that point, but they took masses of data to train, generally millions of images. This restricted how useful they could be, because it's not always practical to collate such a large amount of training data. What changed was the introduction of transfer learning to the problem: taking a powerful model trained on a very large dataset and then applying it to another, loosely related task in a way that requires much less data but still delivers excellent results.
NLP has also had some success already, but it looks set to make great strides this year, just as Computer Vision did in 2012. Both disciplines historically relied on hardcoded rules behind the scenes, which limited their potential; in both, that approach has given way almost universally to Deep Learning.
Last year was a breakout year for NLP, for the same reason that 2012 was for Computer Vision: the introduction of effective transfer learning methods, in particular the use of powerful language models trained on large datasets (more details on those to come) to bootstrap general understanding.
OpenAI, a non-profit organisation backed by several high-profile Silicon Valley entrepreneurs, recently announced their paper on GPT-2, a language model that can create convincing stories and text to a level not seen before. High-quality text generation is not totally new in NLP, but the coherence of GPT-2's output over long runs of text appears to be better than that of earlier models, and its quality has surprised even NLP researchers. OpenAI's decision to delay public release of the model while they evaluate the potential for it to be misused by bad actors also brought it to the attention of the mainstream media, with some outlets recklessly misrepresenting the move as "locking up" a system that is "too dangerous". This is unfortunate, as it creates unnecessary and unwarranted AI scaremongering, of which there is quite enough already.
To get a better understanding of what this all means and how it might be useful in other areas, let’s unpack it…
What is Language Modelling?
Language modelling refers to training a model to read a body of text and predict the next word, using only the previous words as input. The more text you provide for it to learn from, the better the model will be; hence you'll often find models trained on the entire Wikipedia corpus, a large subset of Google News or, in the case of GPT-2, a carefully sampled collection of high-quality text scraped from the internet as a whole.
A quirk of language models is that they can be used to generate text, mimicking the corpus on which they were trained. You do this by feeding the model a starting word (or sentence), taking the predicted word as its next input, and repeating the process. When you do this, something magical happens. Take the diagram below as an example of what might happen for a model trained only on Shakespeare…
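To make that loop concrete, here is a toy sketch in plain Python. It uses a simple bigram count model over a few hand-picked words (both the corpus and the model are illustrative stand-ins; a real language model such as GPT-2 is a neural network trained on vastly more text), but the generation loop, feeding each prediction back in as the next input, is the same idea:

```python
from collections import Counter, defaultdict

# A tiny training corpus; a real language model would see millions of pages.
corpus = (
    "to be or not to be that is the question "
    "whether tis nobler in the mind to suffer"
).split()

# Count, for each word, which words follow it (a bigram model).
following = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    following[current][nxt] += 1

def predict_next(word):
    """Return the word most frequently seen after `word` in training."""
    return following[word].most_common(1)[0][0]

# Generate text: feed each predicted word back in as the next input.
word, output = "to", ["to"]
for _ in range(5):
    word = predict_next(word)
    output.append(word)

print(" ".join(output))  # → to be or not to be
```

Even this crude approach produces passable pastiche when run over a real corpus; a neural language model simply makes far better predictions at each step.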
Until recently, language models had very few applications beyond simple tasks such as predictive text messaging and the type of autocomplete prompt you find when searching Google, but following the introduction of transfer learning to NLP they look set to power many new innovations.
Why are language models important?
Although language modelling appears to be a simple task, it is deceptively difficult. Generally speaking, predicting the next word in a sentence requires a deep understanding of both language and real-world concepts. If I asked you to provide the next word in the sentence "When asked which of the elephant or the mouse was smaller, Emily pointed at the …" you could do it easily, but only because you know that a mouse is smaller than an elephant. For a language model to do the same accurately, it needs to have first learned that relationship. To be generally successful it needs to learn a lot about how we speak and about the world around us. So, while the problem is simple to frame, the model we learn as a result is surprisingly powerful, provided it is fed enough good data.
Another nice feature of language modelling is that it is an unsupervised approach: the answer is simply the next word each time. By contrast, most machine learning methods that you come across are based on supervised training, where you explicitly give the model examples of what the answer should be. For example, to train a model to translate English to French, typically you'd give it pairs of English and French sentences. As you can imagine, preparing such data is very time-consuming and expensive. One of the main limiting factors of machine learning and AI is access to data - generally speaking, the more data, the better it gets - and so the unsupervised nature of the language modelling task is very beneficial.
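This is easy to see with a sketch: given nothing but raw, unlabelled text, every position yields a free (context, next-word) training pair, with no human labelling required. The window length below is an arbitrary choice for illustration; real models use far longer contexts:

```python
# Raw, unlabelled text is all a language model needs: every position in
# the text supplies a (context, next-word) training example for free.
text = "the cat sat on the mat".split()
context_size = 3  # illustrative window length; real models use far more

pairs = [
    (text[max(0, i - context_size):i], text[i])
    for i in range(1, len(text))
]

for context, target in pairs:
    print(context, "->", target)
# e.g. ['the', 'cat', 'sat'] -> on
```

Compare that with translation, where every training pair has to be written by a person: here the "labels" come from the text itself, which is why web-scale corpora can be used directly.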
OpenAI took advantage of this in an extreme way with GPT-2 by training the model on an incredible amount of text from web pages. By our estimates, replicating GPT-2 would require approximately £100,000 of cloud compute. So even though it's technically possible to replicate their efforts, it would be very expensive.
At this scale things certainly start to get magical and mysterious. Not only does the model capture enough information to mimic text, but by having sight of so much human knowledge, it captures the rules of our world, simply by learning which words hang around with each other.
How is this useful to me and my business?
The OpenAI model, and those similar to it, have made great advances, but they’re far from delivering on the promise of unsupervised multitask learning. But that shouldn’t stop you from exploiting the technology right now.
If you have access to a lot of documents or free text, such as agent notes, then it is possible to take the smaller (publicly released) version of GPT-2, or another similar language model, and use it as-is or, better, fine-tune it to your specific task.
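By way of a loose analogy (toy bigram counts standing in for a neural model such as GPT-2, and made-up one-line corpora standing in for real datasets), fine-tuning means continuing training on your own text, so the model keeps its general knowledge but shifts its predictions towards your domain:

```python
from collections import Counter, defaultdict

def count_bigrams(words, into=None):
    """Accumulate next-word counts; calling it again continues training."""
    counts = into if into is not None else defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        counts[current][nxt] += 1
    return counts

# "Pre-train" on general-purpose text...
general = "the payment was late the weather was fine".split()
model = count_bigrams(general)

# ...then "fine-tune" by continuing on (hypothetical) domain text,
# e.g. your agent notes, so domain evidence dominates the predictions.
domain = "the payment was overdue the payment was overdue".split()
model = count_bigrams(domain, into=model)

print(model["was"].most_common(1)[0][0])  # → overdue
```

The same shape applies with a real neural language model: start from publicly released pre-trained weights and continue training on your own corpus, which needs far less data and compute than training from scratch.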
Specifically, here are some business questions that you may be able to answer through the use of NLP:
- What do my customers think of my company, and which aspects of my business have been improving or getting worse over time? (based on review sites such as Trustpilot)
- What do my employees think of my company? (based on reviews from Glassdoor)
- Which of these documents will be helpful in completing my task in xyz?
- Which of xyz categories does this document belong to?
- How can this document be summarised more succinctly?
- How should I group this content on my website to optimise for search engines?
- Which judge is likely to be more sympathetic to my client’s legal case?
- Which of my marketing content doesn’t match our brand ‘tone of voice’?
- Which of my overdue debtors are likely to respond to litigation?
- How sincere is this piece of marketing copy?
As you can see, the range of tasks is almost endless. As data becomes even more critical to companies and its volume increases, it will be harder to keep on top of it. AI, NLP and other technologies can help with this.
By getting on board with NLP you can realise the benefits right now, and stay ahead of the curve.