Key features of NLP

This article introduces features of Natural Language Processing (NLP) used in industry, such as text prediction, named entity recognition, relationship extraction, machine translation, sentiment analysis, and question answering. It also covers two use cases: chatbots and spam filters.

Having introduced Natural Language Processing (NLP) on our blog, in this article we look at some NLP features that can be used in the industrial world. Let's go!

Automatic suggestions

This feature speeds up the writing of emails, messages, and other texts. Very “trendy” in recent years, it is increasingly built into our email and word processing tools. It works by predicting the next word that best fits the sentence being written. The technology relies on probabilistic or word-vectorization models that have learned how words are linked together in a sentence.
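
To make the idea of a probabilistic model concrete, here is a minimal sketch of next-word prediction with a bigram model, which simply counts which word follows which. The corpus and the function names (`train_bigram_model`, `suggest_next_word`) are invented for illustration; real tools rely on far larger models and datasets.

```python
from collections import Counter, defaultdict


def train_bigram_model(sentences):
    """Count, for each word, which words follow it in the training sentences."""
    followers = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for current_word, next_word in zip(words, words[1:]):
            followers[current_word][next_word] += 1
    return followers


def suggest_next_word(followers, word, k=1):
    """Return up to k of the most frequent followers of `word`."""
    counts = followers.get(word.lower())
    if not counts:
        return []  # word never seen during training
    return [candidate for candidate, _ in counts.most_common(k)]


# Toy corpus, invented for illustration only.
corpus = [
    "thank you for your email",
    "thank you for your help",
    "thank you very much",
    "see you next week",
]
model = train_bigram_model(corpus)
print(suggest_next_word(model, "thank"))  # ['you']
print(suggest_next_word(model, "you", k=2))
```

The model only looks one word back, which is why it stays so small. Word-vectorization models capture longer-range context at the cost of much more training data.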

Recognizing named entities

This technique is mainly used to identify and classify named entities (words or groups of words) in unstructured texts into predefined categories such as:

Organizations
Names of people
Geographic areas
Quantities
Prices
Durations
Percentages

To achieve this, we can use linguistic resources (dictionaries) but also deep learning algorithms that have learned to identify each entity.
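
As an illustration of the dictionary-based route, here is a minimal sketch that combines a small dictionary of known names (often called a gazetteer) with regular expressions for percentages and monetary amounts. The dictionary entries, labels, and the `find_entities` function are assumptions made up for this example, not part of any specific library.

```python
import re

# Hypothetical gazetteer (dictionary of known names), invented for this example.
GAZETTEER = {
    "ORGANIZATION": ["Acme Corp", "United Nations"],
    "PERSON": ["Marie Curie", "Alan Turing"],
    "LOCATION": ["Paris", "Lyon"],
}

# Simple patterns for entities that follow a regular shape.
PATTERNS = {
    "PERCENTAGE": re.compile(r"\b\d+(?:\.\d+)?\s?%"),
    "PRICE": re.compile(r"[$€]\s?\d+(?:\.\d+)?"),
}


def find_entities(text):
    """Return (start, entity_text, label) tuples sorted by position in the text."""
    found = []
    for label, names in GAZETTEER.items():
        for name in names:
            for match in re.finditer(r"\b" + re.escape(name) + r"\b", text):
                found.append((match.start(), match.group(), label))
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            found.append((match.start(), match.group(), label))
    return sorted(found)


sentence = "Marie Curie worked in Paris, and Acme Corp raised prices by 12% to $40."
for start, entity, label in find_entities(sentence):
    print(f"{entity!r} -> {label}")
```

A dictionary only finds names it already knows, which is the main reason learned models are used alongside it.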

Extracting relationships

Here, we seek to extract the semantic relationships between the entities identified in the text or speech, such as “is located in”, “is married to”, “is used by”, “lives in”, etc.
This use case is addressed primarily through linguistic features such as part-of-speech tags (verb, adverb, noun, etc.) or lexical relationships (synonyms, homonyms, etc.).
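
The sketch below shows what the output of relationship extraction looks like: (subject, relation, object) triples. It is a deliberately simplified stand-in that uses hand-written surface patterns rather than part-of-speech tags, and it treats any run of capitalized words as an entity. The pattern table and the `extract_relations` function are assumptions for this example.

```python
import re

# Simplification: an entity is one or more consecutive capitalized words.
ENTITY = r"([A-Z][a-z]+(?: [A-Z][a-z]+)*)"

# Each relation is tied to the phrase that expresses it between two entities.
RELATION_PATTERNS = {
    "is_located_in": re.compile(ENTITY + r" is located in " + ENTITY),
    "is_married_to": re.compile(ENTITY + r" is married to " + ENTITY),
    "lives_in": re.compile(ENTITY + r" lives in " + ENTITY),
}


def extract_relations(text):
    """Return a list of (subject, relation, object) triples found in the text."""
    triples = []
    for relation, pattern in RELATION_PATTERNS.items():
        for subject, obj in pattern.findall(text):
            triples.append((subject, relation, obj))
    return triples


text = (
    "Marie Curie is married to Pierre Curie. "
    "Pierre Curie lives in Paris. "
    "Sorbonne University is located in Paris."
)
for triple in extract_relations(text):
    print(triple)
```

Fixed patterns miss any rephrasing ("resides in", "the wife of"), which is where part-of-speech information and lexical relationships such as synonyms come in.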

Machine translation

One of the most common uses of NLP is machine translation. It makes translation much faster, even though certain turns of phrase still need to be checked afterwards. The added value is considerable for businesses that regularly need to translate content such as product reviews, regulatory documents, and emails. The best-known machine translation applications include Google Translate, Amazon Translate, and DeepL. These systems generally rely on neural architectures such as recurrent neural networks (RNNs), sequence-to-sequence (Seq2Seq) models, and, more recently, transformers (which we will discuss in our next article).

Sentiment analysis

This feature analyzes how people perceive certain topics or services, which makes it very useful for many business functions (customer service, communication, strategic marketing, etc.).

Its purpose is to check whether goods or services satisfy customers and to run opinion surveys for brands and even political candidates. It not only helps businesses learn how customers perceive these goods or services, but also helps them improve concepts, products, marketing, and advertising while reducing dissatisfaction.

This is a classification problem. It can be treated with “classical” machine learning approaches such as linear classifiers (logistic regression, naive Bayes, etc.), k-nearest neighbors, or decision trees, or with neural networks such as recurrent neural networks (for example LSTM) or transformers (for example BERT).
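
To show what treating sentiment as a classification problem can look like, here is a minimal sketch of a Bayesian classifier (multinomial naive Bayes with add-one smoothing) written from scratch. The class name `NaiveBayesSentiment` and the four training reviews are invented for illustration; a production system would use a tested library and far more data.

```python
import math
from collections import Counter


class NaiveBayesSentiment:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.label_counts = Counter(labels)
        self.word_counts = {label: Counter() for label in self.label_counts}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocabulary = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.label_counts.values())
        scores = {}
        for label, doc_count in self.label_counts.items():
            total_words = sum(self.word_counts[label].values())
            score = math.log(doc_count / total_docs)  # log prior of the class
            for word in text.lower().split():
                if word not in self.vocabulary:
                    continue  # ignore words never seen in training
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(self.vocabulary)))
            scores[label] = score
        return max(scores, key=scores.get)


# Toy training set, invented for illustration only.
reviews = [
    "great product fast delivery",
    "excellent service very happy",
    "terrible quality very disappointed",
    "slow delivery bad service",
]
sentiments = ["positive", "positive", "negative", "negative"]

classifier = NaiveBayesSentiment().fit(reviews, sentiments)
print(classifier.predict("great service"))  # positive
print(classifier.predict("bad quality"))    # negative
```

Because it treats a sentence as a bag of words, this model cannot handle negation ("not great"), one of the reasons recurrent networks and transformers are preferred for harder cases.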

Answering questions

Many questions asked by humans can be answered using an NLP feature called Question Answering. The model first analyzes the question, notably using named entity recognition, then formulates a response based on its knowledge base (which can be very broad!). Examples of question answering applications include Siri, OK Google, and other virtual assistants.

This feature is developed using machine learning methods that learn to understand language features without human supervision. Here are two examples of language features that can be used:

Check if the sentence structure is correct (for example: subject, verb, object)
Understand the meaning of the sentence (for example, for “What is the distance between the Earth and the Moon?”, the answer must contain the value of the distance)

Statistical methods paved the way for this approach, which breaks down into two main steps:

Training: a knowledge base is created from the analysis of an annotated text corpus (the model's training data set),
Prediction: the best answer to a new piece of text (the user's question) is identified.
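
The two steps can be sketched with a very small retrieval-based example. Here, "training" just indexes annotated question/answer pairs, and "prediction" returns the answer whose question shares the most words with the user's question (Jaccard similarity). The function names and the three annotated pairs are assumptions for this sketch; a real system learns much richer representations.

```python
def tokenize(text):
    """Lowercase the text, drop question marks, and return its set of words."""
    return set(text.lower().replace("?", "").split())


def train(annotated_pairs):
    """Training step: index each annotated question by its set of words."""
    return [(tokenize(question), answer) for question, answer in annotated_pairs]


def predict(knowledge_base, question):
    """Prediction step: return the answer of the most similar known question."""
    query = tokenize(question)

    def jaccard(entry):
        known_words, _ = entry
        union = known_words | query
        return len(known_words & query) / len(union) if union else 0.0

    _, best_answer = max(knowledge_base, key=jaccard)
    return best_answer


# Tiny annotated corpus, invented for illustration only.
annotated_corpus = [
    ("What is the distance between the Earth and the Moon?", "About 384,400 km on average"),
    ("What is the capital of France?", "Paris"),
    ("How many days are in a week?", "Seven"),
]
knowledge_base = train(annotated_corpus)
print(predict(knowledge_base, "What is the distance from the Earth to the Moon?"))
```

Note that this sketch always returns some answer, even for a question it knows nothing about. A practical system would add a minimum similarity threshold.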

The context is generally identified by searching a taxonomy (a classification of words into categories and sub-categories) built using the named entities and a classifier. Neural network architectures are also very useful for this use case: they map the textual context into logical representations that are then used to predict responses. Among them, recurrent neural networks (RNNs) have proven effective for this type of task.

These features are the basis for what NLP can do.

Here are two use cases that you have most likely heard about: chatbots and spam filters.

Chatbots

Unlike the “Answering questions” section, where the answer exists in a given corpus, chatbots generate their own answers.

Chatbots are valuable for both businesses and consumers, and they already answer a large number of questions.

However, businesses are now pushing chatbot development further so that chatbots can communicate at a human level, with all the complexity that entails.

Chatbots are useful for businesses not only to improve customer experience and satisfaction, but also to respond to the many questions employees ask.

Chatbots combine two complementary approaches:

NLU (Natural Language Understanding), to understand the meaning of the question.
NLG (Natural Language Generation), to generate a response that is easily understood by a human.
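
The split between the two can be sketched as two functions: one that maps a message to an intent (NLU) and one that turns an intent into a sentence (NLG). This is a heavily simplified illustration using keyword matching and fixed templates; the intents, keywords, and templates below are all hypothetical, and modern chatbots rely on learned models for both steps.

```python
import re

# Hypothetical intents and keywords, invented for this sketch.
INTENT_KEYWORDS = {
    "opening_hours": {"open", "hours", "close", "closing"},
    "order_status": {"order", "delivery", "package", "tracking"},
}

# Hypothetical response templates, one per intent.
RESPONSE_TEMPLATES = {
    "opening_hours": "We are open from {opening} to {closing}.",
    "order_status": "Your order {order_id} is currently {status}.",
    "unknown": "Sorry, I did not understand. Could you rephrase?",
}


def understand(message):
    """NLU step: map the message to the intent sharing the most keywords."""
    words = set(re.findall(r"[a-z]+", message.lower()))
    best_intent, best_overlap = "unknown", 0
    for intent, keywords in INTENT_KEYWORDS.items():
        overlap = len(words & keywords)
        if overlap > best_overlap:
            best_intent, best_overlap = intent, overlap
    return best_intent


def generate(intent, **slots):
    """NLG step: fill the template of the intent with the given slot values."""
    return RESPONSE_TEMPLATES[intent].format(**slots)


intent = understand("Where is my order? The package has not arrived.")
print(generate(intent, order_id="A123", status="in transit"))
# Your order A123 is currently in transit.
```

Keeping the two steps separate means the understanding part can be improved without touching how answers are worded, and vice versa.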

Spam filters

Email filters are a common use case for NLP. This feature allows you to block unwanted emails, which are identified by analyzing the meaning and frequency of certain words in the body of the email.

These signals feed machine learning models or neural networks, which can keep improving thanks to user feedback.
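
Here is a minimal sketch of how word frequencies and user feedback can work together. The filter keeps word counts for spam and legitimate emails, scores a new email by summing smoothed log-odds, and updates its counts whenever a user labels a message. The class `FeedbackSpamFilter`, its smoothing constants, and the sample emails are assumptions for this example, not a description of any real mail product.

```python
import math
from collections import Counter


class FeedbackSpamFilter:
    def __init__(self):
        self.spam_words = Counter()
        self.ham_words = Counter()  # "ham" = legitimate email

    def learn(self, body, is_spam):
        """Update word frequencies from a labeled email (e.g. user feedback)."""
        target = self.spam_words if is_spam else self.ham_words
        target.update(body.lower().split())

    def spam_score(self, body):
        """Sum of smoothed log-odds; a positive score leans towards spam."""
        spam_total = sum(self.spam_words.values())
        ham_total = sum(self.ham_words.values())
        score = 0.0
        for word in body.lower().split():
            p_spam = (self.spam_words[word] + 1) / (spam_total + 2)
            p_ham = (self.ham_words[word] + 1) / (ham_total + 2)
            score += math.log(p_spam / p_ham)
        return score

    def is_spam(self, body):
        return self.spam_score(body) > 0


# Toy labeled emails, invented for illustration only.
spam_filter = FeedbackSpamFilter()
spam_filter.learn("win free money now", is_spam=True)
spam_filter.learn("free prize click now", is_spam=True)
spam_filter.learn("meeting agenda for monday", is_spam=False)
spam_filter.learn("lunch on monday", is_spam=False)

new_email = "exclusive webinar invitation"
print(spam_filter.is_spam(new_email))  # False: none of these words is known yet

# The user marks the email as spam, and the filter adapts.
spam_filter.learn(new_email, is_spam=True)
print(spam_filter.is_spam(new_email))  # True
```

Each piece of feedback shifts the word counts, so the same email can be classified differently after a user corrects the filter.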
