Exploration and Visualisation of Word Vectors in Chat

At Verloop, we observed that the exploration-pipeline for different intents gave us a coverage jump of about 10-11% on the part of the dataset that the machine learning model wasn't able to cover after its first run of prediction. This is because we weren't extracting misspellings and paraphrased sentences of the intents before.

Although the data used for the pipeline in this blog-post is based on articles printed in BBC newspapers, the underlying logic carries over to chat data as well. For example, the intent "exchange", found in chat queries of e-commerce companies like Amazon, can be misspelled and written as:

Order hasn't been exchanged yet!

Please ex change my product.

Expecting exch to take place by Thursday.

The same intent can be paraphrased as:

Change my product. It is not working.

I had issued a replacement. What is its status?

Order hasn't been replaced yet. Who is responsible here?

It is not feasible to use edit-distance-based, token-based, and sequence-based algorithmic approaches to cover all the misspellings as well as the paraphrased sentences of an intent in the dataset. The exploration-pipeline can do that, which increases the coverage of the intent in the dataset.
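To illustrate the limitation, here is a minimal sketch of an edit-distance check, assuming plain Levenshtein distance as the representative algorithm. The function name `edit_distance` and the candidate words are illustrative choices, not part of the pipeline itself.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


# Illustrative candidates (assumed for this example): two misspellings,
# one paraphrase, and one word that merely looks similar.
target = "exchange"
candidates = ["ex change", "exch", "replaced", "change"]
distances = {word: edit_distance(target, word) for word in candidates}

for word, dist in distances.items():
    print(f"{target!r} -> {word!r}: {dist}")
```

The paraphrase "replaced" ends up further from "exchange" than the abbreviation "exch", so any threshold loose enough to catch the abbreviation still misses the paraphrase, while a word such as "change" scores as close even when it is used in an unrelated sense.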

The exploration-pipeline is diagrammatically represented as:

[Flowchart of the exploration-pipeline]

A popular idea in machine learning is to represent words by vectors. FastText vectors capture hidden information about a language, like word analogies or semantic information.

The word analogies and semantic information help us in:

Finding and extracting misspellings of a target word in the entire dataset.

Finding words that commonly occur in the context (neighbourhood) of the target word.

Finding the similarity between different words, given as the cosine of the angle between their word vectors.
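The similarity measure in the last point can be sketched as follows. This is a minimal example under stated assumptions: the 3-dimensional vectors are made up for illustration (real FastText vectors would be 300-dimensional and come from a trained model), and `nearest_neighbours` is a hypothetical helper rather than a FastText API.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_neighbours(target, vectors, k=3):
    """Rank every other word by cosine similarity to the target word."""
    scores = [
        (word, cosine_similarity(vectors[target], vec))
        for word, vec in vectors.items()
        if word != target
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:k]


# Toy vectors, invented for this example only.
toy_vectors = {
    "exchange": [0.9, 0.1, 0.3],
    "exch": [0.85, 0.15, 0.35],
    "replace": [0.7, 0.2, 0.5],
    "thursday": [0.1, 0.9, 0.2],
}

neighbours = nearest_neighbours("exchange", toy_vectors)
for word, score in neighbours:
    print(f"{word}: {score:.3f}")
```

With these toy values the misspelling and the paraphrase both rank above the unrelated word, which is the behaviour the pipeline relies on when it pulls the top neighbours of a target intent.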

Visualisation on TensorBoard gives us a 3-dimensional view of the 300-dimensional FastText word vectors. It lets us visualise and extract the top misspellings and the closest neighbouring words of a target intent, increasing the coverage of the intent in the dataset.
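One simple way to get vectors into such a view, sketched here as an assumption rather than as the method used in the posts below, is to write them out as two tab-separated files: one with the vector values and one with the word labels. This is the layout the standalone Embedding Projector accepts through its load option, and the projector then handles the reduction to 3 dimensions. The function name `export_for_projector`, the file names, and the toy vectors are all hypothetical.

```python
import csv
import os
import tempfile


def export_for_projector(vectors, out_dir):
    """Write one row of tab-separated values per word, plus a label file.

    Row i of vectors.tsv corresponds to line i of metadata.tsv. A
    single-column metadata file carries no header row.
    """
    os.makedirs(out_dir, exist_ok=True)
    vec_path = os.path.join(out_dir, "vectors.tsv")
    meta_path = os.path.join(out_dir, "metadata.tsv")
    with open(vec_path, "w", newline="", encoding="utf-8") as vec_file, \
            open(meta_path, "w", newline="", encoding="utf-8") as meta_file:
        writer = csv.writer(vec_file, delimiter="\t", lineterminator="\n")
        for word, vec in vectors.items():
            writer.writerow([f"{value:.6f}" for value in vec])
            meta_file.write(word + "\n")
    return vec_path, meta_path


# Toy 4-dimensional vectors, invented for this example; real FastText
# vectors would have 300 values per word.
toy_vectors = {
    "exchange": [0.9, 0.1, 0.3, 0.0],
    "exch": [0.85, 0.15, 0.35, 0.05],
    "thursday": [0.1, 0.9, 0.2, 0.4],
}

out_dir = tempfile.mkdtemp()
vec_path, meta_path = export_for_projector(toy_vectors, out_dir)
print("wrote", vec_path, "and", meta_path)
```

Keeping the labels in a separate file means the same vectors can be reloaded with different metadata, for example a column marking which words were already covered by the model.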

Before running the pipeline, it is important to know the prerequisites and execution steps, which are covered in the "Prerequisites for Exploration-Pipeline" post listed below.

Posts

Visualising Word Vectors Using TF2 [Advisable]

Exploration and Visualisation of Word Vectors Using TensorFlow 2

May 21, 2020

Visualising Word Vectors Using TF1 [Not Advisable]

Exploration and Visualisation of Word Vectors Using TensorFlow 1

May 21, 2020

Generating Word Vectors Using FastText Blog-Post

Conversion of words in the vocabulary of the dataset to vectors using FastText.

May 20, 2020

Prerequisites for Exploration-Pipeline

A complete guide to the requirements and the step-by-step execution of the exploration-pipeline.

May 18, 2020