Confusion Matrix Visualization for SpaCy NER

Dulaj Rajitha
Jul 25, 2019
https://pngtree.com/free-png-vectors/question-mark

SpaCy lets us train our own Named Entity Recognition (NER) models with custom classes. But when it comes to model evaluation, there is no standard built-in method for visualizing the confusion matrix.

This article shows how to generate the confusion matrix and visualize it.

Dataset Format

SpaCy has a standard dataset format for NER training data. Annotations are in the form of BILOU tags. Read more about BILOU encoding.

The requirement is to generate two vectors of the same shape: one holding the expected (target) values and one holding the predicted values.

In each tuple row, the first element is the text. Using that text, we can get the named entities from the trained model.

The second item is the list of tuples representing the named entity tags. Using that list, we can generate the vector of expected values.
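To make the row layout concrete, here is a minimal sketch with a made-up sample. The sentences, offsets, and labels are hypothetical and are not the article's dataset. The sketch also assumes that each annotation tuple is (start_char, end_char, label), as in spaCy's simple training format; `entity_strings` is a helper name invented for this illustration.

```python
# Hypothetical sample rows: (text, [(start_char, end_char, label), ...]).
# The sentences and labels are invented for illustration only.
docs = [
    ("Sigiriya is in Sri Lanka.", [(0, 8, "Attraction"), (15, 24, "Country")]),
    ("We visited Japan last year.", [(11, 16, "Country")]),
]


def entity_strings(row):
    """Return the (surface text, label) pairs that a row's offsets point at."""
    text, entities = row
    return [(text[start:end], label) for start, end, label in entities]


for row in docs:
    print(entity_strings(row))
```

Slicing the text with the offsets is a quick sanity check that the annotations point at the intended words before they are converted to token-level tags.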

Create Target Vector

This method creates the target class vector from the labeled data.
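A minimal sketch of such a method is given here; it is not necessarily the author's exact code. It assumes the annotations are character offsets, converts them to BILOU tags with spaCy's helper (named `offsets_to_biluo_tags` in spaCy v3 and `biluo_tags_from_offsets` in v2), and then strips the BILOU prefix so that each token carries only its class name. `get_cleaned_label` and `tokenizer_nlp` are names chosen for this illustration.

```python
import spacy

try:  # spaCy v3 name
    from spacy.training import offsets_to_biluo_tags
except ImportError:  # spaCy v2 name
    from spacy.gold import biluo_tags_from_offsets as offsets_to_biluo_tags

# A blank English pipeline is enough here: only the tokenizer is needed.
tokenizer_nlp = spacy.blank("en")


def get_cleaned_label(tag):
    """Strip the BILOU prefix: 'B-Country' -> 'Country'; 'O' stays 'O'."""
    _prefix, sep, label = tag.partition("-")
    return label if sep and label else tag


def create_total_target_vector(docs):
    """Flatten all documents into one list with one class label per token."""
    target_vector = []
    for text, entities in docs:
        doc = tokenizer_nlp.make_doc(text)
        tags = offsets_to_biluo_tags(doc, entities)
        target_vector.extend(get_cleaned_label(tag) for tag in tags)
    return target_vector
```

For the target and prediction vectors to line up token by token, both must use the same tokenizer. With a trained model, its own `make_doc` would replace the blank pipeline used in this sketch.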

For the document list mentioned above, create_total_target_vector() produces output like the following.

['Attraction',
'O',
'O',
'O',
'O',
'Attraction',
'Attraction',
'Attraction',
'O',
'O',
'O',
'O',
'O',
'O',
'O',
'O',
'O',
'O',
'O',
'O',
'O',
'Country',
'Country',
'O']

Create Prediction Vector

For the document list mentioned above, create_total_prediction_vector() produces output like the following.

['Attraction',
'O',
'O',
'O',
'O',
'Attraction',
'Attraction',
'Attraction',
'O',
'O',
'O',
'O',
'O',
'O',
'O',
'O',
'O',
'O',
'O',
'O',
'O',
'Country',
'Country',
'O']
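One way to build such a prediction vector is sketched here, under stated assumptions. It runs the pipeline on each text and reads the predicted class from each token's `ent_type_` attribute, which is an empty string for tokens outside any entity. Passing the model in as an explicit `nlp` argument is a choice made for this sketch, so the signature differs from the no-argument form mentioned above.

```python
def create_prediction_vector(nlp, text):
    """Return one class label per token, as predicted by the pipeline `nlp`."""
    doc = nlp(text)
    # token.ent_type_ is an empty string for tokens outside any entity.
    return [token.ent_type_ or "O" for token in doc]


def create_total_prediction_vector(nlp, docs):
    """Concatenate the per-token predictions of every document in `docs`."""
    prediction_vector = []
    for text, _entities in docs:
        prediction_vector.extend(create_prediction_vector(nlp, text))
    return prediction_vector
```

Because every token receives exactly one label, the result has the same length as a token-level target vector built with the same tokenizer.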

Identify the Classes

Next, we need to identify the classes (labels) in the dataset.

We can get them from the trained SpaCy model.
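A minimal sketch of reading the labels from the model, assuming the pipeline has a component named "ner" (the default name for spaCy's entity recognizer). The helper name `get_model_labels` is hypothetical. The "O" class is added by hand here, on the assumption that the token-level vectors use "O" for non-entity tokens, since the model's label set does not include it.

```python
def get_model_labels(nlp):
    """Return the sorted class list for the matrix: the NER labels plus 'O'."""
    ner = nlp.get_pipe("ner")  # assumes a pipeline component named "ner"
    return sorted(set(ner.labels) | {"O"})
```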

Alternatively, we can compute them from the dataset’s entities.

sorted(set(create_total_target_vector(docs)))

Generate the Confusion Matrix

We can generate the confusion matrix directly with the sklearn library, passing it the target vector, the prediction vector, and the list of classes.
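A minimal sketch using `sklearn.metrics.confusion_matrix`. The wrapper name `generate_confusion_matrix` and the toy vectors are hypothetical; they stand in for the real target and prediction vectors so that the snippet runs on its own.

```python
from sklearn.metrics import confusion_matrix


def generate_confusion_matrix(target_vector, prediction_vector, classes):
    """Rows are the true classes and columns the predicted ones, in `classes` order."""
    return confusion_matrix(target_vector, prediction_vector, labels=classes)


# Toy vectors (hypothetical) with one 'Country' token mislabelled as 'O'.
y_true = ["Attraction", "O", "O", "Country", "Country", "O"]
y_pred = ["Attraction", "O", "O", "Country", "O", "O"]
classes = sorted(set(y_true))
cm = generate_confusion_matrix(y_true, y_pred, classes)
print(cm)
```

Passing `labels` explicitly fixes the row and column order, so the matrix can later be plotted with matching tick labels.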

Visualize the Matrix

For visualization, we can use matplotlib. More information can be found in this example.
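A minimal plotting sketch is given here as one possible implementation, not necessarily the one in the linked example. It draws the matrix as a heat map with `imshow` and writes the value into each cell. The function name `plot_confusion_matrix` and its parameters are choices made for this illustration.

```python
import matplotlib.pyplot as plt
import numpy as np


def plot_confusion_matrix(cm, classes, normalize=False, title="Confusion matrix"):
    """Draw `cm` as a heat map with one annotated cell per (true, predicted) pair."""
    cm = np.asarray(cm, dtype=float)
    if normalize:
        # Divide each row by its total; rows with no samples stay at zero.
        row_sums = cm.sum(axis=1, keepdims=True)
        cm = np.divide(cm, row_sums, out=np.zeros_like(cm), where=row_sums != 0)

    fig, ax = plt.subplots()
    image = ax.imshow(cm, interpolation="nearest", cmap="Blues")
    fig.colorbar(image, ax=ax)

    ticks = np.arange(len(classes))
    ax.set_xticks(ticks)
    ax.set_yticks(ticks)
    ax.set_xticklabels(classes, rotation=45, ha="right")
    ax.set_yticklabels(classes)
    ax.set_xlabel("Predicted label")
    ax.set_ylabel("True label")
    ax.set_title(title)

    # Use white text on dark cells and black text on light cells.
    fmt = ".2f" if normalize else ".0f"
    threshold = cm.max() / 2.0
    for i in range(cm.shape[0]):
        for j in range(cm.shape[1]):
            ax.text(j, i, format(cm[i, j], fmt), ha="center", va="center",
                    color="white" if cm[i, j] > threshold else "black")

    fig.tight_layout()
    return fig, ax
```

In a notebook, calling `plot_confusion_matrix(cm, classes)` with the matrix and class list computed earlier displays the figure; in a script, follow the call with `plt.show()`. Setting `normalize=True` shows each row as fractions of the true class, which is easier to read when the "O" class dominates the counts.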

confusion matrix for above dataset

The Jupyter Notebook with all the methods can be found here.

Please leave a comment below if you have any questions or feedback!

If you like this post, follow me on Medium for more similar posts.
