Entity Linking Problem
Recently I have been taking part in a Data Science Internship with a medical company called Medwise. It has been very eye-opening and has introduced me to some of the ways Data Science is being applied within the medical industry.
During my internship I was introduced to the Entity Linking Problem. But what exactly is this problem, and how can we use data science tools to solve it?
Entity linking is the task of extracting mentions from documents and then linking them to their corresponding entities in a knowledge base. This is very helpful because a word can have multiple meanings depending on the context of the sentence, and, in the medical world, a single medicine can be sold under different brands and names.
An example is if we were to say:
Michael's last performance outshined all his previous ones!
If you read this in a music article, you would easily know it was referring to Michael Jackson, but if it were in a sports article, it would be referring to Michael Jordan. Basically, we want to be able to discern what a word is specifically referring to.
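To make the idea concrete, here is a minimal sketch of context-based disambiguation. It is a toy, not how a production system works: the `CANDIDATES` table, its keywords, and the `disambiguate` helper are all made up for illustration, and the score is just a count of shared words.

```python
import re

# Hypothetical mini knowledge base: each candidate entity for the mention
# "Michael" is described by a few context keywords (illustrative only).
CANDIDATES = {
    "Michael Jackson": {"music", "song", "album", "concert", "singer", "stage"},
    "Michael Jordan": {"basketball", "game", "points", "court", "season", "team"},
}


def tokenize(text):
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def disambiguate(context, candidates=CANDIDATES):
    """Return the candidate whose keywords overlap most with the context.

    Returns None when no candidate shares any keyword with the context.
    """
    words = tokenize(context)
    scores = {name: len(words & keywords) for name, keywords in candidates.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


music_article = (
    "The concert was sold out. "
    "Michael's last performance outshined all his previous ones on stage!"
)
sports_article = (
    "The team won the game. "
    "Michael's last performance outshined all his previous ones this season!"
)

print(disambiguate(music_article))   # Michael Jackson
print(disambiguate(sports_article))  # Michael Jordan
```

The same sentence is linked to a different entity purely because of the words around it, which is the intuition behind the example above.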
Due to the rapid development of technology and increasing concern for people's health, many medical websites now provide not only diverse medical information and medical news but also online consultation services about diseases and Q&A systems.
Before it is processed, medical Q&A data can contain many diverse and ambiguous references to medical entities. The diversity comes from the fact that a single entity can be referred to in multiple ways, including aliases and abbreviations. As a result, mentions of the same medical entity under different names or abbreviations can be treated as if they were different entities. This can lead to confusion and even fatal mistakes.
Without getting too technical, I wanted to share the general approach data scientists take to solve this problem.
To understand this process, I want to introduce you to a concept known as Information Extraction (IE), which is defined as: ‘the task of automatically extracting structured information from unstructured and/or semi-structured machine-readable documents and other electronically represented sources’. This is the first step of the process, since we need the data to be structured and easy to retrieve in order to make connections between words and their contexts.
Named Entity Recognition (NER)
This is also known as entity identification, entity chunking, and entity extraction. It is a sub-task of IE that locates named entities mentioned in unstructured text and classifies them into pre-defined categories, such as those in the picture above.
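As a rough illustration of "locate and classify", here is a small sketch that uses a lookup table (often called a gazetteer). This is an assumption made to keep the example short: real NER systems are usually statistical or neural models, and the `GAZETTEER` entries, the category names, and the `recognize_entities` helper are hypothetical.

```python
import re

# Hypothetical gazetteer: surface form -> category. A lookup table is only
# a stand-in for a trained NER model.
GAZETTEER = {
    "paracetamol": "DRUG",
    "ibuprofen": "DRUG",
    "headache": "SYMPTOM",
    "fever": "SYMPTOM",
}


def recognize_entities(text, gazetteer=GAZETTEER):
    """Return (mention, start, end, category) tuples found in the text."""
    entities = []
    for match in re.finditer(r"[A-Za-z]+", text):
        category = gazetteer.get(match.group().lower())
        if category is not None:
            entities.append((match.group(), match.start(), match.end(), category))
    return entities


sentence = "I took Ibuprofen for my headache."
for entity in recognize_entities(sentence):
    print(entity)
```

Each result carries both the position of the mention (locating) and its category (classifying), which is what the next step needs as input.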
Named Entity Linking (NEL)
Once we have located and classified named entities, it is the task of NEL to link the entity mentions in the text with their corresponding entities in a knowledge base. This helps us recognize when different words refer to the same thing, which can avert many issues in the medical industry. In many cases, numerous medications have the same composition and are used to treat the same conditions but have many different names or brands. In this case, we need to make sure that those medicines are grouped together correctly, so that no errors are made in prescription or diagnosis.
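The grouping of brand names can be sketched with a tiny alias index. This is only a minimal sketch under stated assumptions: `KNOWLEDGE_BASE`, `build_alias_index`, and `link_mentions` are hypothetical names, the knowledge base holds just two drugs, and matching is an exact, case-insensitive lookup rather than anything context-aware.

```python
# Hypothetical knowledge base: canonical entity -> known aliases and brands.
KNOWLEDGE_BASE = {
    "paracetamol": ["acetaminophen", "Tylenol", "Panadol"],
    "ibuprofen": ["Advil", "Nurofen"],
}


def build_alias_index(knowledge_base):
    """Map every lowercased name (canonical or alias) to its canonical entity."""
    index = {}
    for canonical, aliases in knowledge_base.items():
        index[canonical.lower()] = canonical
        for alias in aliases:
            index[alias.lower()] = canonical
    return index


def link_mentions(mentions, index):
    """Return (mention, canonical entity or None) for each mention."""
    return [(mention, index.get(mention.strip().lower())) for mention in mentions]


ALIAS_INDEX = build_alias_index(KNOWLEDGE_BASE)
linked = link_mentions(["Tylenol", "acetaminophen", "Nurofen", "aspirin"], ALIAS_INDEX)
for mention, entity in linked:
    print(f"{mention} -> {entity}")
```

Mentions that resolve to the same canonical entity can then be treated as one medicine, while a mention that is missing from the knowledge base (here "aspirin") comes back as `None` instead of being linked to the wrong entry.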