IDefine Europe MEDLINE

There are more than 7,000 rare diseases that together affect more than 400 million people worldwide, which makes rare diseases a global problem. Data and information about them are usually scattered, fragmented, and kept in silos. With open data, the fast-growing pool of biomedical knowledge allows machine learning methods to be used to prioritize and retrieve insightful and meaningful information. This information poses its own challenges, both in the data itself and in representing it appropriately so that it is more usable by health professionals. This work and the related research paper present a framework that leverages the MEDLINE dataset and its controlled vocabulary, the MeSH Headings, to annotate and explore rare disease-related documents. The system automatically assigns MeSH Headings to the text of the input documents and extends their legacy metadata, allowing the user to explore rare disease data sourced from news and published science through interactive data visualization. The first version focuses on 16 specific rare diseases that fall into the group of rare monogenic neurodevelopmental disorders.

Interact with the Data Visualisation Modules to Explore MEDLINE

Description: This interactive data visualization tool allows the user to explore the 26+ million scientific articles in the MEDLINE/PubMed open dataset, hand-annotated by health experts using a hierarchy of 16 major categories and a maximum depth of 13 levels. In the above dashboard, we present interactive visualisation modules designed to leverage the scientific knowledge in this dataset to better understand what is known and published about Rare Diseases and their relation with, e.g., medical categories or chemical compounds.

Functionality: Explore the MEDLINE data from different perspectives on rare diseases on your own, by interacting with the data visualisation modules that were set up. To enhance dialogue with parent associations and expert communities, and to gain a deeper understanding of what types of meaningful data visualizations can be created without requiring extensive technical expertise, we've created a data visualization dashboard. This dashboard offers users real-time access to the MEDLINE dataset. Powered by Elasticsearch and the Kibana open-source data visualization plugin (refer to Figure 3), it enables rapid prototyping with prepared and preprocessed data samples, with a particular focus on rare diseases and the specific syndromes examined in this study.
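As a rough illustration of the kind of request a dashboard panel might send to Elasticsearch, the sketch below builds a search body that filters articles with a query string and counts the most frequent MeSH headings among the matches. The field name MeshHeadingList.desc is the one used in the search examples on this page; the helper name, the aggregation name, and the assumption that the field can be aggregated are hypothetical and may not match the actual index.

```python
import json


def top_headings_request(query: str,
                         field: str = "MeshHeadingList.desc",
                         size: int = 10) -> dict:
    """Build an Elasticsearch search body (hypothetical helper).

    It keeps only documents matching `query` and asks for the `size`
    most frequent values of `field` via a terms aggregation.
    """
    return {
        "size": 0,  # return aggregation buckets only, no individual hits
        "query": {"query_string": {"query": query}},
        "aggs": {
            "top_headings": {  # aggregation name chosen for this sketch
                "terms": {"field": field, "size": size}
            }
        },
    }


body = top_headings_request('MeshHeadingList.desc:"Coronavirus"')
print(json.dumps(body, indent=2))
```

The body only describes the request; sending it requires a client and an index name, neither of which is specified here.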

Move the pointer over the tag cloud and change the order of results

Description: This tool uses text mining algorithms to help surface the information we are looking for, avoiding the standard ranking of articles, which is biased by definition. To this end, it shows the clustered keywords of a query after a search for a health-related topic. It accepts Lucene syntax, so you can, e.g., search for all scientific articles hand-annotated with the health category Coronavirus by writing: MeshHeadingList.desc:"Coronavirus". It reorders the results according to the user's choices as the pointer moves over the word cloud of subtopics.
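The query syntax above can be generated programmatically. The following is a minimal sketch; the helper names mesh_term and all_of are hypothetical, and only the field name MeshHeadingList.desc comes from this page. Note that Lucene expects straight double quotes around the phrase.

```python
def mesh_term(descriptor: str, field: str = "MeshHeadingList.desc") -> str:
    """Return a Lucene phrase clause for one MeSH descriptor (hypothetical helper)."""
    # Escape backslashes and double quotes so the quoted phrase stays well-formed.
    escaped = descriptor.replace("\\", "\\\\").replace('"', '\\"')
    return f'{field}:"{escaped}"'


def all_of(*descriptors: str) -> str:
    """Require every descriptor by joining the clauses with the AND operator."""
    return " AND ".join(mesh_term(d) for d in descriptors)


print(mesh_term("Coronavirus"))
print(all_of("Coronavirus", "Diabetes"))
```

The resulting strings can be pasted into the search box as they are.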

Functionality: It is designed to improve the search engine experience; the user gives the search further information by dragging a pointer over word clouds. These word clouds are produced by cosine similarity to an "average" centered on the topics in each abstract of the set of selected papers, clustered using the k-means algorithm. Besides the usual keyword query, you can search for the articles hand-annotated by the MEDLINE experts with a certain health topic, by writing MeshHeadingList.desc:"". You can also use connectors in your queries to, e.g., retrieve all the articles hand-annotated with both the Coronavirus and the Diabetes health classes, by writing: MeshHeadingList.desc:"Coronavirus" AND MeshHeadingList.desc:"Diabetes".
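To make the idea of ranking by cosine similarity to an "average" more concrete, here is a simplified sketch. It assumes plain bag-of-words vectors and takes the cluster of abstracts as given instead of running k-means; the actual features, clustering setup, and function names used by the tool are not documented here, so everything below is illustrative only.

```python
import math
from collections import Counter
from typing import Dict, List


def vectorize(text: str) -> Dict[str, float]:
    """Bag-of-words term counts; a stand-in for the tool's real features."""
    return dict(Counter(text.lower().split()))


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def centroid(vectors: List[Dict[str, float]]) -> Dict[str, float]:
    """Component-wise mean of the vectors: the 'average' of a cluster."""
    total: Dict[str, float] = {}
    for vec in vectors:
        for term, weight in vec.items():
            total[term] = total.get(term, 0.0) + weight
    return {term: weight / len(vectors) for term, weight in total.items()}


def rank_by_centroid(abstracts: List[str], cluster: List[str]) -> List[str]:
    """Order abstracts by similarity to the centroid of the chosen cluster."""
    center = centroid([vectorize(a) for a in cluster])
    return sorted(abstracts,
                  key=lambda a: cosine(vectorize(a), center),
                  reverse=True)


# Toy abstracts, invented for illustration only.
abstracts = [
    "coronavirus vaccine trial results",
    "diabetes insulin therapy outcomes",
    "coronavirus vaccine immune response",
]
# Pretend the pointer rests on the cluster formed by the two vaccine abstracts.
ranked = rank_by_centroid(abstracts, [abstracts[0], abstracts[2]])
print(ranked)
```

Moving the pointer to another part of the word cloud would correspond to choosing a different cluster, which changes the centroid and therefore the order of the results.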


* At the moment, Chrome with its default settings has difficulties running the full functionality of the visualisation tool above. This is solved by disabling the 'out-of-blink-cors' flag in chrome://flags/#out-of-blink-cors. If you face such problems and prefer not to disable the flag, please use another browser, e.g., Firefox.

Copy and paste any text you want to annotate with MeSH classes

Description: The MeSH Classifier is a tool to automatically annotate any text snippet with the MeSH health classes. The algorithm was trained on 80+ years of published biomedical articles in MEDLINE/PubMed, hand-annotated with health classes hierarchically organized in a MeSH tree with 16 major categories and a maximum depth of 13 levels. It returns all the matching categories with their position number and (cosine) similarity weight, and offers a slider and a setting for the maximum number of visible categories. It is also available through an API (on request).
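The sketch below shows one way the ranked output described above could be handled on the client side. It assumes the classifier yields (heading, similarity) pairs and that the slider acts as a minimum similarity threshold; both are assumptions, since the API response format is not documented here, and the scores are placeholder values, not real classifier output.

```python
from typing import List, NamedTuple, Tuple


class RankedCategory(NamedTuple):
    position: int      # 1-based rank after sorting by similarity
    heading: str       # MeSH heading label
    similarity: float  # cosine similarity weight


def visible_categories(scores: List[Tuple[str, float]],
                       min_similarity: float = 0.0,
                       max_categories: int = 10) -> List[RankedCategory]:
    """Sort by similarity, drop weak matches, and cap the list length."""
    ranked = sorted(scores, key=lambda item: item[1], reverse=True)
    kept = [(h, s) for h, s in ranked if s >= min_similarity][:max_categories]
    return [RankedCategory(i, h, s) for i, (h, s) in enumerate(kept, start=1)]


# Placeholder scores for illustration; not produced by the MeSH Classifier.
scores = [("Coronavirus", 0.82), ("Diabetes Mellitus", 0.41),
          ("Vaccines", 0.57), ("Neoplasms", 0.12)]

shown = visible_categories(scores, min_similarity=0.3, max_categories=2)
for category in shown:
    print(category.position, category.heading, category.similarity)
```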

Functionality: It is designed to classify free text of any nature with the classes of MeSH Headings, using a tailor-made algorithm trained on 80+ years of published biomedical articles. It can classify any documents of interest, including medical reports, electronic health records, and health news. At the moment it is limited to text in English, because MEDLINE/PubMed is also only available in English.

Disclaimer: The core system was developed by the AI Lab at the Jožef Stefan Institute, and refocused by Quintelligence within the MIDAS project to analyze the MEDLINE dataset. It can be deployed on premises to work with proprietary data, or used as a service. It is currently available as Open Source under the BSD license. It can be applied to any document set that can be indexed, analyzed and visualized with this approach.