By Jane Z Reed, PhD, Head of Life Science Strategy, Linguamatics


If you are a clinical researcher or scientist, you are well aware of the inefficiencies of current search processes, which require hours of wading through data to find hidden nuggets of valuable information. On average, knowledge workers spend almost nine hours each week searching for information to advance the development of new drugs, identify life-preserving gene therapies or ensure compliance with the latest regulatory rules.

At GlaxoSmithKline, for example, clinical safety team members regularly review medical literature to identify relevant safety signals. The organization calculated that daily monitoring of the literature for just 20 marketed products typically reveals an average of 60 new references per day, and that each abstract takes 1.2 to 1.6 minutes to review manually, for a total of about 1.5 hours per day. A typical pharma company may have 200 marketed products in its portfolio, which is to say the review process for pharmacovigilance is hugely time-consuming for most organizations.
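
As a rough back-of-the-envelope check of those figures, the sketch below multiplies the reported 60 references per day by the 1.2 to 1.6 minutes needed per abstract, then scales the result to a 200-product portfolio. The function name and the linear scaling from 20 to 200 products are assumptions made for illustration, not GlaxoSmithKline's own calculation.

```python
def daily_review_hours(refs_per_day, minutes_per_abstract):
    """Hours per day needed to manually review the given number of abstracts."""
    return refs_per_day * minutes_per_abstract / 60


# Figures reported in the article for 20 marketed products.
REFS_PER_DAY = 60
MIN_MINUTES, MAX_MINUTES = 1.2, 1.6

low_hours = daily_review_hours(REFS_PER_DAY, MIN_MINUTES)
high_hours = daily_review_hours(REFS_PER_DAY, MAX_MINUTES)

# Assumption: the number of new references grows linearly with product count.
scale = 200 / 20
portfolio_low = low_hours * scale
portfolio_high = high_hours * scale

print(f"20 products:  {low_hours:.1f} to {high_hours:.1f} hours per day")
print(f"200 products: {portfolio_low:.1f} to {portfolio_high:.1f} hours per day")
```

Under these assumptions, the 20-product workload comes to roughly 1.2 to 1.6 hours per day, consistent with the figure of about 1.5 hours, and a 200-product portfolio would need roughly 12 to 16 reviewer hours per day.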

Adding to search challenges is the ever-growing volume of available life science and healthcare data, paralleling the vast proliferation of high-throughput biology, digital technologies and online journals. Every year, millions of new documents are published in the form of academic research, patent applications, clinical trial findings, and more. The sheer volume of available data can be overwhelming, even for scientists with a narrow field of study.

Leveraging Natural Language Processing AI technology to boost search efficiencies

While some scientists may have developed strategies for wading through mountains of documents to find relevant insights, imagine how much more productive researchers could be if their searches were less time-consuming, more efficient and cost-effective. Unfortunately, up to 80 percent of the information that researchers need is unstructured text that is difficult to search and analyze using traditional manual methods.

New artificial intelligence (AI) technologies, however, are generating considerable excitement in the biopharmaceutical community due to their potential to revolutionize pattern identification, predict successes and failures, and improve research decision-making. Natural language processing (NLP), for example, helps organizations make effective use of unstructured data by using linguistic algorithms to identify key elements in everyday language and extract meaning. NLP is a key component in many AI or machine-learning applications because of its ability to search large data sets and categorize concepts for extraction, creating quality input for downstream machine learning models.

GlaxoSmithKline’s use of NLP to improve search specificity and speed

At GlaxoSmithKline, researchers are leveraging NLP to boost search efficiencies. Instead of manually searching documents for adverse events, the company uses NLP tools to find events in minutes rather than hours. In addition, GlaxoSmithKline uses linguistic processing to improve search specificity while identifying appropriate relationships between a drug and an adverse event.

For example, in a single manual search (without the use of NLP) to find adverse events associated with the selective androgen receptor modulator Enobosarm (an investigational drug also known as MK-2866 or Ostarine), GlaxoSmithKline pulled 132 abstracts. After a three-hour manual review, researchers found that only about 30 percent of the abstracts were relevant and actually described an association with an adverse event. A similar search using NLP tools took just minutes and provided a structured results table for rapid final review.
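
To put the manual search in perspective, the short sketch below works through the reported numbers: 132 abstracts, three hours of review and about 30 percent relevance. The variable names are illustrative, and because the article gives only an approximate percentage, the abstract counts are rounded estimates rather than reported results.

```python
# Figures reported in the article for the manual Enobosarm search.
abstracts_retrieved = 132
review_minutes = 3 * 60
relevant_fraction = 0.30  # "about 30 percent", so results are approximate

# Estimated split between relevant and irrelevant abstracts.
relevant = round(abstracts_retrieved * relevant_fraction)
irrelevant = abstracts_retrieved - relevant

# Average manual review time spent on each abstract.
minutes_per_abstract = review_minutes / abstracts_retrieved

print(f"Relevant abstracts (approx.):   {relevant}")
print(f"Irrelevant abstracts (approx.): {irrelevant}")
print(f"Minutes per abstract:           {minutes_per_abstract:.2f}")
```

In other words, roughly 40 of the 132 abstracts were useful and roughly 92 were not. The review pace of about 1.4 minutes per abstract sits within the 1.2 to 1.6 minute range quoted for routine literature monitoring.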

Searching for safety-related data at Merck EMD

Merck EMD is also using NLP to assist with safety-related searches. NLP search algorithms can be focused to find and extract specific nuggets of information, which is particularly useful for rare events. One Merck safety team needed the answer to a very specific question about skin cancer adverse events of PD-1/PD-L1 inhibitors. These are new immuno-oncology drugs in development, so there is very little information published in the scientific literature, and the skin cancers of interest are also very rare adverse events.

To make the search even more complex, these inhibitors are also used as therapeutics for some of the rare skin cancers. The Merck team was able to design NLP strategies to search a bespoke index and find reliable information in the “serious event” field from 500 trials. This yielded a comprehensive set of adverse event relations that rapidly and effectively answered the team’s safety questions.

Putting the power of NLP in the hands of end-user scientists

Despite the proven benefits of NLP and AI-based search tools, some organizations have been slow to adopt NLP because building queries and extracting data insights typically require users with a higher level of technical expertise. NLP-based queries can yield excellent results, but many biopharma companies have too few technical experts on staff to quickly address all their users’ search needs. End users often resort to searching on their own using standard search engines, which generally lack the domain-specific ontologies and matching tools needed to easily identify causal relationships.

Recent innovations in search technologies, however, are making it easier to empower end users to perform searches efficiently and effectively without assistance from technical experts. For example, some organizations use web portals that include access to powerful queries that are pre-built by their expert users and designed to address specific use cases. Other technologies further simplify the search process by providing user-friendly interfaces and including context around concepts, rather than keywords alone.

For example, safety teams often want to search the literature for all mentions of side effects of a class of drugs in order to understand potential mechanism-of-action related effects. Having good dictionaries or ontologies for all the drugs in the class and all the potential adverse event terms is critical for a comprehensive search. In addition, it is important to capture the relationships between disease and drug terms (is caused by, is associated with, is due to) as well as context (“we predict that AE is caused by drug x”) in order to understand the risk liability landscape for that drug class.
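
The idea of combining dictionaries, relationship phrases and context can be illustrated with a minimal sketch. It is a toy, not how any commercial NLP platform works: the drug names (`drugx`, `drugy`), the adverse event list, the hedge cues and the `extract_relations` function are all hypothetical, and the simple pattern handles only single-word terms, whereas real ontologies contain synonyms and multi-word phrases.

```python
import re

# Hypothetical mini-dictionaries; real ones would come from curated ontologies.
DRUGS = {"drugx", "drugy"}
ADVERSE_EVENTS = {"rash", "nausea"}

# Relationship phrases taken from the examples in the text.
RELATION_PHRASES = ("is caused by", "is associated with", "is due to")

# Assumed cues that mark a statement as a prediction rather than an observation.
HEDGE_CUES = ("we predict", "may", "might", "possibly")


def extract_relations(sentence):
    """Return (adverse_event, relation, drug, hedged) tuples found in a sentence."""
    text = sentence.lower()
    # Context: flag the whole sentence if it contains any hedge cue.
    hedged = any(
        re.search(r"\b" + re.escape(cue) + r"\b", text) for cue in HEDGE_CUES
    )
    results = []
    for phrase in RELATION_PHRASES:
        # Capture the single word on each side of the relationship phrase.
        pattern = re.compile(r"\b(\w+)\s+" + re.escape(phrase) + r"\s+(\w+)\b")
        for event, drug in pattern.findall(text):
            # Keep only pairs where both terms appear in the dictionaries.
            if event in ADVERSE_EVENTS and drug in DRUGS:
                results.append((event, phrase, drug, hedged))
    return results


for example in (
    "We predict that rash is caused by drugx.",
    "Nausea is associated with drugy.",
    "Delay is due to weather.",
):
    print(example, "->", extract_relations(example))
```

The first sentence yields a relation flagged as hedged, the second an unhedged relation, and the third nothing at all because neither term is in the dictionaries. Separating the hedge flag from the relation itself lets a reviewer decide whether predicted associations belong in the same results table as observed ones.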

Making search tools more context-specific is essential for organizations that want to empower end-users to conduct their own searches. Standard search engines rarely produce results that are precise enough to meet the needs of life science users – and search engines that do include life science ontologies typically lack user-friendly interfaces and require assistance from technical experts.

To overcome the productivity-draining inefficiencies of today’s search processes, organizations need technologies that enable end users to perform their own quality searches. Such tools must be intuitive and easy to use, and give users the ability to query unstructured text from a broad set of knowledge resources. With enhanced search tools, life science organizations are better positioned to increase productivity, speed product time to market and improve drug safety.


About the author

Jane Reed is Linguamatics’ head of life science strategy and is responsible for developing the strategic vision for Linguamatics’ growing product portfolio and business development in the life science market.