3 data quality questions you should be asking, according to the FDA

Jim Robbins, Arcadia


By Jim Robbins


In December 2021, the FDA issued draft guidance for life science researchers on evaluating EHR and claims data in studies that support regulatory decisions. The draft guidance covers the use of real-world data (RWD) from these sources in decisions on both safety and effectiveness.

Overall, the FDA guidance focuses on three issues related to the use of RWD collected from EHR and claims data:

  • Selection of data sources that appropriately address the study question
  • Development and validation of definitions for study design elements
  • Data provenance and quality throughout the study lifecycle from accrual to the final study-specific dataset

According to the FDA, “This guidance also provides a broader overview of considerations relating to the use of EHR and medical claims data in clinical studies more generally, including studies intended to inform FDA’s evaluation of product effectiveness.”

Why it matters

This recent FDA guidance is a starting point and a promising step towards operationalizing RWD to help patients access innovations faster. Just as importantly, it sheds light on the responsibility of data suppliers to document their quality assessment process in detail, including how frequently data quality is assessed and which elements are assessed.

EHR data quality challenges are not new; they affect clinical research as well as applications of data in clinical practice. As a result, it is often difficult to reconcile real-world imperfections with the regulatory rigor that clinical research requires.

Impact of the new FDA guidelines for life science researchers

The FDA guidance will likely have two primary effects on the industry: standardization and interoperability.

Standardization: While it does not itself set a standard, the guidance gives regulatory bodies, data suppliers and researchers an opportunity to move towards a standardized definition of data quality. A standard makes the determination of whether data meets the “fit-for-purpose” criteria more objective. A uniform approach gives all parties a shared “true north” definition, ensures that decisions are made against the standard, and promotes fluency among data stakeholders collaborating across disparate data sets.

Interoperability: Datasets are far more valuable when combined. Data interoperability is a keystone within healthcare because it determines how readily different types of clinical data can be combined to inform decisions. Similarly, RWD gains depth and breadth when various clinical and financial data sources are linked together. This creates a more complete and holistic view for analysis, which in turn supports better decisions. With the growth in tokenization adoption for RWD, there has been great progress towards making datasets more interoperable. However, because multiple tokenization platforms exist today, we anticipate further consolidation, which will improve interoperability.

The three most important data quality questions to ask RWD vendors

The new FDA guidance highlights the importance of data quality and of adopting a standards-based vetting process that ensures data suppliers meet your data quality needs. As you collaborate with RWD partners, here are three critical questions to ask, along with why each area matters and what to look for in great data partners.

1. How does your quality assessment process handle continuity of coverage challenges?

Based on the FDA guidance, the quality assessment process related to continuity of coverage is crucial, as disconnected coverage spans can contribute to the challenge of missing data. As patients migrate across health plans, whether because of a change in employment or a relocation, capturing the essential aspects of care and outcomes is critical for longitudinality.

It is important to address the quality assessment process in a multi-dimensional way. Beyond the quality of the raw data, fully meeting the requirements laid out in the FDA guidance also requires infrastructure for de-duplicating patients and for transforming EHR and claims data into a unified view of the patient’s journey. Bottom line: Good data partners have designed a quality assessment process that creates a single identity for each patient, regardless of claim type, and that focuses on integrating multiple sources to combat missingness.
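As a rough illustration of the continuity-of-coverage check discussed in this section, the sketch below flags gaps between a single patient's coverage spans. It is a minimal example under stated assumptions, not a method taken from the FDA guidance: the function name `find_coverage_gaps`, the `(start, end)` tuple layout and the `max_gap_days` grace period are all hypothetical.

```python
from datetime import date, timedelta

def find_coverage_gaps(spans, max_gap_days=0):
    """Return (gap_start, gap_end) date pairs for one patient.

    spans: list of (start_date, end_date) tuples with inclusive dates
           (hypothetical layout, assumed for this sketch).
    max_gap_days: gaps of this many days or fewer are tolerated.
    """
    gaps = []
    ordered = sorted(spans)
    if not ordered:
        return gaps
    covered_until = ordered[0][1]
    for start, end in ordered[1:]:
        # Number of uncovered days strictly between the two spans.
        gap_days = (start - covered_until).days - 1
        if gap_days > max_gap_days:
            gaps.append((covered_until + timedelta(days=1),
                         start - timedelta(days=1)))
        # Overlapping or nested spans must not shrink the covered range.
        covered_until = max(covered_until, end)
    return gaps

# Illustrative data only: two adjacent plan spans, then a break in coverage.
example_spans = [
    (date(2020, 1, 1), date(2020, 6, 30)),
    (date(2020, 7, 1), date(2020, 12, 31)),
    (date(2021, 3, 1), date(2021, 12, 31)),
]
example_gaps = find_coverage_gaps(example_spans)
```

In this made-up example the only gap runs from January 1 to February 28, 2021. A study team would choose the tolerated gap length to suit the study question, and would typically run such a check after patient records from different sources have been de-duplicated.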

2. How does your quality assessment process handle and support missingness challenges?

EHR data is often challenging because of imperfections in how data is captured in real-world clinical settings, combined with the limitations of the current interoperability landscape across EHR systems. Different EHR systems track and record certain data elements differently or, in some cases, not at all. These differences in data collection and terminology, along with industry-wide interoperability challenges, can lead to missing data that researchers need to understand.
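One way to make these differences visible is to profile how often each field is empty, broken out by source system. The sketch below is only illustrative: the function `missingness_by_source`, the `source_system` key and the field names are assumptions, and treating `None` or an empty string as "missing" is a simplification that a real study would need to refine.

```python
from collections import defaultdict

def missingness_by_source(records, fields, source_key="source_system"):
    """Return {source: {field: fraction_missing}} for a list of dict records.

    A value counts as missing when the field is absent, None, or an
    empty string (a simplifying assumption for this sketch).
    """
    totals = defaultdict(int)
    missing = defaultdict(lambda: defaultdict(int))
    for rec in records:
        src = rec.get(source_key, "unknown")
        totals[src] += 1
        for field in fields:
            if rec.get(field) in (None, ""):
                missing[src][field] += 1
    return {
        src: {field: missing[src][field] / totals[src] for field in fields}
        for src in totals
    }

# Illustrative records only; system and field names are made up.
example_records = [
    {"source_system": "ehr_a", "bmi": 27.1, "smoking_status": "never"},
    {"source_system": "ehr_a", "bmi": None, "smoking_status": "former"},
    {"source_system": "ehr_b", "bmi": 31.0},
    {"source_system": "ehr_b", "bmi": 29.4, "smoking_status": ""},
]
example_profile = missingness_by_source(example_records, ["bmi", "smoking_status"])
```

A profile like this shows whether a field is sparsely populated everywhere or simply not recorded by one system, which are different problems calling for different remedies.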

The FDA has proposed leveraging unstructured data and natural language processing (NLP) as a mechanism to manage data missingness within EHR data. To leverage NLP, it is crucial to identify what is findable in the unstructured EHR data, and the availability of unstructured fields across the EHRs being leveraged for a given study. 

Data suppliers with sophisticated experience working with EHR data and advanced data science capabilities will understand where novel applications of NLP will help to minimize missingness, and how these insights can augment more structured data from labs, claims and other linked datasets.

3. How does your quality assessment process support data linking (or interoperability)?

Connecting and combining datasets can drive more value by providing an enhanced picture of the patient journey and health outcomes. Without addressing interoperability, a dataset will inevitably have gaps and won’t provide the holistic view crucial to the best possible outcomes.

Based on the FDA guidance, researchers need to consider how interoperability can provide solutions to various data quality challenges. Good data partners will have broadly adopted a tokenization strategy that allows easy linking across data sets, and market leaders will support multiple tokens to maximize data flexibility.
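To make the idea of token-based linking concrete, the sketch below groups EHR and claims records that share a de-identified patient token and summarizes how many patients appear in both sources. It assumes tokens have already been generated upstream by some tokenization platform; the `patient_token` key and both function names are hypothetical and do not reflect any particular vendor's API.

```python
from collections import defaultdict

def link_by_token(ehr_records, claims_records, token_key="patient_token"):
    """Group records from both sources under their shared patient token."""
    linked = defaultdict(lambda: {"ehr": [], "claims": []})
    for rec in ehr_records:
        linked[rec[token_key]]["ehr"].append(rec)
    for rec in claims_records:
        linked[rec[token_key]]["claims"].append(rec)
    return dict(linked)

def link_summary(linked):
    """Count patients found in both sources versus only one."""
    both = sum(1 for v in linked.values() if v["ehr"] and v["claims"])
    ehr_only = sum(1 for v in linked.values() if v["ehr"] and not v["claims"])
    claims_only = sum(1 for v in linked.values() if v["claims"] and not v["ehr"])
    return {"both": both, "ehr_only": ehr_only, "claims_only": claims_only}

# Illustrative data only; token values are made up.
example_ehr = [
    {"patient_token": "tok_001", "dx": "E11.9"},
    {"patient_token": "tok_002", "dx": "I10"},
]
example_claims = [
    {"patient_token": "tok_001", "paid_amount": 120.0},
    {"patient_token": "tok_003", "paid_amount": 75.5},
]
example_summary = link_summary(link_by_token(example_ehr, example_claims))
```

The share of patients that link across sources is itself a useful quality signal when vetting a data partner, since a low match rate limits how much linking can reduce missingness.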

A step in the right direction

With this new guidance, the FDA is moving in the right direction by helping ensure that the quality of healthcare data better meets the needs of researchers. It is a step forward that provides enhanced criteria for confirming that your RWD partners can address these data quality challenges.

About the author

Jim Robbins is VP of life sciences at Arcadia, the leading data analytics platform for healthcare and life sciences. Arcadia’s real-world dataset is built on actively growing EHR and claims data to support clinical research and the needs of providers, payers, nonprofits, governments and academic institutions.