What’s complicating good data practices and data integrity?

Dan Ayala, Dotmatics

Data integrity is an ongoing concern across all R&D organizations, no matter what part of the research lifecycle they’re navigating. These concerns extend beyond the potential for delayed timelines or cost overruns. They are about something bigger: establishing a culture of quality; ensuring product efficacy and patient safety; and being a trusted brand, partner, or provider.

Prioritizing data integrity in the lab

Good data practices throughout the R&D process can positively impact data integrity in the lab. Companies must be able to defend the fidelity and confidentiality of all records and data generated throughout a product’s entire lifecycle, starting with the earliest points in research, including raw data, metadata, and transformed data. To do this, companies must have the right processes and technologies in place to ensure proper:

  • Data integrity: How is the completeness, consistency, validity, and accuracy of data impacted by the way it is produced, captured, quality checked, transformed, and traced?
  • Data governance: How does the company manage and track who has access to what data, via what means, how it is used, and to what degree?
  • Data security: How is data encrypted, transferred, stored, and backed up?

These factors, each challenging in its own right, are all intertwined, adding to the complexity of upholding good data practices in the modern lab.

A shifting data management landscape

As R&D organizations digitize their data to make analytics at scale possible, best practices for data management must also evolve. Teams must have clear strategies for identifying and mitigating threats to data integrity, including technological, managerial, and external risks. This is no small task. In fact, in the realm of pharmaceuticals, the U.S. Food and Drug Administration (FDA) has reported increasing data integrity violations in recent years.

Data integrity is at risk in many cases because the complexity of R&D data, processes, and technologies presents numerous opportunities for good data practices to go awry. The most common types of warnings and violations cited by the FDA include data loss; missing metadata; non-contemporaneous collection or backdating; data deletion and copying; sample elimination or reprocessing; poorly investigated out-of-specification results; data access and security issues; and inadequate or disabled audit trails. Missteps like these at any point in the R&D process can undermine the validity of the overall research.

Data integrity and security breaches could lead to incorrect or non-reproducible research results, have implications for patient safety and product efficacy, or generate violations that might cause a drug to be rejected at submission or pulled from the market later.
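To make the audit-trail and backdating concerns above more concrete, here is a minimal sketch of one common technique: an append-only log in which each entry stores a hash of the previous one, so that editing, deleting, or backdating an earlier entry becomes detectable. The function names (`append_entry`, `verify_trail`) and the entry fields are illustrative assumptions, not a description of any particular vendor's or regulator's system.

```python
import hashlib
import json
from datetime import datetime, timezone

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(body: dict) -> str:
    """Hash an entry body deterministically (keys sorted before hashing)."""
    payload = json.dumps(body, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_entry(trail: list, user: str, action: str, record_id: str) -> dict:
    """Append a timestamped entry that is chained to the previous entry's hash."""
    body = {
        "user": user,
        "action": action,
        "record_id": record_id,  # hypothetical identifier for a sample or record
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "prev_hash": trail[-1]["hash"] if trail else GENESIS_HASH,
    }
    entry = {**body, "hash": _digest(body)}
    trail.append(entry)
    return entry


def verify_trail(trail: list) -> bool:
    """Return True only if every entry is unmodified and correctly chained."""
    prev_hash = GENESIS_HASH
    for entry in trail:
        body = {k: v for k, v in entry.items() if k != "hash"}
        if entry["prev_hash"] != prev_hash or entry["hash"] != _digest(body):
            return False
        prev_hash = entry["hash"]
    return True
```

In this sketch, changing the timestamp on an old entry or removing an entry from the middle of the list causes `verify_trail` to return `False`. A production system would also need access controls and secure storage for the log itself, which this example does not attempt to cover.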

“Multimodal R&D”

Companies hoping to drive innovation are diversifying their R&D efforts and working across different areas of science with novel modalities. As a result, data are pouring in from wide-ranging sources via different means and in different formats. An organization or institution may have several different internal research groups collecting data from thousands of pieces of specialty equipment or instruments; in parallel, it could also be undertaking complex post-acquisition or legacy-data migration activities, all while working with multiple external CROs that have their own distinct systems and processes.

All of these different data come from teams that work not only across different modalities and specialty areas of science, but also across different locations globally, each with its own compliance standards and regulations. This incredible volume and diversity of multimodal R&D data create lab integration and data management challenges that can risk compromising data integrity and security. Many companies are struggling to keep pace with a vast volume of diverse data and metadata needed to inform decision making throughout the R&D process.


Ensuring the success of R&D at scale means improving data flow between research groups so they can build on their collective knowledge. The importance of data sharing in advancing science was recently underscored by the United States National Institutes of Health (NIH), which established new 2023 data management and sharing policies to confirm findings, encourage reuse, and spur innovation.

Whether it’s chemists and biologists collaborating on chemically modified biologics, or internal and external partners working on projects across modalities and diseases, teamwork is more important than ever; unfortunately, it’s not always easy. Many R&D groups that have long worked in relative isolation are now required to collaborate and share data, which requires shifts in mindset and culture. It also requires a shift in governance and execution. Bespoke and insulated research teams often don’t have the systems and processes in place to share and hand off well-annotated data while at the same time controlling access, tracking changes, and ensuring good data practices are followed by all participants and collaborators.

For many companies, it’s hard to facilitate efficient and secure data sharing that doesn’t compromise data integrity. Even the most erudite collaborators have approaches to interaction with instruments, software, workflows, and data types that don’t align with each other. This complicates collaboration. Structured and unstructured data end up scattered in multiple repositories and across different mediums rather than within a secure, centralized, standardized data pool that appropriate collaborators can access and that leverages a well-defined data governance framework.

Data sharing challenges are growing so common that they’ve prompted calls to establish better data management standards. One well-known example is the FAIR guiding principles for scientific data management, which promote the adoption of technology and processes that make all data findable, accessible, interoperable, and reusable by both humans and machines. Becoming FAIR-compliant requires changes in the format, model, and storage of data, as well as in the ways that instruments, software, and systems are integrated. While this can seem overwhelming, the change can be made incrementally; it’s not an all-or-nothing proposition. Whether a company is building a comprehensive FAIR-compliant informatics ecosystem or adopting a data analysis and graphing solution that embraces FAIR data principles, moves toward implementing FAIR-aligned methods can pay dividends in time savings, reproducibility of research, improved knowledge sharing, and AI-readiness.
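As a small illustration of how incremental such a change can be, the sketch below checks a dataset's metadata record for gaps against each of the four FAIR principles. The field names in `REQUIRED_FIELDS` are hypothetical choices made for this example; the FAIR principles themselves do not prescribe a specific schema, and a real organization would substitute its own metadata standard.

```python
# Hypothetical mapping of FAIR principles to metadata fields.
# These field names are assumptions for illustration, not part of the FAIR principles.
REQUIRED_FIELDS = {
    "findable": ["identifier", "title"],
    "accessible": ["access_url"],
    "interoperable": ["format"],
    "reusable": ["license", "provenance"],
}


def fair_gaps(metadata: dict) -> dict:
    """Return, per principle, the fields that are missing or empty in a record."""
    gaps = {}
    for principle, fields in REQUIRED_FIELDS.items():
        missing = [field for field in fields if not metadata.get(field)]
        if missing:
            gaps[principle] = missing
    return gaps


# Example record for a made-up assay dataset; all values are placeholders.
record = {
    "identifier": "dataset-0001",
    "title": "Example binding assay results",
    "access_url": "https://example.org/datasets/0001",
    "format": "text/csv",
    "provenance": "Exported from plate reader, run 12",
}

print(fair_gaps(record))  # the license field is absent, so "reusable" is flagged
```

A check like this could run whenever data are registered or handed off, so that gaps are caught one dataset at a time rather than in a single large remediation effort.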

Artificial intelligence

As AI arrives in R&D, organizations and institutions will need data infrastructures to capture and manage the proprietary data that will differentiate their research in an AI-everywhere world. For many universities and health companies, becoming AI-ready means first adopting technology and process changes to support exponential growth in data volumes, elimination of data silos, integration of bespoke software and systems, and normalization of data.

The ultimate goal is that any data created and captured throughout the R&D process will be trustworthy, well-structured, correlated, shareable, and model-ready. While achieving these aligned data standards is uniquely challenging in scientific R&D because of the complexity of the workflows, data types, software, and systems, it is, nonetheless, essential. Global compliance regulations are currently being updated to guide the use of AI and ML in medical and general research.

In March 2024, the European Parliament approved the EU’s overarching Artificial Intelligence Act. This landmark law aims to protect human health, safety, and fundamental rights as AI is increasingly relied upon for innovation across a broad spectrum of industries, academia, government, and civil organizations. Now is the time for companies to ensure that their existing systems and processes can meet the regulatory and ethical challenges of using AI in research, including assurance of data integrity, security, traceability, and bias limitation.

Good data practices

Aligning data management and data integrity is vital to long-term research success and to preparing for the automated, connected, and collaborative future of research. Fortunately, today’s scientists have a wide range of tools to manage, search, and visualize their R&D data, with the future being led by solutions that can unite all the applications that produce and analyze data within one secure data-management platform.

Dan Ayala is chief security & trust officer at Dotmatics.