Key Questions to Ask When Evaluating RWE for New Medical Treatments

By Nancy Dreyer, chief scientific officer, IQVIA Real World Solutions

For more than 50 years, randomized controlled trials (RCTs) have been used by the biopharmaceutical industry and regulators to evaluate the efficacy and safety of new medications as well as medical products being repurposed for new uses. But regulators and biopharmaceutical companies are being pushed to modernize their approaches to expedite drug development, which has brought increased attention to the potential contributions of real world evidence (RWE). Rather than debating the hierarchy of evidence and the preferability of one study design over another, it has become apparent that randomized and non-randomized studies can provide meaningful, complementary information about the effectiveness and safety of new medical products. We see RWE from nonrandomized studies being used to complement RCTs, such as by providing context for single-arm trials through external comparator arms and less formal benchmark data.

To set the groundwork, remember that RCTs are necessarily homogeneous by design. They generally are conducted in optimal medical care settings with highly skilled clinicians, and they use extensive inclusion and exclusion criteria to assure highly comparable groups after randomization. They have strong internal validity, but their generalizability to patients who are not eligible for the trials is weak, since subgroups of interest are rarely studied in RCTs. Although this could be addressed by much larger trials powered to support subgroup analyses, such mega-trials are very expensive and take a long time to complete.

In contrast, RWE is derived from studying treatments as they are being used in real-world settings – by diverse patients and clinicians in a variety of medical care settings. Randomization and treatment blinding are rarely used.

With the recent proliferation of structured healthcare databases such as health insurance claims, electronic health records (EHRs), patient registries, and person-generated health data, there is more opportunity than ever to leverage nonrandomized RWE as part of clinical development and post-launch evaluation. However, the diversity and variability of these real world data (RWD) sources require careful, fit-for-purpose choices of design and analysis tailored to the specific question at hand.

For every non-randomized study, the design and analytic methods should consider various sources of systematic error, also referred to as “bias”. When treatments are not assigned at random, there are many reasons why a clinician might offer one patient a particular treatment and a similar patient something different. These conscious or unintentional treatment choices may be related to other factors that also affect prognosis. For example, high out-of-pocket costs may systematically lead low-income patients to choose a less costly treatment over a more expensive one, and treatments that require in-person administration, such as infusions, may be unpalatable to those who are very sick or unable to drive because of medical limitations. Failing to account for these selection factors by design and/or analysis may lead to confounding bias.
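
To make the mechanism concrete, here is a minimal, purely illustrative simulation (a sketch; the variable names and effect sizes are hypothetical, not drawn from any real study) in which income affects both treatment choice and outcome. The naive comparison exaggerates the treatment's benefit, while stratifying on the confounder recovers the true effect:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical confounder: low income makes the costly treatment less likely
# and is independently associated with worse outcomes.
low_income = rng.binomial(1, 0.4, n)
p_treated = np.where(low_income == 1, 0.2, 0.6)
treated = rng.binomial(1, p_treated)

# True treatment effect: reduces 1-year event risk by 10 percentage points.
p_event = 0.30 - 0.10 * treated + 0.15 * low_income
event = rng.binomial(1, p_event)

# Naive comparison is biased: roughly -0.16, more extreme than the true -0.10,
# because treated patients are disproportionately higher-income.
naive = event[treated == 1].mean() - event[treated == 0].mean()
print(f"Naive risk difference: {naive:.3f}")

# Stratifying on the confounder recovers ~ -0.10 in each stratum.
for s in (0, 1):
    m = low_income == s
    rd = event[m & (treated == 1)].mean() - event[m & (treated == 0)].mean()
    print(f"Risk difference (low_income={s}): {rd:.3f}")
```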

These five key considerations will help guide your thinking with respect to design, analysis, and interpretation of nonrandomized data.

1) Does the design emulate a hypothetical randomized trial design?

Nonrandomized studies should be designed using a target trial framework to mimic the design of an RCT. The target trial can be purely hypothetical, but the framework acts as a guide during the design process and can help avoid major design flaws. For example, a target trial framework requires thinking through the initiation time point for the study, which then anchors the timing of treatment decisions, comparable to the time of randomization in an RCT. Having a clear inception point for the treatment decision also helps mitigate errors in the timing of measurements for inclusion and exclusion criteria.
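
As a sketch of how this plays out in practice, the fragment below (with purely hypothetical column names standing in for variables derived from claims or EHR data) fixes time zero at treatment initiation and assesses eligibility using only information available before that index date:

```python
import pandas as pd

# Hypothetical one-row-per-patient dataset; column names are illustrative only.
pts = pd.DataFrame({
    "patient_id": [1, 2, 3],
    "first_rx_date": pd.to_datetime(["2021-03-01", "2021-06-15", "2021-01-10"]),
    "enrollment_start": pd.to_datetime(["2020-01-01", "2021-05-01", "2020-06-01"]),
    "prior_contraindication": [False, False, True],
})

# Time zero (the "inception point") = the date treatment is initiated,
# standing in for the randomization date of the hypothetical target trial.
pts["index_date"] = pts["first_rx_date"]

# Eligibility is assessed as of time zero, never using post-index information:
# here, >= 365 days of continuous enrollment before the index date and no
# contraindication recorded before the index date.
baseline_ok = (pts["index_date"] - pts["enrollment_start"]).dt.days >= 365
eligible = pts[baseline_ok & ~pts["prior_contraindication"]]
print(eligible[["patient_id", "index_date"]])  # only patient 1 qualifies
```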

2) Is the comparator or control condition appropriate?

Since non-randomized studies don’t use placebos, the selection of a comparator arm is typically driven by the research question and by consideration of the treatments currently used, if any, for the indication(s) of interest. The “new user” design is a popular approach for achieving comparability between treatment groups: follow-up begins at treatment initiation, so the experience of people who stop a new treatment quickly, e.g., due to a safety issue or lack of effectiveness, is not lost from the study. People who might otherwise qualify for treatment but are not being treated (“non-users”) are rarely used as comparators, since patients who decline or cannot afford treatments are often very different from those actively being treated, e.g., in disease severity, comorbidities, and health insurance coverage. If a non-user comparator group must be used, researchers will be expected to provide strong justification and to use extreme caution when measuring and adjusting for confounding factors.
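
Here is a minimal sketch of how a new-user cohort might be identified in dispensing data (the column names and the 365-day washout window are illustrative assumptions, not a prescribed standard):

```python
import pandas as pd

WASHOUT_DAYS = 365  # lookback with no prior use required to call someone a new user

# Hypothetical dispensing and enrollment data; names are illustrative only.
rx = pd.DataFrame({
    "patient_id": [1, 2, 3],
    "first_fill": pd.to_datetime(["2021-06-01", "2020-02-01", "2021-09-15"]),
    "enrollment_start": pd.to_datetime(["2020-01-01", "2019-12-01", "2019-07-01"]),
})

# New users: the first observed fill is preceded by at least WASHOUT_DAYS of
# observable, treatment-free enrollment, so prior use can be ruled out and
# follow-up starts at initiation -- early discontinuers are retained.
lookback = (rx["first_fill"] - rx["enrollment_start"]).dt.days
new_users = rx[lookback >= WASHOUT_DAYS]
print(new_users["patient_id"].tolist())  # patients 1 and 3
```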

3) Does the primary analysis account for measured confounders?

A variety of techniques can be used to make statistical adjustments for differences between the groups being compared, including multivariable regression and propensity score methods. For example, propensity scores model the likelihood that a patient would receive a given treatment, allowing more accurate inferences from comparative studies. No matter what statistical approach is used, it is important to plan how confounders will be handled analytically in order to obtain effect estimates that are as unbiased as possible.
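
As an illustration of the propensity score approach, the sketch below simulates hypothetical baseline covariates, models the probability of treatment with logistic regression, and forms inverse-probability-of-treatment weights. This is one common variant among several (matching and stratification on the score are others), not the only correct implementation:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 5_000

# Hypothetical baseline covariates (e.g., age, severity, comorbidity score)
# that influence who gets treated.
X = rng.normal(size=(n, 3))
logit = 0.8 * X[:, 0] - 0.5 * X[:, 1]
treated = rng.binomial(1, 1 / (1 + np.exp(-logit)))

# 1) Model the probability of treatment given baseline covariates.
ps = LogisticRegression().fit(X, treated).predict_proba(X)[:, 1]

# 2) Inverse-probability-of-treatment weights (ATE version): treated patients
#    get 1/ps, untreated get 1/(1 - ps), creating a pseudo-population in which
#    the measured covariates no longer predict treatment.
weights = np.where(treated == 1, 1 / ps, 1 / (1 - ps))

# 3) Check balance: weighted covariate means should be similar across arms.
for j in range(X.shape[1]):
    m1 = np.average(X[treated == 1, j], weights=weights[treated == 1])
    m0 = np.average(X[treated == 0, j], weights=weights[treated == 0])
    print(f"covariate {j}: treated mean {m1:+.3f} vs untreated mean {m0:+.3f}")
```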

4) Have sensitivity analyses been used to quantify the potential impact of residual confounding?

Even when all strategies are employed to account for potential bias, there may still be bias from residual confounding. Sensitivity analysis is a popular, reliable technique for quantifying how much such bias could have affected the overall effect estimates: different assumptions are examined analytically to show how much they would change the results. For example, if all the people who didn’t report whether they smoked were actually smokers, how much would that change the results, if at all? Sensitivity analyses have been found to be one of the strongest indicators of quality in real-world studies of comparative effectiveness (see the GRACE Checklist at www.graceprinciples.org). Other approaches can also increase confidence in findings, such as assessing the effects of treatment on “negative control” outcomes that share a confounding mechanism similar to that of the outcome of interest.
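
The article does not prescribe a particular technique, but one widely used sensitivity analysis is the E-value (VanderWeele and Ding, 2017), which asks how strong an unmeasured confounder would have to be to fully explain away an observed association. A minimal sketch:

```python
import math

def e_value(rr: float) -> float:
    """E-value for a point-estimate risk ratio (VanderWeele & Ding, 2017):
    the minimum strength of association, on the risk-ratio scale, that an
    unmeasured confounder would need with both treatment and outcome to
    fully explain away the observed association."""
    if rr < 1:            # work on the >1 side of the null
        rr = 1 / rr
    return rr + math.sqrt(rr * (rr - 1))

# Example: an observed risk ratio of 1.8 could be explained away only by a
# confounder associated with both treatment and outcome by a risk ratio of
# about 3.0 each, beyond the covariates already adjusted for.
print(f"{e_value(1.8):.2f}")  # ~3.01
```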

5) Are the research methods open to inspection and replication?

Philosophers of science have proposed that science advances through conjecture and refutation. As Karl Popper wrote, “a theory which is not refutable by any conceivable event is non-scientific.” The application to RWE is that transparency of methods is essential to allowing others to replicate (or refute) the findings of a particular study. While there is no unanimity on what good practice looks like with regard to inspection and replication, many suggest that registering nonrandomized study protocols before a study begins enhances transparency, mitigates concerns about data dredging, and lays the groundwork for replication in future nonrandomized research. The FDA is also working to require more systematic information about data provenance, a term that describes the data source, its characteristics, and the population it covers. There is also continued interest in making data available for re-analysis and replication through secure data portals.


Where do we stand now?

Non-randomized studies that cannot respond adequately to these questions are unlikely to be considered reliable in any forum, regulatory or otherwise. Improvements in data creation and accessibility, as well as a broader understanding of study methods for non-randomized studies, are leading to increasing reliance on RWE for assessments of medication effectiveness and safety. RWE is being used now by regulators, clinicians, payers, and patients to evaluate the real-world benefits and risks of many treatments, including outcomes in populations of special interest that were not represented in the RCTs. As the focus on diversity and equitable health outcomes grows, these studies will support evaluation of treatment outcomes in more varied populations, providing information that is critical to guide evidence-based decision making.

Understanding when and how nonrandomized studies can generate reliable inferences about the benefits and risks of medications has greatly improved over the last few decades. Distinguishing a high-quality study from a low-quality one is a complex challenge that cannot be solved with a simple checklist. However, the questions and recommendations described here will improve the quality, utility, and acceptance of nonrandomized real world research.