Researchers Not Making Raw Study Data Available

Medical journals may urge researchers to make raw study data publicly available, but many researchers are still not doing so, according to a recent study in PLoS One, which reviewed 500 papers from the 50 highest-impact medical journals. The review found wide variation in data-sharing policies and in how closely researchers follow them.

And as Nature notes, the results come amid a big push to make such study data available, not only to facilitate additional research but also to mitigate fraud and error. In response, the study authors suggest that more journals should adopt specific data-sharing policies, and that procedures are needed to ensure that researchers consistently follow existing policies and that published findings are easily reproducible.

As for the findings, the study found that 22 of the 50 journals required public sharing of specific raw data as a condition of publication, and another 22 encouraged data sharing without any binding instructions. The remaining six journals offered no instruction. The researchers then examined the first 10 papers published in each of the 50 journals in 2009.

They found that 149 were not subject to any data-sharing policy. Of the remaining 351 papers, 208, or 59 percent, did not fully adhere to data availability instructions; the most common lapse was failing to deposit microarray data publicly. The other 143 papers followed the instructions by publicly depositing only the specific data type required, making a statement of willingness to share, or actually sharing all the primary data.
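For readers who want to check the arithmetic, here is a quick sketch that recomputes the breakdown from the counts reported above. The variable names and the `pct` helper are purely illustrative and are not taken from the study.

```python
# Counts as reported in this post; names are illustrative, not from the study.
journals = 50
papers_per_journal = 10
total_papers = journals * papers_per_journal       # papers examined

no_policy = 149                                    # not subject to any data-sharing policy
subject_to_policy = total_papers - no_policy       # papers covered by some policy

not_adherent = 208                                 # did not fully adhere
adherent = subject_to_policy - not_adherent        # followed the instructions


def pct(part, whole):
    """Return part/whole as a percentage rounded to the nearest whole number."""
    return round(100 * part / whole)


print(subject_to_policy)                     # 351
print(adherent)                              # 143
print(pct(not_adherent, subject_to_policy))  # 59
```

The three printed values match the 351, 143 and 59 percent figures quoted above.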

"The current state is not optimal," John Ioannidis, the lead author and an expert in data reproducibility at Stanford University School of Medicine in California, told Nature. "Some journals have pretty good policies and some of the papers adhere to these, but there is plenty of room for improvement."

The study also found that researchers rarely volunteer data. Of the 500 papers examined, only 47 had their full primary data sets - as opposed to just the raw data requested by the journals - publicly available. None of the papers published in journals without data-sharing policies deposited their full set of raw data online, Nature adds.

Why is there such reluctance? Journal editors may not want to insist on introducing or enforcing data-sharing policies if this would discourage submissions (here is the PLoS study).

pic thx to rdecom on flckr

9 Comments

Sep 28, 2011 - 1:23pm

Even if a researcher provided a journal with the hundreds of pages of data listings of raw data, is that journal going to now hire 1) a programmer at $50/hour to program the data so that 2) the $125,000/year biostatistician can then be brought in to write a statistical plan, then turn the listings into SAS data sets? And then will that same statistician do the inferential analyses on the journal's dime? I highly doubt it. Instead, the journal will do its own post hoc analyses on a handful of selected datasets without regard for the dataset as a whole. Such post hoc analyses are no more valid than if the investigator did them.

Analysis of datasets is a comprehensive process; you don't do it piecemeal. I also don't know of any journal willing to spend the $20,000-$50,000 on programmers and statisticians to do the job properly.

Sep 28, 2011 - 1:43pm

oii:

It would seem that the expectation was a 'specific' subset which might modify the task - but as we used to observe 'there are subsets and there are SUBSETS.'

"the study found that 22 of the 50 journals required public sharing of specific raw data as a condition of publication"

Sep 28, 2011 - 2:28pm

Searching, I still think this is a sticky wicket. As you know, there are several accepted ways to classify subjects as outliers for the purpose of efficacy analysis. What if the PI chooses one method and the journal chooses another method, and the methodological differences influence whether the data are statistically significant?

Do we flip a coin? This is just a tiny example of how valid methodological differences can influence results. For example, what if I choose to do an ANOVA and the journal does an ANCOVA with different results? Who is right?

I would prefer to just give them the entire data set, and then challenge the journal to do their own analyses, and then we can discuss the results.

Sep 28, 2011 - 2:59pm

Isn't one of the critical issues here the way the companies cherry pick data showing lethal or potentially lethal side effects from the drugs that turn up in the trials? A good example of this is how Lilly and Astra Zeneca hid the fact that trial subjects committed suicide on Zyprexa and Seroquel, respectively. Later, once the drugs came to market, Dr. David Healy asked each company for this information. Astra Zeneca came forward, but Lilly did not. Even today, there is no warning on the label of either drug for the risk of the drug causing akathisia or suicide. Yet it is a known side effect of the drugs.

Sep 28, 2011 - 6:36pm

Rhea, in part those differences with respect to side effects have to do with the way the companies chose to code their adverse events using standard coding dictionaries. You say that companies "hid" data. These companies would argue "classification bias" rather than "hid," which is a lawyer's way of saying the same thing, but it did not keep them out of trouble anyway. Remember that when these studies were in progress there were several coding systems in use, some with conflicting terms. This has now been replaced by the standard MedDRA dictionary, which should reduce these errors in the future.

Sep 28, 2011 - 7:32pm

Rhea, just to correct the record, the Zyprexa label does state that a patient attempted suicide during the clinical trials, and that the incidence of akathisia was about 5%. It has contained this information at least since 2003, which is the earliest version of the label available on the FDA website.

Suicide is very common among schizophrenics, commonly estimated at 5 - 10% lifetime risk. In a search of pubmed, I did not find any studies suggesting that Zyprexa increases this risk. I did however find two papers suggesting that it reduces the risk, one of which was performed by the national health ministry in Finland, and can presumably be assumed not to be Lilly-sponsored.

http://www.ncbi.nlm.nih.gov/pubmed/14760515 http://www.ncbi.nlm.nih.gov/pubmed/18327869

Sep 28, 2011 - 7:43pm

Also, the occurrence of suicide attempts in the Seroquel clinical trial program was reported to the FDA PRE-marketing, as indicated in the review document here.

http://www.accessdata.fda.gov/drugsatfda_docs/nda/97/020639ap_Seroquel_medrP1.pdf

Sep 29, 2011 - 5:59am

Ideally public sharing of data would be a pure positive, allowing independent confirmation of key conclusions and the exploration of key hypotheses. But I can certainly understand why companies would be reluctant to have third parties performing their own alternative analyses of data from studies on commercial products.

The recent Singh meta-analysis on Chantix would be a good example of how this can become a PR nightmare. Even though the statistical analysis presented by Singh was roundly criticized in the comments page of the CMAJ, science by press release led to headlines in the popular press stating that Chantix would cause one in 28 patients to have a heart attack. Once a headline like that gets out there, it's pretty tough to put the genie back in the bottle with a lot of technical arguments about flawed statistical analysis and inappropriate comparator groups.