Question: What share of papers are demonstrably backed by raw data?
Takeaway: Of 41 manuscripts flagged for questionable results at the journal Molecular Brain, the authors of 40 (97%) were unable or unwilling to produce raw data to verify their findings. This suggests that a lack of raw data, and with it potential incompetence, fabrication, or fraud, could contribute to the low quality and poor replicability of life sciences research.
In this 2020 editorial, Tsuyoshi Miyakawa, editor-in-chief of the journal Molecular Brain, presents evidence to substantiate the concern that a significant share of published manuscripts is not supported by raw data and so may be fraudulent or in error.
The reproducibility crisis has been documented across multiple fields, including cancer research, where fewer than one-fourth of published papers successfully replicate when tested (1), and psychology, where slightly more than one-third of published papers replicate (2). These failures have been attributed to HARKing, p-hacking, selective reporting, publication bias, and other weaknesses in study design and reporting (3). Miyakawa argues that a lack of raw data is a major contributor to the crisis: the reported results of a study cannot be validated, and so may be fraudulent or in error, unless they are backed by raw data (4).
Miyakawa handled 180 manuscripts as editor-in-chief of Molecular Brain between early 2017 and September 2019. Forty-one of these were flagged for revision, generally because Miyakawa judged the data “too good to be true,” and their authors were asked to supply the raw data behind their figures. The authors of 21 of the 41 papers simply withdrew their submissions without providing the data. Of the 20 who did respond, 19 supplied data that either were insufficient to support the reported results or did not reconcile with them. Two cases showed evidence of image duplication or modification. In sum, only 1 of the 41 papers for which raw data were requested (about 3%) was backed by satisfactory raw data. Fourteen of the 40 rejected or withdrawn papers were later published in other journals, and even then none of their authors provided sufficient raw data when asked.
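As a quick arithmetic check (a sketch using only the counts reported above), the 21 withdrawn submissions and the 19 with insufficient or irreconcilable data together account for the 97% figure in the takeaway:

\[
  \frac{21 + 19}{41} \;=\; \frac{40}{41} \;\approx\; 97.6\% \;\approx\; 97\%
\]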
Miyakawa acknowledges there are rare cases in which authors may legitimately be unwilling or unable to produce raw data to support their results: the raw data may reveal proprietary findings, the volume of data may be large enough that gathering and distributing it is impractical, or the authors may intend to mine the same data for later publications. None of these exceptions could reasonably be applied to the 41 papers described here.
Surveys suggest data fabrication may be a substantial issue in the life sciences. As a follow-up, Miyakawa surveyed 227 colleagues in the life sciences; 53% of them believed that at least some data had likely been fabricated in more than two-thirds of the withdrawn manuscripts. A previous survey found that 14% of life sciences researchers reported knowing colleagues who had falsified or fabricated data (5). Evidence of deliberate image manipulation is less common, affecting roughly 2 to 4 percent of papers, but can be obscured (6). Among researchers surveyed about the causes of the reproducibility crisis, 40% believed a lack of raw data and/or fraud were significant contributors (7).
Miyakawa notes that academic culture is one of trust, and accusations of fraud are levied only after all alternative explanations have been found insufficient. These results, however, indicate that this trust-based system may be outdated and may have allowed widespread fraud and deception. In response, Miyakawa changed the editorial policy of Molecular Brain to require that raw data be submitted for all accepted manuscripts prior to publication, barring specific ethical or legal exceptions. This is consistent with the recommendations of authors such as Glenn Begley and John Ioannidis, who suggest that an explicit requirement for raw data would improve the validity and reproducibility of scientific research (8).
Notes
- Believe it or not: How much can we rely on published data on potential drug targets?; Drug development: Raise standards for preclinical cancer research
- Open Science Collaboration: Estimating the reproducibility of psychological science
- HARKing: Hypothesizing after the results are known; The extent and consequences of p-hacking in science; Systematic review of the empirical evidence of study publication bias and outcome reporting bias; Evaluation of excess significance bias in animal studies of neurological diseases; Why most clinical research is not useful
- 1,500 scientists lift the lid on reproducibility
- How many scientists fabricate and falsify research? A systematic review and meta-analysis of survey data
- The prevalence of inappropriate image duplication in biomedical research publications
- Ibid.
- Recommendations for increasing replicability in psychology; Reproducibility in science. Improving the standard for basic and preclinical research
No Raw Data, No Science: Another Possible Source of the Reproducibility Crisis