Yet with big data, one may not know how the data have been gathered, particularly with social media. Kevin Driscoll and Shawn Walker, in "Big Data, Big Questions | Working Within a Black Box: Transparency in the Collection and Production of Big Twitter Data", say: "[T]he transformation of mass-scale Twitter data into statistics, tables, charts and graphs is presented with scant explanation of how the data are collected, stored, cleaned, and analyzed, leaving readers unable to assess the appropriateness of the given methodology to the social phenomena they purport to represent."
Selling the benefit of sharing data to researchers is important. AllTrials, an initiative of Bad Science, the BMJ, the Centre for Evidence-based Medicine, the Cochrane Collaboration, the James Lind Initiative, PLOS and Sense About Science, calls for all past and present clinical trials to publish their full methods and summary results. It highlights that a 2012 audit found only one-fifth of trials registered on clinicaltrials.gov had reported results within one year of completion, and that other research has shown trials with negative results are twice as likely to be left unpublished as those with positive results. Some science journals have policies that specifically address this issue. For example, PLOS ONE states that "the paper must include an analysis of public data that validates the conclusions so others can reproduce the analysis."
For example, economics graduate student Thomas Herndon found errors in the research paper "Growth in a Time of Debt". Had the data and methodology not been available for this student to replicate the research, the errors would not have been spotted. Similarly, in 2011, publications in the British Journal of Social Psychology and Basic and Applied Social Psychology had to be withdrawn when a Dutch researcher was found to have fabricated data. Indeed, if the methodology is not available to allow people to replicate research, how can its validity be challenged (or reinforced)?