As mentioned in some of my previous blog posts, we analyzed more than 140 scholarly economics journals with regard to their data availability policies. Checking the quality and the extent of these data policies has been interesting work.
In our sample (which the German economists Bräuninger, Haucap and Muck evaluated in 2011 with regard to the reputation and relevance of these journals) we found 29 journals with a data availability policy and 11 journals with a so-called “replication policy”.
The quality and extent of the data availability policies in our sample differed massively: some were just a few sentences long, others ran to several printed pages. But the length of a policy is not necessarily proof of good quality. We discovered good examples that are no longer than one-third of a page.
To find out what kinds of data and materials authors have to provide alongside their manuscripts, we analyzed the data availability policies only. The reason is simple: replication policies oblige authors to provide “sufficient data and other materials” on request only, so there are no files that authors have to submit to the journal. Sounds good in theory. Doesn’t work in practice ;-). If anyone wants to know more about why these replication policies do not work, I suggest reading the paper “Replication in Empirical Economics: The Journal of Money, Credit and Banking Project”, written by Dewald, Thursby and Anderson in 1986.
To get an impression of what data and files have to be submitted to the editors of scholarly journals with a data availability policy, we checked the specifications of each data availability policy in our sample against a set of requirements for data policies.
First of all, we have to state that more than 83% of the data policies are mandatory. This is a good percentage, because it is important that these policies are mandatory. If they are not, there is little hope that authors will provide a reasonable amount of datasets and other materials, simply because it is time-consuming to prepare datasets for others. Besides, authors often do not want to publish a dataset that they have not yet fully exploited, and thereby hand it to potential competitors in their research discipline.
We found that 26 of the 29 policies (89.7%) obliged authors to submit the datasets used for the computation of their results. The remaining journals do not oblige their authors to do so because their focus is more oriented towards experimental economic research. In my opinion, almost 90% is therefore a good result.
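For readers who want to check such shares themselves, here is a minimal sketch of the arithmetic behind the 89.7% figure. The function name `share` and the variable names are my own illustrative choices; only the two counts (26 and 29) come from our analysis.

```python
def share(count: int, total: int) -> float:
    """Return count as a percentage of total, rounded to one decimal place."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * count / total, 1)


# Counts taken from the text; the names are hypothetical.
TOTAL_POLICIES = 29            # journals with a data availability policy
POLICIES_REQUIRING_DATA = 26   # policies that require the datasets

print(share(POLICIES_REQUIRING_DATA, TOTAL_POLICIES))  # prints 89.7
```

The same helper can be applied to any of the other counts in this post, as long as the base stays the 29 journals with a data availability policy.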
As for the kinds of data authors have to submit, we found that 65.5% of the journals’ data policies oblige authors to provide descriptions of the submitted data and instructions on how to use the individual files within a zip container. The quality of these descriptions ranges from very detailed instructions to a few sentences that might not really help would-be replicators. For the purpose of replication, these descriptions are very important because of the way authors provide their data: in most cases you can download the data as a zip file only. In this zip container you’ll find a wide variety of formats and files. If you did not generate the data yourself, it is extremely time-consuming to find out which part of the data corresponds to which result in the paper, if this works at all. It is therefore not sufficient that only 65.5% of the data policies in our sample require authors to provide these descriptions.
The submission of (self-written) programs used, for example, for simulation purposes is mandatory in 62% of the policies. This, of course, is also a problematic aspect: if another researcher wants to replicate the results of a simulation, he or she has no chance of doing so if the programs used for the simulation are not available. Without these programs you can either trust the results claimed in a paper or not, but there is no way to verify them. In our opinion it is therefore important that journals and their data policies also oblige authors to provide the programs used for simulation purposes. Of course, whether a journal publishes these kinds of papers depends on its focus, but if it does, it should make sure that researchers upload the programs and source code.
Only half of the policies required authors to provide the code of their calculations. Given the importance of code for replication purposes, this percentage has to be considered low. The code of computation is crucial for replicating the findings of an empirical article. If authors do not provide the code, it is often not possible to replicate the results: would-be replicators have to code everything from scratch, and it is very uncertain whether they will arrive at the same code of computation. If the code of computation differs, you won’t be able to reproduce the results claimed in the article. It is therefore crucial that data availability policies strictly enforce the availability of the code.
In summary, the management of publication-related research data in economics is still in its early stages. We found 29 journals with data availability policies. That is many more than McCullough (2009) found three years ago. In the field of economics, editors and journals seem to be on the move. This is a positive signal, and it will be interesting to see whether and how this growth continues.
Among the 29 journals with data availability policies, we noticed that 10 used the data availability policy first implemented by the American Economic Review (AER). These journals either used exactly the same policy or a slightly modified version of it.
The fact that a large portion of the analyzed data availability policies are mandatory is also good practice and may be seen as a sign that editors and journals consider the availability of research data to be important. Moreover, the finding that 90% of the journals oblige authors to submit the data prior to the publication of an article shows that many of them have understood the importance of providing data at an early stage of the publication process.