On Thursday, 24 May, I presented the first findings of our research project in a session on research data at the 101st German Bibliothekartag, managed by Goportis, the Leibniz Library Network for Research Information. Up to 100 colleagues attended the presentation, and some interesting questions were raised.
To introduce my presentation, I gave a short overview of the project’s aims and its three phases. I also introduced the expected outcome and our project partners.
After the introduction, the talk outlined our approach to the work package dealing with incentives for researchers to publish “their” data (#wp1). This work package is conducted by our partner, the International Max Planck Research School for Competition and Innovation (IMPRS-CI).
The second approach the talk described was how we identify hosting options for a publication-related data archive (#wp3). In this work package we are currently analyzing the services of 45 data centres. We want to know whether some of them offer the opportunity to store, host and maintain external datasets, in our case datasets provided in the context of a research article. We are also trying to evaluate the technical infrastructure and the metadata schemata in use.
The main topic of the talk was the presentation of some results of our work package 2, data availability policies in scholarly journals (#wp2). On this topic we have gathered a lot of information and results. I started with an outline of the kinds of data and descriptions that might be crucial for enabling researchers to replicate the work of others. In our opinion, these requirements should be demanded by the data availability policies of economics journals. Comments on these requirements are welcome!
The talk then presented some results of our analysis of the data availability policies of 141 economics journals. Some information on how the journals were selected can be found in a previous blog post of mine.
In our sample of 141 journals, we were able to find 29 journals with data availability policies, about 20%. Most of these policies (more than 80%) were mandatory. 90% obliged authors to provide all data prior to the publication of an article. This is good news, because in general there is little hope of receiving data from authors after an article has been published.
Nine out of ten data availability policies required authors to provide the datasets used for the computation of results. Almost two thirds also obliged authors to provide the programs used for simulation purposes (normally programs written by the researchers themselves, e.g. in GAUSS, Fortran or C++). The same share required authors to provide descriptions of the data they submitted. Less attention was paid to the submission of the code used for computations. Only 50% of the analyzed journals required authors to provide this code. Without it, replication attempts are very difficult, and it is often impossible to replicate the results at all.
In economics journals it is very common to provide research data by attaching it to the article. The data is then available as a zip file containing a variety of files and formats. This is one reason why descriptions of the provided data are crucial.
Simply attaching data is not a sustainable way of making it available to others, or of enabling others to reuse or cite it. As far as the technical infrastructure in use is concerned, there is a lot of room for improvement.
Over the next few weeks I’ll present some of these results in more detail.
The slides of the presentation are available in the download section and are attached below.