Data Publishing: arguably a good thing, but there isn’t that much of it. Why?

“If the data and related metadata collected for impact evaluations was more readily discoverable, searchable, and made available, the world would be a better place.”

With this sweeping statement, Markus Goldstein, a researcher at the World Bank, opened his blog post on better access to impact evaluation data. For Goldstein, the advantages of access to this kind of data are overwhelming:

“It would be easier to replicate studies and, in the process, to expand them by for example: trying other outcome indicators; checking robustness; and looking for heterogeneity effects (e.g. gender). There is also a wealth of other things one could do with the related metadata, including: looking at how different wording of survey questions generates different answers and getting parameters for power calculations. Last but not least, making these data available would allow for a wide range of non-impact evaluation research.”
Why, despite all these advantages, do researchers not share their data? Goldstein names three main concerns:
  1. First, researchers need some return on their investment: they spend a lot of time developing the instruments, negotiating the entire setup of both the survey and the evaluation, raising money, and so on.
  2. Second, making data available is a painful job, he claims, because a lot of variables have to be documented, and all of this in a reasonably user-friendly format. The whole exercise gets even more painful if part of the data contains confidential information, because that requires even more careful attention.
  3. Third, there are no rewards or incentives in the economics profession as a whole for bearing this cost or pain.
But Goldstein is an optimist. He points to some good examples of functioning data archives: the JPAL website and the World Bank’s Impact Evaluation Microdata Catalog. He also highlights the American Economic Review, which has a data policy. Such a journal policy is key, he argues, because it aligns data availability with incentives.
What Goldstein would like to do is start a discussion on how we might grow these things…
