Centre for Data Archiving and Dissemination (CAD)

The CAD will be hosted by one or more established national social science data archives. The proposed host for the CAD is GESIS – Leibniz Institute for the Social Sciences.

Although not all MEDem data will necessarily be stored physically in a single location, the MEDem CAD will provide a single access point for MEDem-related data. The main tasks of the CAD are to:

  • Make data, including the datasets collected in the different projects, usable across projects, in collaboration with the other competence centres;

  • Provide tools and instruments to link the data sources from different projects;

  • Map the existing projects and data sources, and make data findable;

  • Ensure that all data are well documented, kept up to date following a joint metadata and documentation standard, and archived in established data archives if not archived at the competence centre itself;

  • Archive a substantial part of the data collection and provide links to data archived elsewhere;

  • Contribute to developing post- and pre-harmonization procedures and standards for previous and future data collections jointly with the Centre for Methods and Standards;

  • Advise the other centres, in close collaboration with the Centre for Methods and Standards, in their data harmonization and in the development of standards for data collection, data integration and data documentation;

  • Support and train researchers in the use of MEDem data.


The proposed specific services of the CAD, based on the FAIR data principles and the concept of a distributed infrastructure, are as follows:


  • Data Access and Dissemination

The CAD will develop an integrated database that offers data users a central point of access to MEDem-related data. This involves developing and maintaining a dynamic online interface that allows users to search the data interactively and to select subsets by topic, time point, and geographical coverage. Users will also be able to search for comparable variables across different studies. Entire studies, documentation, or individual variables can be downloaded either directly or via redirection to external repositories. The interface will include a service for making dynamically generated data subsets reproducible and citable. Long-term archiving of the data will be provided through the established services of GESIS DAS.
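One way to make a dynamically generated subset reproducible and citable is to derive a stable identifier from the selection criteria together with the version of the underlying data, so that the same query always resolves to the same subset. The following Python sketch is a hypothetical illustration, not the planned CAD service: the function `subset_id`, its parameters and the identifier format are assumptions. In practice such an identifier would more likely be registered with a persistent identifier service than used on its own.

```python
import hashlib
import json


def subset_id(study: str, version: str, variables: list[str],
              years: list[int], countries: list[str]) -> str:
    """Return a deterministic identifier for a data subset (hypothetical scheme)."""
    # Normalize the selection so that equivalent queries (same items in a
    # different order, or with duplicates) produce the same identifier.
    selection = {
        "study": study,
        "version": version,
        "variables": sorted(set(variables)),
        "years": sorted(set(years)),
        "countries": sorted(set(c.upper() for c in countries)),
    }
    # Canonical JSON: sorted keys and no insignificant whitespace.
    canonical = json.dumps(selection, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    # Keep the study and data version readable; append a short hash.
    return f"{study}-{version}-{digest[:12]}"
```

Because the selection is normalized before hashing, reordering variables or years does not change the identifier, whereas a new version of the underlying data does.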

The data need to be archived in repositories that assign persistent identifiers, but they do not all need to be held in one specific repository. The challenge is to identify all the data that should be included and to coordinate with the repositories where they are held, so that the correct metadata, meaningful to the community, can be added. Once this is done, the metadata will be harvested automatically. Electoral data often contain sensitive information (e.g. regional data) that has to be removed from the use files or anonymized for data protection reasons. Where this is not possible, or where there is a legitimate interest in including these variables in the analysis, the corresponding datasets can be distributed via the GESIS Secure Data Center (SDC).


  • Data Harmonization and Visualization

An important task of the CAD will be the post-harmonization of MEDem-related data across sources and measures, space, and time. GESIS will develop and maintain a harmonization interface that allows users to harmonize variables from various studies easily and to generate the associated code. Many studies have already been harmonized and linked to produce workable datasets for publications, and this harmonization code could be reused in new research projects. The harmonization interface will therefore be accompanied by an online library documenting existing harmonization efforts, where researchers can follow the logic of previous harmonization processes and download the code.
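Storing harmonization rules as data, rather than burying them in one-off scripts, is one way to make them both executable and documentable, which is what a reusable library of harmonization efforts needs. The Python sketch below assumes two hypothetical studies whose turnout questions use different codes; the study names, variable names, codes and functions are illustrative only and do not describe existing MEDem data or the planned GESIS interface.

```python
from typing import Optional

# Hypothetical rules for a harmonized target variable "voted"
# (1 = voted, 0 = did not vote). Each rule names the source variable in
# one study and maps its codes to target codes. Codes that are not listed
# (e.g. "don't know" or "refused") are treated as missing (None).
RULES = {
    "study_a": {"source": "q12", "map": {1: 1, 2: 0}},
    "study_b": {"source": "turnout", "map": {1: 1, 0: 0}},
}


def harmonize(records: list[dict], study: str, rules: dict = RULES) -> list[Optional[int]]:
    """Recode one study's source variable into the common target variable."""
    rule = rules[study]
    # A record without the source variable also maps to None.
    return [rule["map"].get(record.get(rule["source"])) for record in records]


def describe(rules: dict = RULES) -> str:
    """Render the rules as readable documentation for a harmonization library."""
    lines = []
    for study, rule in sorted(rules.items()):
        for source_code, target_code in sorted(rule["map"].items()):
            lines.append(f"{study}.{rule['source']} == {source_code} -> voted = {target_code}")
    return "\n".join(lines)


print(harmonize([{"q12": 1}, {"q12": 2}, {"q12": 8}], "study_a"))  # [1, 0, None]
```

The same rule table drives both the recoding and the human-readable description, so the documented logic cannot drift from the code that is actually applied.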

The cross-source harmonization approach described above makes it important to access the data easily, regardless of where they are physically stored. Ideally, data can be linked without first being downloaded manually from their original repositories. This requires open APIs that offer standardized machine access to the participating repositories. With such access options and online harmonization routines in place, it will, for example, be easy to produce high-level visualizations from the harmonized data.


  • Data Linkage

In addition to data harmonization and integration, a major task will be linking elite and mass surveys to contextual characteristics. The CAD intends to include the corresponding contextual data in the integrated database, where users can either download ready-linked datasets via the online interface or generate customized datasets with specific contextual variables. Linked datasets may also be sensitive if they make it possible to identify survey respondents; such data can likewise be offered via the SDC.
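At its core, linking surveys to contextual characteristics is a join on shared keys such as country and election year. The following plain-Python sketch performs a left join, so every respondent is kept and unmatched respondents receive missing values; the `variables` argument corresponds to generating a customized dataset with specific contextual variables. The function name, the key columns and the toy values used with it are hypothetical; a production service would more likely rely on a database or a data-frame library.

```python
from typing import Iterable, Optional


def link_context(
    respondents: list[dict],
    context: list[dict],
    keys: tuple = ("country", "year"),
    variables: Optional[Iterable[str]] = None,
) -> list[dict]:
    """Left-join contextual variables onto survey respondents by the given keys."""
    # Index the contextual data; each key combination must be unique,
    # otherwise a respondent could be matched to several contexts.
    index = {}
    for row in context:
        key = tuple(row[k] for k in keys)
        if key in index:
            raise ValueError(f"duplicate context row for {key}")
        index[key] = row
    # Use the requested variables, or else all non-key contextual columns.
    all_vars = sorted({col for row in context for col in row} - set(keys))
    chosen = list(variables) if variables is not None else all_vars
    linked = []
    for resp in respondents:
        ctx = index.get(tuple(resp[k] for k in keys), {})
        out = dict(resp)  # copy, so the input records are not modified
        for var in chosen:
            # Unmatched respondents get None rather than being dropped.
            out[var] = ctx.get(var)
        linked.append(out)
    return linked
```

Rejecting duplicate context rows up front is a deliberate choice: a silent many-to-many match would inflate the number of respondents in the linked dataset.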


  • Metadata and Documentation Standards

All of the above technical developments are only effective if the data are well described with machine-readable, structured metadata that are meaningful to the community. Significant effort will therefore be dedicated to developing a joint metadata standard and controlled vocabularies that enable the interoperability of all electoral democracy data and the correct functioning of the above tools. This also involves standardizing the content and layout of the data documentation.
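A controlled vocabulary only supports interoperability if metadata records are actually checked against it. The Python sketch below shows one simple form such a check could take; the required fields, vocabulary terms and function name are hypothetical and are not part of any existing MEDem or GESIS standard.

```python
# Hypothetical required fields and controlled vocabularies for
# variable-level metadata; a real joint standard would define these.
REQUIRED = ("variable", "label", "unit_of_analysis")
VOCABULARY = {
    "unit_of_analysis": {"individual", "candidate", "party", "constituency", "country"},
    "collection_mode": {"CAPI", "CATI", "CAWI", "PAPI"},
}


def validate(record: dict) -> list[str]:
    """Return a list of problems; an empty list means the record conforms."""
    problems = [f"missing field: {field}" for field in REQUIRED if not record.get(field)]
    for field, allowed in VOCABULARY.items():
        value = record.get(field)
        # Optional fields may be absent, but if present must use an allowed term.
        if value is not None and value not in allowed:
            problems.append(f"{field}: {value!r} not in controlled vocabulary")
    return problems
```

Checks like this could run whenever metadata are harvested, so that nonconforming records are flagged before they reach the search and harmonization tools.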