Research data: what are the key issues to consider when publishing this kind of material?
What is research data?
There is no hard and fast definition of "research data". That's because research data nowadays often looks very different from one subject area to another.
There is also no concrete definition of specific formats. In the field of life sciences, research data may include measurement data, survey data and observational data as well as audiovisual materials such as images and videos and even the development of software products. Research data arises during the research process and forms the basis for research results.
Publishing research data in parallel to a scholarly article enables other researchers to reproduce and verify the results and conclusions. The same applies to the publication of stand-alone research data.
FAIR Data Principles: what are the key issues to consider?
The FAIR Data Principles are a set of guidelines to make data findable, accessible, interoperable and reusable. These principles provide guidance for research data management and stewardship and are relevant to all stakeholders in the research process. They address data producers and authors alike. The aim is to promote the maximum use of research data.
What do we mean by research data management?
The value of publishing research data relies on the researchers having devoted great care to their work throughout the entire process, from the submission of their proposal to the completion of the project. This creates the basic conditions required for any subsequent use of the data.
The first step is to create a data management plan (DMP).
Data Management Plan (DMP)
This plan defines how the data gathered during a project should be used. It should address the following issues:
- General information on the project and project objectives
- Description of any existing data which it may be possible to re-use
- Description of the data the researchers are hoping to gather, including an estimate of the quantity and format of this data
- Information on how the project participants intend to manage, save and archive the data and how they intend to create metadata
- Information on dealing with administrative and legal aspects such as meeting the requirements specified by funders and upholding data protection guidelines
- Responsibilities and access rights
- Details of the resources which will be required to implement the data management plan
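The topics above can be sketched as a simple checklist structure. This is only an illustration: the class name, the field names and the sample values are hypothetical and are not taken from any DMP tool or funder template.

```python
from __future__ import annotations

from dataclasses import dataclass, fields


@dataclass
class DataManagementPlan:
    """Hypothetical container with one text field per DMP topic listed above."""

    project_overview: str = ""      # general information and objectives
    existing_data: str = ""         # existing data that could be re-used
    expected_data: str = ""         # data to be gathered, quantity and format
    storage_and_metadata: str = ""  # managing, saving, archiving, metadata
    legal_and_admin: str = ""       # funder requirements, data protection
    responsibilities: str = ""      # responsibilities and access rights
    resources: str = ""             # resources needed to implement the plan

    def missing_sections(self) -> list[str]:
        """Return the names of the sections that are still empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]


# Made-up sample values, for illustration only.
plan = DataManagementPlan(
    project_overview="Pilot study on sample storage conditions",
    expected_data="About 200 CSV files with measurement data",
)
print(plan.missing_sections())
```

Printing the empty sections gives a quick view of which topics a draft plan has not yet addressed.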
In some sponsorship and funding programmes, data management plans are mandatory. The aim of these DMPs is to ensure the project participants apply good scientific practice by taking the quality assurance steps required to make the data re-usable and fit for publication and that they consider all the legal aspects involved (for example in handling personal data).
The DMP should always take into account the specific circumstances of the relevant discipline and subject area. The research data management checklist created as part of the WissGrid project (in German) offers a good guide on which aspects should be taken into account in a data management plan.
Various software tools have been developed to support the writing of a DMP. In Germany, the Research Data Management Organiser (RDMO), which was developed in a DFG project, has become widely accepted. “The Research Data Management Organiser (RDMO) enables institutions as well as researchers to plan and carry out their management of research data. RDMO can assemble all relevant planning information and data management tasks across the whole life cycle of the research data.” (see http://rdmorganiser.github.io/en/).
What are the key issues to consider when publishing research data?
Metadata is a key consideration when it comes to making research data easily traceable and accessible (or “findable”). Metadata describes the content and context of the data based on terms and standards that are specific to the particular discipline. The aim is to make the data “interoperable”. Applying standards in a machine-readable form, alongside the human-readable one, is one of the main contributions to the interoperability of research data. Another is the use of ontologies such as the Medical Subject Headings (MeSH) or the multilingual agricultural thesaurus AGROVOC in indexing.
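As a rough illustration of offering the same metadata in a machine-readable and a human-readable form, consider the following sketch. The field names, the values and the two helper functions are hypothetical and do not follow any particular metadata standard; a real record would use the schema and vocabulary terms of the relevant discipline or repository.

```python
import json

# Hypothetical record: field names and values are placeholders only.
record = {
    "title": "Example measurement series",
    "creators": ["Doe, Jane"],
    "publication_year": 2017,
    "subjects": [
        # Placeholder term; a real record would carry an actual vocabulary entry.
        {"term": "Example subject term", "scheme": "MeSH"},
    ],
    "license": "CC BY 4.0",
}


def to_machine_readable(rec: dict) -> str:
    """Serialise the record as JSON so that software can parse it."""
    return json.dumps(rec, indent=2, sort_keys=True, ensure_ascii=False)


def to_human_readable(rec: dict) -> str:
    """Render the same record as a single descriptive line for readers."""
    creators = "; ".join(rec["creators"])
    return (
        f'{creators} ({rec["publication_year"]}): {rec["title"]}. '
        f'Licence: {rec["license"]}'
    )


print(to_machine_readable(record))
print(to_human_readable(record))
```

Both outputs are derived from one record, so the two representations cannot drift apart.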
One option is to assign persistent identifiers such as DOIs to research data. A DOI makes the data citable and may offer a citation impact advantage, which can boost an author's reputation. Further information is available from the ZB MED DOI service.
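A DOI can be turned into a stable link by prefixing it with the address of the public DOI resolver, https://doi.org/. The sketch below shows this with a deliberately rough plausibility check. The function name is hypothetical and the DOI used is a made-up placeholder, not a registered identifier.

```python
from urllib.parse import quote

DOI_RESOLVER = "https://doi.org/"


def doi_to_url(doi: str) -> str:
    """Build a resolver link for a DOI after a rough plausibility check."""
    doi = doi.strip()
    prefix, separator, suffix = doi.partition("/")
    # A DOI consists of a prefix starting with "10." and a suffix after "/".
    if not (prefix.startswith("10.") and separator and suffix):
        raise ValueError(f"not a plausible DOI: {doi!r}")
    # Percent-encode characters that are not safe in a URL path.
    return DOI_RESOLVER + quote(doi, safe="/")


# Placeholder DOI for illustration only.
print(doi_to_url("10.1234/example-dataset"))
```

The check only catches obviously malformed input; it does not confirm that the DOI has actually been registered.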
When publishing research results in journals, it is important to remember that some journals insist on simultaneous publication of the associated data. Journals that apply this rule include PLOS ONE.
German Medical Science (GMS) also offers the opportunity to publish the research data associated with a publication through the research data repository Dryad.
If, however, you are planning to embark on a patenting process, it may actually be advisable not to publish the associated research data since this may affect the novelty value and lead to the patent application being rejected.
Furthermore, you should check whether the data contains any business secrets, or whether grant notifications, employment contracts, service contracts, etc. contain provisions that prohibit publication of the data or permit it only under certain conditions.
Legal aspects and licensing
The legal situation with regard to the use of research data is complex, so we can only provide a few initial pointers here relating to German law. Depending on the nature of the research data, copyright does not generally apply because the data typically does not meet the threshold of originality required by § 2 (2) of the German Copyright Act (abbreviated in German as UrhG). Copyright protection does, however, apply to independently developed software or image and audio materials. Data stored in a database is covered by the neighbouring rights of the maker of the database (§ 87b UrhG). That means that databases are also protected as long as they constitute "a collection of works, data or other independent elements arranged in a systematic or methodical way, the elements of which are individually accessible either by electronic or by other means, and the obtaining, verification or presentation of which requires a qualitatively or quantitatively substantial investment". These neighbouring rights protect the party that has made the investment in preparing the data, though they do not protect the actual compilation of the data. This database protection applies for only 15 years. If no copyright applies, then this may render ineffective any open content licence (e.g. a Creative Commons licence) granted in connection with the research data. Any analyses, figures, etc. that are produced on the basis of the data are, however, definitely protected. The current Creative Commons licence 4.0 now also covers the licensing of databases.
Federal or state data protection law applies in many instances, particularly in life sciences studies which include personal data. The processing and use of personal data for scientific or any other purposes is only admissible if the data subject has consented, as stipulated in § 4 (1) of the German Federal Data Protection Act (abbreviated in German as BDSG). German law also stipulates that the data must be anonymised in such a way that it can no longer be attributed to any identifiable individuals (§ 3a BDSG). Data protection officers at each institution can provide information on how data must be collected and processed in order to conform to legal requirements. As a general rule it is advisable to ask the relevant department at your institution to review the legal aspects.
In general, licensing the data under an open content licence makes it possible to grant tailored rights to subsequent users in order to comply with the “reusable” principle. As with text publications, Creative Commons (CC) licences can also be used for research data. The latest version, CC 4.0, with its improved global orientation, is very well suited to licensing research data and is increasingly replacing the older Open Data Commons (ODC) licences.
What requirements do funders stipulate?
This section outlines the requirements stipulated by some funders. It does not claim to be exhaustive, so even if an institution does not appear in this list, it may still stipulate certain requirements.
German Research Foundation (DFG)
The DFG encourages the publication of research data and sponsors projects designed to create suitable infrastructures. You can find more information on this subject in the brochure "Information Infrastructures for Research Data" (in German).
In its Guidelines on Safeguarding Good Scientific Practice (German only), the DFG also argues that the underlying research data should be kept for 10 years in such a way that it remains accessible and reproducible. This period starts upon publication. Any reduction of this period needs to be justified.
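The retention period can be worked out with simple date arithmetic. The following sketch assumes that the 10 years are counted from the calendar date of publication. The function name and the handling of 29 February are assumptions made for illustration, not part of the DFG guidelines.

```python
from datetime import date

RETENTION_YEARS = 10  # period named in the DFG guidelines


def retention_end(publication_date: date, years: int = RETENTION_YEARS) -> date:
    """Return the date on which the retention period ends."""
    target_year = publication_date.year + years
    try:
        return publication_date.replace(year=target_year)
    except ValueError:
        # 29 February with a non-leap target year: use 28 February (assumption).
        return publication_date.replace(year=target_year, day=28)


# Data published on 15 November 2017 would be kept until 15 November 2027.
print(retention_end(date(2017, 11, 15)))
```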
The DFG has also published “Guidelines on the handling of research data” (in German), which summarise what the DFG expects of applicants with regard to handling research data in DFG-funded research projects and what support the DFG offers. For the life sciences, the DFG has also published the more specific “Guidelines on the Handling of Research Data in Biodiversity Research”.
National Institutes of Health (NIH)
The NIH makes annual funding of 500,000 USD or higher conditional upon mandatory publication of the research data. Applications for funding must specify the extent to which the researchers intend to publish the data. Should there be any compelling reasons why data should not be published, this should be indicated on the application form or discussed prior to application with the NIH. The data should be published as soon as the publisher approves publication of the scientific article that contains the key scientific findings. More information: "NIH Grants Policy Statement".
European Commission
In the Horizon 2020 research funding programme, the European Commission has initiated a pilot project which focuses on open access publishing of research data (Open Research Data Pilot) and expressly calls for data management plans to be drawn up and research data to be published. While the pilot project only covered several scientific domains of the Work Programme for 2016, the conditions now apply to all thematic domains. Projects that do not want to or are not able to publish their data, for example because they do not generate any data or because the results are to be protected as intellectual property (for example by a patent application), have to justify their case. There is no need to publish all the data emanating from the project. The "Guidelines on FAIR Data Management in Horizon 2020" provide information on the pilot project and on drawing up data management plans, which is compulsory for projects taking part in the data pilot. The data management plans in these projects need to be written within six months of the start of the project. Data management plans are therefore not considered in the project review.
Finding a suitable research data repository
Some institutions in Germany have a research data policy and actively support their members in publishing research data. This support often includes the provision of the corresponding infrastructure, such as a research data repository.
A number of directories and search services are available to help find a suitable repository for depositing research data. One example is re3data.org – the Registry of Research Data Repositories. This catalogue currently lists more than 1,000 repositories worldwide for the field of life sciences alone, 120 of which are in Germany.
Dryad is another alternative for data from the realm of life sciences. The Dryad repository acts as a storage location for the data underlying scientific publications. There is no charge for downloading and using the data which is made available in Dryad under a Creative Commons license, but data submitters are charged a fee for each data package. Dryad enters into cooperation agreements with institutions, journals, universities, professional associations, etc. which state that these organisations will cover all or part of the cost of data publishing charges on behalf of their community of researchers. But even members of institutions which have not signed a cooperation agreement with Dryad can still make their data packages openly accessible on Dryad by paying the data publishing charge. The submission fees are used to pay for the infrastructure and support the long-term curation and preservation of the data. Data deposited in Dryad is assigned a digital object identifier (DOI), a permanent identifier that ensures the data remains accessible and citable. The question of whether the data should only be published after an embargo period can be settled during the submission process. ZB MED co-operates with Dryad within the framework of German Medical Science (GMS).
ZENODO is a repository covering all fields of science which accepts research data in addition to academic publications. ZENODO enables users to run targeted searches for data sets, videos and images, and software. The repository is managed and developed by CERN in Geneva. It was set up using EU funds and forms part of a Europe-wide open access infrastructure.
Alternatives to research data repositories
More and more publishers are recognising the importance of research data and starting to offer authors the option of publishing their research data in a supplementary capacity. Some journals even stipulate this as a requirement. In addition, there are a number of "data journals" which specialise in publishing research data. The "Data Journals" wiki contains an incomplete list which may be useful as a starting point.
The issues surrounding "big data" have also led to the regular launch of new journals which focus on new methods of meeting big data challenges and provide details of suitable publishing options.
See also
our webpages dealing with research data management
Disclaimer
Important note: The information and links provided here do not represent any form of binding legal advice. They are solely intended to provide an initial basis to help get you on the right track. ZB MED – Information Centre for Life Sciences has carefully checked the information included in the list of FAQs. However, we are unable to accept any liability whatsoever for any errors it may contain. Unless indicated otherwise, any statements concerning individual statutory norms or regulations refer to German law (FAQ updated 11/2017).
Contact
Dr. Jasmin Schmitz
Head of Publication Advisory Services
Phone: +49 (0)221 478-32795
Birte Lindstädt
Head of Research Data Management
Phone: +49 (0)221 478-97803
Related links
WissGrid checklist (in German)
DOI service at ZB MED
Open Data Commons
PLOS ONE – Publication Criteria
GMS
on legal aspects
Hoeren, Thomas: Discussion of Internet law (in German)
Information on rights to research data and databases (in German)
Booklet on copyright issues in science (in German)
on requirements stipulated by funders
- DFG: Funding Programme: "Information Infrastructures for Research Data" (in German)
- DFG: "Safeguarding Good Scientific Practice" (German only)
- DFG: “Guidelines on the handling of research data” (in German)
- DFG: “Guidelines on the Handling of Research Data in Biodiversity Research” (in German)
- NIH Grants Policy Statement
- Guidelines on FAIR Data Management in Horizon 2020
on research data repositories and alternatives
re3data
Dryad
GMS
ZENODO
"Data Journals" wiki
Further information
ZB MED blog entry: "Electronic Lab Notebooks als Teil des Forschungsdatenmanagements"
Lauscher, M. & Vandendorpe, J. (2024): Why doing research data management? (accessed 07/03/2024)