It is increasingly recognized that data generated in the course of research is just as valuable to academic discourse as book chapters and journal articles. For these traditional forms of publication, standards for citing and referencing have long been established. Standards for citing and referencing data are a more recent development.
By citing a dataset you give proper credit to the researcher who developed it. Citation also makes it easier for other researchers to locate and access the data for replication, verification or other academic use. Furthermore, formal citation makes it possible to track the impact of a dataset through the publications that cite it.
Elements of citation
Referring to datasets in your publication is in many ways similar to citing traditional resources, with the author(s), year of publication, title and publisher as elements. But a data citation also contains a persistent identifier (PID), a unique code by which the dataset can always be found. PIDs are more durable than URLs, which often stop working after a few years (‘link rot’). Below you can read more about the available standards for PIDs.
When building a data citation, use at least the following elements:
- Author: The creator of the dataset. This can be a person, a group or an organization;
- Title: Name of the dataset;
- Date: Year of publication;
- Publisher + Location: Distributor of the dataset;
- PID (refer to a URL only if no PID is available);
- Edition or version: Number that indicates the version or edition;
- Resource type: e.g. dataset or codebook;
- Creation date: Date on which the dataset was created (important in case of an unpublished dataset);
- Retrieval date: Date on which the data was retrieved;
- Editor: Person or team responsible for editing the dataset.
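The rule that a URL is used only when no PID is available can be sketched as a small data structure. This is a minimal illustration, not a prescribed schema: the class name `DatasetCitation`, its field names and the `locator` method are assumptions made for this example, and the URL is a placeholder.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DatasetCitation:
    """A few of the citation elements listed above (field names are illustrative)."""
    author: str
    title: str
    year: int
    publisher: str
    pid: Optional[str] = None   # persistent identifier, e.g. a DOI or a handle
    url: Optional[str] = None   # fallback, used only when no PID exists

    def locator(self) -> str:
        """Return the PID if there is one, otherwise fall back to the URL."""
        if self.pid:
            return self.pid
        if self.url:
            return self.url
        raise ValueError("a citation needs a PID or, failing that, a URL")


citation = DatasetCitation(
    author="Milberger, S.",
    title="Evaluation of violence against women with physical disabilities in Michigan, 2000-2001",
    year=2002,
    publisher="Inter-university Consortium for Political and Social Research",
    pid="doi:10.3886/ICPSR03414",
    url="https://example.org/dataset",  # placeholder; ignored because a PID is present
)
print(citation.locator())
```

Because the PID takes precedence, the citation keeps pointing to the dataset even if the placeholder URL later stops working.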
How the elements of your citation should be combined and styled depends on the citation style used for textual publications. A few commonly used styles are demonstrated here (taken from here):
APA
Milberger, S. (2002). Evaluation of violence against women with physical disabilities in Michigan, 2000-2001 (ICPSR version) [data file and codebook]. doi:10.3886/ICPSR03414
With optional elements:
Milberger, S. (2002). Evaluation of violence against women with physical disabilities in Michigan, 2000-2001 (ICPSR version) [data file and codebook]. Detroit: Wayne State University [producer]. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor]. doi:10.3886/ICPSR03414
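As a rough sketch, the basic APA-style reference above can be assembled from its elements as follows. The function name `format_apa` and its parameters are hypothetical, and the layout simply mirrors the example shown here. It is not a complete implementation of the APA rules.

```python
def format_apa(author, year, title, pid, version=None, resource_type=None):
    """Assemble an APA-style dataset reference from its elements."""
    reference = f"{author} ({year}). {title}"
    if version:
        reference += f" ({version})"          # edition or version, in parentheses
    if resource_type:
        reference += f" [{resource_type}]"    # resource type, in square brackets
    return f"{reference}. {pid}"


apa = format_apa(
    author="Milberger, S.",
    year=2002,
    title="Evaluation of violence against women with physical disabilities in Michigan, 2000-2001",
    pid="doi:10.3886/ICPSR03414",
    version="ICPSR version",
    resource_type="data file and codebook",
)
print(apa)
```

Keeping the elements separate from their presentation makes it straightforward to render the same dataset in another citation style later.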
MLA
Milberger, Sharon. Evaluation of Violence Against Women With Physical Disabilities in Michigan, 2000-2001. ICPSR version. Inter-university Consortium for Political and Social Research, 2002. Web. 19 May 2011.
With optional elements:
Milberger, Sharon. Evaluation of Violence Against Women With Physical Disabilities in Michigan, 2000-2001. ICPSR version. Detroit: Wayne State U [producer]. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2002. Web. 19 May 2011. doi:10.3886/ICPSR03414
Chicago
Milberger, Sharon. Evaluation of Violence Against Women With Physical Disabilities in Michigan, 2000-2001. ICPSR version. Detroit: Wayne State University, 2002. Distributed by Ann Arbor, MI: Inter-University Consortium for Political and Social Research, 2002. doi:10.3886/ICPSR03414.
Milberger, Sharon. 2002. Evaluation of Violence Against Women With Physical Disabilities in Michigan, 2000-2001. ICPSR version. Detroit: Wayne State University. Distributed by Ann Arbor, MI: Inter-University Consortium for Political and Social Research. doi:10.3886/ICPSR03414.
Over time, several standards for PIDs have been developed.
Handle
CLARIN has chosen to use the Handle system, developed by the Corporation for National Research Initiatives. The Handle System assigns, manages and resolves persistent identifiers for digital objects.
A Handle PID consists of a prefix and a suffix. The prefix is a numerical code indicating the institution or organization that assigned the handle. Organizations that want to become a naming authority (Local Handle Service) can register for a prefix here. The suffix is separated from the prefix with a slash (/) and points to the resource. A handle might look something like hdl:10744/123abc. Here ‘hdl’ indicates that the Handle System is used, ‘10744’ is the prefix indicating the Meertens Institute as naming authority (data centre), and the suffix ‘123abc’ is the resource code.
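The prefix/suffix structure can be illustrated with a short parsing sketch. The names `Handle`, `parse_handle` and `resolver_url` are hypothetical. The default proxy address `https://hdl.handle.net` is the public Handle.Net proxy, which is an assumption of this example and is not named in the text. The suffix is used as-is, without URL-encoding.

```python
from typing import NamedTuple


class Handle(NamedTuple):
    prefix: str  # identifies the naming authority
    suffix: str  # identifies the resource


def parse_handle(text: str) -> Handle:
    """Split a handle such as 'hdl:10744/123abc' into prefix and suffix."""
    handle = text.strip()
    if handle.lower().startswith("hdl:"):
        handle = handle[4:]
    # Only the first slash separates prefix from suffix.
    prefix, separator, suffix = handle.partition("/")
    if not separator or not prefix or not suffix:
        raise ValueError(f"not a valid handle: {text!r}")
    return Handle(prefix, suffix)


def resolver_url(handle: Handle, proxy: str = "https://hdl.handle.net") -> str:
    """Build a web address at which a proxy can resolve the handle."""
    return f"{proxy}/{handle.prefix}/{handle.suffix}"


example = parse_handle("hdl:10744/123abc")
print(example.prefix, example.suffix)
print(resolver_url(example))
```

Since ‘123abc’ is only an example suffix, the resulting address is not expected to resolve to an actual resource.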
Via resolution systems (such as this) you can resolve individual handles and view their associated values (e.g. location).
Other PID standards
Another widely used standard is the Digital Object Identifier (DOI). DOIs are also handles, but with additional policies attached (such as specific DataCite metadata) and a dedicated landing page for the dataset. DOI is a well-established service used in particular by publishing companies, but its independent business model will not be acceptable to many research organizations. DOIs are registered by DataCite or (in the Netherlands) by DataCite Netherlands. 3TU and DANS are DataCite registration authorities that issue DOIs.
DANS and most libraries use URN:NBN (Uniform Resource Name: National Bibliography Number). The national resolver for the Netherlands is managed by DANS. The NBN is managed by the KB (more info here).
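Because a DOI has the same prefix/suffix shape as a handle, turning one into a resolvable link takes little code. The sketch below uses the DOI from the citation examples. The function name `doi_to_url` is hypothetical, and the use of the `https://doi.org/` resolver is an assumption of this example, not something stated in the text.

```python
def doi_to_url(doi: str) -> str:
    """Turn a DOI, written with or without a 'doi:' label, into a resolver link."""
    name = doi.strip()
    # Strip a leading label or resolver address if one is present.
    for lead in ("doi:", "https://doi.org/", "http://dx.doi.org/"):
        if name.lower().startswith(lead):
            name = name[len(lead):]
            break
    # DOI prefixes start with '10.' and are separated from the suffix by a slash.
    if not name.startswith("10.") or "/" not in name:
        raise ValueError(f"not a valid DOI: {doi!r}")
    return f"https://doi.org/{name}"


link = doi_to_url("doi:10.3886/ICPSR03414")
print(link)
```

Storing the bare DOI and generating the link on demand keeps the citation independent of any particular resolver address.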