Dataset card

A dataset card is a structured representation of a dataset's metadata. It is defined in the front matter of the dataset's README.md (see folder structure) and records the dataset's creator, license, funding, and other relevant information.
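As a rough illustration of where the card lives, the sketch below separates the front matter from the rest of a README. It assumes the front matter is delimited by lines containing only `---`, with the opening delimiter on the first line; this text does not specify the delimiter, so treat it as an assumption. The function name `split_front_matter` and the example values are hypothetical.

```python
def split_front_matter(readme_text: str) -> tuple[str, str]:
    """Split README text into (front_matter, body).

    Assumption: the front matter is fenced by lines containing only '---',
    and the opening fence is the very first line of the file.
    Returns ("", readme_text) when no complete front matter is found.
    """
    lines = readme_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return "", readme_text
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            front = "\n".join(lines[1:i])
            body = "\n".join(lines[i + 1:])
            return front, body
    # The opening fence was never closed: treat the file as having no card.
    return "", readme_text


# Hypothetical README content, for illustration only.
example = """---
name: Example dataset
license: https://creativecommons.org/licenses/by/4.0/
---
# Example dataset
"""

front, body = split_front_matter(example)
print(front)
```

The returned front matter string could then be handed to whatever parser the project uses for the card.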

Required fields

Property | Type | Description
name | Text | The name of the dataset.
description | Text | Description of the dataset.
license | URL | The license that applies to the dataset.
keywords | Text or URL | List of keywords or URLs delimited by commas.
creator | Organization or Person | The creator(s)/author(s) of the published dataset.
funding | Grant | A Grant that directly or indirectly provides funding or sponsorship for this item.
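The required fields can be checked mechanically once the card has been parsed. The sketch below assumes the card is already available as a Python dict; the helper `missing_required_fields`, the shape of the `creator` value, and all example values are hypothetical and not taken from the BDM specification.

```python
# Property names taken from the "Required fields" table.
REQUIRED_FIELDS = ("name", "description", "license", "keywords", "creator", "funding")


def missing_required_fields(card: dict) -> list[str]:
    """Return the required properties that are absent or empty in the card."""
    return [field for field in REQUIRED_FIELDS if not card.get(field)]


# Hypothetical card, for illustration only; 'funding' is deliberately left out.
card = {
    "name": "Example dataset",
    "description": "Placeholder description of the dataset.",
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "keywords": "example, placeholder",
    "creator": {"type": "Person", "name": "Jane Doe"},  # assumed structure
}

missing = missing_required_fields(card)
print(missing)  # ['funding']
```

This checks only for presence; it does not verify that a value matches the expected type (for example, that `license` is a URL).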

Derived datasets

Sometimes a dataset is curated/aggregated and shared by people who were not the original collectors of the data. In those cases it might be unclear how exactly to attribute proper credit to each type of contribution.

In BDM, the creator(s) of a dataset are the people or organizations that published that specific version of the dataset, not the people or organizations that collected the data. For instance, if the shared dataset has been processed inadequately, it would not make sense to blame the original data collectors for such mistakes.

That said, the original data collectors must be appropriately credited. We recommend five mechanisms:

  • name of the dataset refers to original data collectors;
  • description of the dataset states clearly that data collection was performed by other people;
  • citation provides a link to cite the original study;
  • creditText provides credit to both the curated and the original versions;
  • isBasedOn provides information and links to the original dataset.

Property | Type | Description
isBasedOn | URL or CreativeWork | A resource from which this work is derived or from which it is a modification or adaptation.
citation | CreativeWork or Text | A citation or reference to another creative work, such as another publication, web page, scholarly article, etc.
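For a derived dataset, three of the five mechanisms above correspond to card properties (citation, creditText, isBasedOn) and can be checked for presence; the other two concern the wording of name and description and need a human reader. The sketch below assumes the card is a Python dict. The helper `missing_attribution`, the URL, and every example value are hypothetical placeholders.

```python
# The attribution mechanisms that map directly to card properties.
ATTRIBUTION_PROPERTIES = ("citation", "creditText", "isBasedOn")


def missing_attribution(card: dict) -> list[str]:
    """Return the attribution properties that are absent or empty in the card."""
    return [prop for prop in ATTRIBUTION_PROPERTIES if not card.get(prop)]


# Hypothetical card for a curated dataset; 'creditText' is deliberately left out.
derived_card = {
    "name": "Example survey (original study), curated version",
    "description": "Curated copy; data collection was performed by the original study team.",
    "creator": {"type": "Person", "name": "Jane Doe"},  # publisher of this version
    "isBasedOn": "https://example.org/original-dataset",  # placeholder URL
    "citation": "Placeholder citation for the original study.",
}

missing_props = missing_attribution(derived_card)
print(missing_props)  # ['creditText']
```

Note that creator here names the publisher of the curated version, in line with the BDM definition above, while the original collectors are credited through the other properties.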