Dataset card
A dataset card is a structured representation of a dataset's metadata. It is defined in the README.md
front matter of a dataset (see folder structure) and describes its creator, license, funding, and other relevant details.
Required fields
Property | Type | Description |
---|---|---|
name | Text | The name of the dataset. |
description | Text | Description of the dataset. |
license | URL | The license that applies to the dataset. |
keywords | Text or URL | List of keywords or URLs delimited by commas. |
creator | Organization or Person | The creator(s)/author(s) of the published dataset. |
funding | Grant | A grant that directly or indirectly provides funding or sponsorship for the dataset. |
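As a rough illustration of the table above, the sketch below checks a card for the required properties. It assumes the front matter has already been parsed into a Python dictionary; the helper `missing_required_fields`, the nested shape of `creator`, and all values in `example_card` are hypothetical placeholders and not part of the specification.

```python
# Required properties, taken from the "Required fields" table.
REQUIRED_FIELDS = ("name", "description", "license", "keywords", "creator", "funding")


def missing_required_fields(card: dict) -> list[str]:
    """Return the required properties that are absent or empty in `card`."""
    return [field for field in REQUIRED_FIELDS if not card.get(field)]


# Hypothetical card: every value is a placeholder, not a real dataset.
# The nested layout of `creator` is an assumption made for illustration.
example_card = {
    "name": "Example Study Dataset",
    "description": "Placeholder description of the dataset.",
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "keywords": "cognition, example",
    "creator": {"type": "Person", "name": "Jane Doe"},
}

# `funding` is deliberately left out, so it is reported as missing.
print(missing_required_fields(example_card))
```

A check like this only confirms that each property is present; it does not verify that the values have the types listed in the table.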
Recommended fields
Property | Type | Description |
---|---|---|
creditText | Text | Text that can be used to credit person(s) and/or organization(s) associated with a published Creative Work. |
version | Number or Text | The version of the dataset. |
size | Text | The size of the dataset (e.g., 2.5GB). |
maintainer | Person | The person who manages the dataset and should be contacted about any dataset-related issues. |
measurementMethod | DefinedTerm, URL, Text | The type of method used to collect the data (e.g., “questionnaire”, “computerized test”, “video game”). |
measurementTechnique | DefinedTerm, URL, Text | The instrument(s) used to collect the data (e.g., “Beck Depression Inventory”). |
variableMeasured | Property, PropertyValue, Text | Describes the type of construct being measured (e.g., “depression”), not the columns/variables in the dataset (e.g., “dep_q1”). |
Derived datasets
Sometimes a dataset is curated/aggregated and shared by people who were not the original collectors of the data. In those cases it might be unclear how exactly to attribute proper credit to each type of contribution.
In BDM, the creator(s) of a dataset are the people or organizations that published that specific version of the dataset, not the people or organizations that collected the data. For instance, if the shared dataset has been processed inadequately, it would not make sense to blame the original data collectors for such mistakes.
That said, the original data collectors must be appropriately credited. We recommend five mechanisms:
- name of the dataset refers to the original data collectors;
- description of the dataset states clearly that data collection was performed by other people;
- citation provides a link for citing the original study;
- creditText provides credit to both the curated and the original versions;
- isBasedOn provides information and links to the original dataset.
Property | Type | Description |
---|---|---|
isBasedOn | URL or CreativeWork | A resource from which this work is derived or from which it is a modification or adaptation. |
citation | CreativeWork or Text | A citation or reference to another creative work, such as another publication, web page, scholarly article, etc. |
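To make the five mechanisms concrete, here is a minimal sketch of a card for a derived dataset, again written as a Python dictionary standing in for parsed front matter. All names, the citation text, and the `example.org` URL are hypothetical placeholders; `missing_attribution_fields` is an illustrative helper, not part of BDM.

```python
# Properties that carry attribution for a derived dataset.
# `name` and `description` also contribute, but their wording needs human review,
# so only the three dedicated properties are checked here.
ATTRIBUTION_FIELDS = ("citation", "creditText", "isBasedOn")


def missing_attribution_fields(card: dict) -> list[str]:
    """Return the attribution properties that are absent or empty in `card`."""
    return [field for field in ATTRIBUTION_FIELDS if not card.get(field)]


# Hypothetical derived dataset: `creator` is the curating group,
# while the remaining properties credit the original collectors.
derived_card = {
    "name": "Doe et al. Example Task Data (curated)",
    "description": "Curated version of data originally collected by Doe et al.",
    "creator": {"type": "Organization", "name": "Example Curation Lab"},
    "citation": "Doe, J. et al. Placeholder reference to the original study.",
    "creditText": "Curated by Example Curation Lab; data collected by Doe et al.",
    "isBasedOn": "https://example.org/original-dataset",
}

print(missing_attribution_fields(derived_card))
```

Note that `creator` names the curators, consistent with the rule that the creator is whoever published this specific version of the dataset.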