Dataset card
There are two main sources of metadata in BDM: the dataset card and the codebook.
- Dataset card: A structured representation of metadata about the dataset. It is generally defined in the README.md front matter of a dataset (see folder structure) and describes its creator, license, funding, and other relevant details.
- Codebook: The part of the dataset card that describes the variables in the data (such as their names, types, and descriptions). It also documents the values that each variable can take and any missing data in the dataset. The codebook is an essential component of the dataset, as it provides the context and semantics that help researchers interpret the data.
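
As a rough illustration of where the dataset card lives, the sketch below separates the front matter from the rest of a README.md. It assumes the common convention of a block delimited by `---` lines at the very top of the file; the helper name `split_front_matter` and the example README content are hypothetical and not part of BDM.

```python
def split_front_matter(readme_text: str) -> tuple[str, str]:
    """Split README text into (front_matter, body).

    Assumption: the front matter is delimited by lines containing only '---',
    and the opening delimiter is the first line of the file.
    Returns an empty front matter if no such block is found.
    """
    lines = readme_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return "", readme_text
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            front = "\n".join(lines[1:i])
            body = "\n".join(lines[i + 1:])
            return front, body
    # Opening delimiter without a closing one: treat as no front matter.
    return "", readme_text


# Hypothetical README content, made up for illustration.
EXAMPLE_README = """---
name: Example dataset
license: https://creativecommons.org/licenses/by/4.0/
---
# Example dataset

Free-text documentation goes here.
"""

front_matter, body = split_front_matter(EXAMPLE_README)
print(front_matter)
```

The returned front matter is still raw text; turning it into a structured dataset card would require a YAML parser, which is left out here to keep the sketch dependency-free.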
Required fields
Property | Type | Description |
---|---|---|
name | Text | The name of the dataset. |
description | Text | Description of the dataset. |
license | URL | The license that applies to the dataset. |
keywords | Text or URL | List of keywords or URLs delimited by commas. |
creator | Organization or Person | The creator(s)/author(s) of the published dataset. |
funding | Grant | A Grant that directly or indirectly provides funding or sponsorship for this item. |
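
A minimal sketch of how the required properties could be checked, assuming the dataset card has already been loaded into a Python dictionary. The function name `missing_required_fields`, the shape of the `creator` value, and all example values are hypothetical; only the six property names come from the table above.

```python
# Required properties, as listed in the table above.
REQUIRED_FIELDS = ("name", "description", "license", "keywords", "creator", "funding")


def missing_required_fields(card: dict) -> list[str]:
    """Return the required properties that are absent or empty in the card."""
    return [field for field in REQUIRED_FIELDS if not card.get(field)]


# Hypothetical dataset card; every value below is made up for illustration.
example_card = {
    "name": "Example Stroop dataset",
    "description": "Reaction times from a hypothetical Stroop task.",
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "keywords": "stroop, attention, reaction time",
    "creator": {"type": "Person", "name": "Jane Doe"},
}

print(missing_required_fields(example_card))  # ['funding']
```

This check only verifies presence; it does not validate types such as URL or Grant.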
Recommended fields
Property | Type | Description |
---|---|---|
creditText | Text | Text that can be used to credit person(s) and/or organization(s) associated with a published Creative Work. |
version | Number or Text | The version of the dataset. |
size | Text | The size of the dataset (e.g., 2.5GB). |
maintainer | Person | Person who manages the dataset and should be contacted about any dataset-related issues. |
measurementMethod | DefinedTerm, URL, Text | The type of method used to collect the data (e.g., “questionnaire”, “computerized test”, “video game”). |
measurementTechnique | DefinedTerm, URL, Text | The instrument(s) used to collect the data (e.g., “Beck Depression Inventory”). |
variableMeasured | Property, PropertyValue, Text | Describes the type of construct being measured (e.g., “depression”), not the columns/variables in the dataset (e.g., “dep_q1”). |
Derived datasets
Sometimes a dataset is curated/aggregated and shared by people who were not the original collectors of the data. In those cases it might be unclear how exactly to attribute proper credit to each type of contribution.
In BDM, the creator(s) of a dataset are the people or organizations that published that specific version of the dataset, not the people or organizations that collected the data. For instance, if the shared dataset has been processed inadequately, it would not make sense to blame the original data collectors for such mistakes.
That said, the original data collectors must be appropriately credited. We recommend five mechanisms:
- name of the dataset refers to the original data collectors;
- description of the dataset clearly states that data collection was performed by other people;
- citation provides a reference for citing the original study;
- creditText credits both the curated and the original versions;
- isBasedOn provides information about and links to the original dataset.
Property | Type | Description |
---|---|---|
isBasedOn | URL or CreativeWork | A resource from which this work is derived or from which it is a modification or adaptation. |
citation | CreativeWork or Text | A citation or reference to another creative work, such as another publication, web page, scholarly article, etc. |
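
The five mechanisms can be sketched as a dataset card for a derived dataset, again represented as a Python dictionary. All names, the citation, and the URL are placeholders invented for illustration, and `missing_attribution` is a hypothetical helper, not a BDM function. It only checks the three properties that can be verified mechanically; whether name and description credit the original collectors still needs a human reader.

```python
# Attribution properties that can be checked for presence.
ATTRIBUTION_FIELDS = ("citation", "creditText", "isBasedOn")


def missing_attribution(card: dict) -> list[str]:
    """Return the attribution properties that are absent or empty in the card."""
    return [field for field in ATTRIBUTION_FIELDS if not card.get(field)]


# Hypothetical card for a curated dataset; all values are placeholders.
derived_card = {
    # name refers to the original data collectors
    "name": "Doe et al. Stroop data, curated version",
    # description states that data collection was performed by other people
    "description": (
        "Curated copy of a hypothetical Stroop dataset. "
        "Data collection was performed by Doe et al., not by the curators."
    ),
    # creator is whoever published this specific version
    "creator": {"type": "Organization", "name": "Example Curation Lab"},
    "citation": "Doe, J. et al. Hypothetical Stroop study (placeholder reference).",
    "creditText": "Collected by Doe et al.; curated by Example Curation Lab.",
    "isBasedOn": "https://example.org/original-stroop-dataset",
}

print(missing_attribution(derived_card))  # []
```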