Folder structure
A dataset is a folder that contains data files and all relevant information needed to make sense of that data.
In its current form, a BDM-compatible dataset is a folder that contain data and all relevant information needed to make sense of that data. The name of the folder is the name of the dataset. Here is the structure of an example of dataset:
1
dataset/2
├── README.md3
├── LICENSE4
├── docs/5
├── instruments/
│ ├── bisbas.yaml
│ ├── stroop.yaml
│ └── ...
│6
├── agents.csv
│7
├── data/8
│ ├── agent_001/9
│ │ ├── studyflow.csv10
│ │ └── session_1/11
│ │ └── <activity>/
│ │ ├── trials_1.csv
│ │ ├── stimuli_1.csv
│ │ └── options_1.csv
│ └── agent_002/
│ └── ...
│12
└── examples/
├── example_1.py └── example_2.R
- 1
-
Dataset: The root
dataset/
folder contains all the relevant data, metadata, and documentations. The name of this folder is a name that is assigned to the dataset (e.g.,abcd-2021
). This name follows the convention of naming version-controlled repositories as datasets are often shared via repositories: datasets names may only include lowercase letters, numbers and hyphens. - 2
-
Dataset card: The
README.md
file contains the general description of the dataset and embeds its metadata in its front-matter section. The metadata represents a dataset card rather than a codebook to describe specific instruments. - 3
-
License: The
LICENSE
file specifies the terms and conditions for use, reproduction and distribution of the dataset. While different licenses may apply to different parts of a dataset, it is recommended to keep it simple and to use the same license for the entire dataset. You can use websites like choosealicense.com to help you choose a license for your dataset. - 4
-
Documentations: The
docs/
folder contains information about the study in general. This includes in particular administrative data (e.g., how participants were recruited), information about the experimental design, instructions and other messages communicated with study participants (e.g., information about consent; the study welcome page). - 5
-
Instruments: The
instruments/
folder contains information about the instruments used in the study. This includes metadata about each instrument (e.g., references to the relevant literature), links to the instrument content (e.g., an online version or a pdf of a questionnaire) but also codebooks that provide the definition of the parameters of the instruments that are recorded in the data. - 6
-
Agents metadata: The
agents.csv
lists all participants in a study, demographic information as well as what data is available for each of them. This data is most useful for quickly locating specific data (e.g., where is the data for subjects that are male, aged 35 and who have completed at least 10 trials of a specific task?). - 7
-
Data: Contains all the data collected within a study and that are within the scope of a dataset. Within
data/
there is one folder per subject that contains all the data about that subject. - 8
-
Agent data: The subject data folder is labelled following the template
agent_{x}
where x refers to a zero-padded integer (e.g.,agent__001
,agent_123
; the number of zeros used for padding should be commensurate with the number of subjects in the dataset. - 9
-
Studyflow: Each subject data folder contains at its root a
studyflow.csv
file which explains what activities were completed by this subject (with a competition status), when they started and ended each activity and in what order the activities were completed. - 10
-
Sessions (optional): In some cases it might be useful to group the data within subject into subfolder corresponding to multiple data collection sessions (e.g.,
session_01/
,session_02/
). This level of folders is optional and should be used when the data collection is divided into sessions. - 11
-
Activity data: Additional folders are included within each subject’s folder depending on the specific activities completed by that subject. For example, if the subject completed the BIS/BAS questionnaire and the Stroop test there would be two corresponding data folders (i.e.,
bisbas
,stroop
). - 12
-
Examples (optional): The
examples/
contains sample codes to showcase some aspects of the data (e.g., descriptive statistics, missing values plot).