Getting started
Understand what BDM is and how to use it
What is BDM?
The Behaverse Data Model (BDM) is a dataset architecture that aims to provide interoperable data for modern cognitive projects. It is a set of conventions that help you organize your data in a way that is easy to understand for both domain experts and technical experts, and it is designed to work with a wide range of cognitive tasks and questionnaires.
Whether you’re an individual scientist or part of a larger lab, BDM can help you build a dataset that is easy to understand, maintain, and share with others. BDM is opinionated but designed to be flexible and extensible, so you can adapt it to your specific needs.
What is a dataset and how to organize it using BDM
What does a BDM dataset look like?
A BDM dataset is a collection of files and directories that follow a specific structure. Here is an example of a simple BDM dataset:
```
dataset/
├── README.md
├── agents.csv
├── instruments/
└── data/
```
The README.md file contains both a human-readable description of the dataset (as Markdown) and machine-readable metadata (as YAML front matter).
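A minimal sketch of separating the two parts of README.md, assuming the common front-matter convention of a block delimited by --- lines at the very top of the file. The function name split_front_matter is hypothetical; the returned front-matter string could then be passed to a YAML parser such as PyYAML's yaml.safe_load.

```python
def split_front_matter(text: str) -> tuple[str, str]:
    """Split README.md text into (YAML front matter, Markdown body).

    Assumes front matter is delimited by '---' lines at the top of the file.
    Returns ("", text) when no complete front-matter block is present.
    """
    lines = text.splitlines(keepends=True)
    if not lines or lines[0].strip() != "---":
        return "", text
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            front = "".join(lines[1:i])   # metadata between the delimiters
            body = "".join(lines[i + 1:])  # human-readable Markdown
            return front, body
    # Opening delimiter without a closing one: treat everything as body.
    return "", text
```

Keeping the split separate from YAML parsing means the sketch needs only the standard library.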
The agents.csv file contains information about the human subjects and artificial agents in the dataset. It is a CSV file with one row per agent and one column per attribute.
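The "one row per agent" rule is easy to check while loading the file. This is a sketch only: the guide does not name the identifier column, so agent_id below is a hypothetical default that you would replace with your dataset's actual column.

```python
import csv
from typing import Iterable


def load_agents(lines: Iterable[str], id_column: str = "agent_id") -> dict[str, dict[str, str]]:
    """Read agents.csv into {agent ID: row}, enforcing one row per agent.

    "agent_id" is a hypothetical column name, not one defined by BDM.
    """
    agents: dict[str, dict[str, str]] = {}
    for row in csv.DictReader(lines):
        agent_id = row[id_column]
        if agent_id in agents:
            raise ValueError(f"duplicate agent ID: {agent_id!r}")
        agents[agent_id] = row
    return agents
```

In practice you would pass an open file, for example load_agents(open("dataset/agents.csv", newline="")).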
The instruments/ folder contains parameters for the tasks, instructions, and questionnaires. Each instrument is represented as a YAML file that describes the instrument and its parameters. Instrument files are named after the instrument, e.g., instruments/UFOV.yaml.
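The top-level layout is a fixed set of four entries: README.md, agents.csv, instruments/, and data/. The helpers below sketch a quick structural check and the instrument naming rule. Their names are hypothetical, and only the presence of these entries is checked, not their contents.

```python
from pathlib import Path

# Top-level entries of a BDM dataset: two files and two folders.
REQUIRED_FILES = ("README.md", "agents.csv")
REQUIRED_DIRS = ("instruments", "data")


def missing_entries(root: Path) -> list[str]:
    """Return the top-level BDM entries that are absent from `root`."""
    missing = [name for name in REQUIRED_FILES if not (root / name).is_file()]
    missing += [name + "/" for name in REQUIRED_DIRS if not (root / name).is_dir()]
    return missing


def instrument_file(root: Path, instrument: str) -> Path:
    """Path of the YAML file for an instrument, e.g. instruments/UFOV.yaml."""
    return root / "instruments" / f"{instrument}.yaml"
```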
Data files
The data/ folder contains the data files. Within this folder, data files are organized as <AGENT>/<SESSION>/<ACTIVITY>/<DATA_FILE>.
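Because every data file sits at a fixed depth below data/, its path can be split into named parts. This is a minimal sketch with hypothetical names (DataFileRef, parse_data_path). It expects a path relative to data/ and rejects files at other depths, such as an agent-level studyflow.csv.

```python
from pathlib import PurePosixPath
from typing import NamedTuple


class DataFileRef(NamedTuple):
    agent: str
    session: str
    activity: str
    data_file: str


def parse_data_path(relpath: str) -> DataFileRef:
    """Split a path relative to data/ into <AGENT>/<SESSION>/<ACTIVITY>/<DATA_FILE>."""
    parts = PurePosixPath(relpath).parts
    if len(parts) != 4:
        raise ValueError(
            f"expected <AGENT>/<SESSION>/<ACTIVITY>/<DATA_FILE>, got {relpath!r}"
        )
    return DataFileRef(*parts)
```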
The data/ folder also contains additional data for the studyflow, stimuli, and other data types. For example, the data/ folder for a dataset of three agents, each with up to two sessions of two activities, might look like this:
```
...
data/
├── agent_1/
│   ├── studyflow.csv
│   ├── session_1/
│   │   ├── UFOV/
│   │   │   └── ...
│   │   └── DigitSpan/
│   │       └── ...
│   └── session_2/
│       ├── UFOV/
│       │   └── ...
│       └── DigitSpan/
│           └── ...
├── agent_2/
│   ├── studyflow.csv
│   └── session_1/
│       ├── UFOV/
│       │   └── ...
│       └── DigitSpan/
│           └── ...
└── agent_3/
    ├── studyflow.csv
    ├── session_1/
    │   ├── UFOV/
    │   │   └── ...
    │   └── DigitSpan/
    │       └── ...
    └── session_2/
        ├── UFOV/
        │   └── ...
        └── DigitSpan/
            └── ...
```
Note that:
- Each agent has a folder named agent_<ID>, where <ID> is a unique identifier for the human subject or computer agent. <ID> can be a number or a string, but it must be unique within the dataset. For convenience, if your dataset contains only human subjects, you may instead name the folders subject_<ID> (for example, data/subject_1/); the root folder is still data/. The agent_<ID> format is nevertheless recommended for all datasets because it is more flexible and extensible.
- The studyflow.csv file contains the order and metadata of all activities for the agent.
- Each session has a folder named session_<ID>, where <ID> is a unique identifier for the session. Session identifiers are unique within the dataset rather than within each agent, so session_1 for agent_1 refers to the same session as session_1 for agent_2.
- Each activity has a folder named <AGENT>/<SESSION>/<ACTIVITY>, where <ACTIVITY> is the name of the activity. In this example, the activities are UFOV and DigitSpan. For each activity, there is an instrument file with the same name in the instruments/ folder.
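The conventions above can be checked mechanically. This sketch (function names are hypothetical) builds an agent → session → activities index from data/ and lists activities that have no matching instruments/<ACTIVITY>.yaml file. Folder names are used verbatim, so subject_<ID> folders work the same way as agent_<ID> folders. Only directories are collected, so agent-level files such as studyflow.csv are skipped.

```python
from pathlib import Path


def index_data(root: Path) -> dict[str, dict[str, list[str]]]:
    """Map each agent folder to its sessions, and each session to its activities.

    Assumes the layout data/<AGENT>/<SESSION>/<ACTIVITY>/; plain files such as
    studyflow.csv are ignored because only directories are collected.
    """
    index: dict[str, dict[str, list[str]]] = {}
    for agent_dir in sorted(p for p in (root / "data").iterdir() if p.is_dir()):
        sessions: dict[str, list[str]] = {}
        for session_dir in sorted(p for p in agent_dir.iterdir() if p.is_dir()):
            sessions[session_dir.name] = sorted(
                p.name for p in session_dir.iterdir() if p.is_dir()
            )
        index[agent_dir.name] = sessions
    return index


def activities_without_instrument(root: Path) -> set[str]:
    """Activities found in data/ that lack an instruments/<ACTIVITY>.yaml file."""
    activities = {
        activity
        for sessions in index_data(root).values()
        for session_activities in sessions.values()
        for activity in session_activities
    }
    return {
        a for a in activities if not (root / "instruments" / f"{a}.yaml").is_file()
    }
```

Printing the index is also a quick way to spot gaps, such as an agent with fewer sessions than the others.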
Levels of data
The BDM dataset is organized into three levels of data:
- Events: the lowest level of data in BDM. Events represent the raw data collected during an activity. For example, in a DigitSpan activity, an event might be a single digit that the subject has to remember, together with the timestamp at which the digit was presented. Events are stored in the events.csv file within the activity folder.
- Trials: a higher level of data. Each trial represents a single attempt within an activity, accompanied by the subject's response and the experimenter's interpretation of that response. For example, in a DigitSpan activity, a trial might be a sequence of digits that the subject has to remember. Trials are stored in the trials_*.csv files within the activity folder.
- Models: the highest level of data, representing the analysis and interpretation of the trials. For example, in a DigitSpan activity, a model might be the subject's working memory capacity, or something more complex, such as a deep learning model that predicts the subject's performance.
Trials are the main data in behavioral data analysis: events are the raw data collected during an activity, and models are the analysis and interpretation of the trials. The following sections assume that trials are the main data.
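To make the step from trials to models concrete, here is a deliberately simple "model" for DigitSpan: the longest sequence the subject recalled correctly. This scoring rule was chosen only for illustration and is not prescribed by BDM. The column names sequence_length and correct, and the "1" coding for a correct response, are hypothetical.

```python
def digit_span(trials: list[dict[str, str]]) -> int:
    """Longest sequence length recalled correctly, or 0 if none.

    Rows are dicts of strings, as csv.DictReader returns them. The columns
    "sequence_length" and "correct" (with "1" = correct) are hypothetical.
    """
    correct_lengths = [
        int(t["sequence_length"]) for t in trials if t["correct"] == "1"
    ]
    return max(correct_lengths, default=0)
```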
Activities data
See the Trial table for more information. Trials are stored in the trials_*.csv files within the activity folder and can be accompanied by optional supporting tables for stimuli, options, and so on.
Within each activity folder, you will find the data files collected during the activity. The structure of the activity folder is defined in the BDM Trials schema. Here is an example of a DigitSpan activity folder:
```
...
DigitSpan/
├── trials_1.csv
├── stimuli_1.csv
└── options_1.csv
```
- trials_<ATTEMPT>.csv contains the data collected during the activity. <ATTEMPT> is a 1-indexed suffix that partitions the data by how many times the agent has started the activity (first attempt, second attempt, and so on). The idea is similar to partitioning in Apache Arrow datasets. If the activity was completed in a single attempt, the file is named trials_1.csv. If there were multiple attempts, the files are named trials_1.csv, trials_2.csv, and so on. For example, if agent_1 made two attempts at DigitSpan in session 1, that DigitSpan folder would contain two files: trials_1.csv and trials_2.csv.
- The dataset does not have to contain the same data for every agent. In the data/ example above, the second agent (agent_2) has no second session, so there is no session_2 folder for that agent.
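Because the attempt number is a suffix, reading an activity folder amounts to grouping files by that suffix. This sketch assumes, as in the DigitSpan example, that supporting tables share the attempt suffix of their trials file (stimuli_1.csv and options_1.csv alongside trials_1.csv). The function name is hypothetical. Attempts are parsed as integers, so trials_10.csv sorts after trials_2.csv, which a plain string sort would not do.

```python
import re
from pathlib import Path

# Matches e.g. trials_1.csv, stimuli_2.csv, options_10.csv.
ATTEMPT_FILE = re.compile(r"^(?P<table>[a-z]+)_(?P<attempt>\d+)\.csv$")


def files_by_attempt(activity_dir: Path) -> dict[int, dict[str, Path]]:
    """Group an activity folder's CSV files as {attempt: {table name: path}}.

    Files that do not follow the <table>_<ATTEMPT>.csv pattern are ignored.
    """
    grouped: dict[int, dict[str, Path]] = {}
    for path in activity_dir.glob("*.csv"):
        m = ATTEMPT_FILE.match(path.name)
        if m:
            attempt = int(m["attempt"])
            grouped.setdefault(attempt, {})[m["table"]] = path
    # Sort numerically by attempt so the order reflects when attempts happened.
    return dict(sorted(grouped.items()))
```

When concatenating the trials of several attempts, keeping the attempt number as an extra column preserves the partition information, much as a partitioned Arrow dataset exposes its partition key.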
Instruments
The instruments/ folder contains the instrument files. Each instrument file is a YAML file that describes the instrument and its parameters. Here is an example of a UFOV instrument file:
```yaml
name: UFOV
description: |
  The Useful Field of View (UFOV) test is a measure of visual attention. The test consists of three subtests: processing speed, divided attention, and selective attention.
parameters:
  - name: subtest
    description: The subtest of the UFOV test
    type: enum
    values:
      - processing_speed
      - divided_attention
      - selective_attention
```
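Once an instrument file is loaded (for example with PyYAML's yaml.safe_load), its enum parameters can be used to validate values found in the data. To stay dependency-free, this sketch hard-codes the structure such a loader would return for the UFOV file, with the description fields omitted. The check_parameter helper is hypothetical and only enforces the enum type.

```python
# The UFOV instrument as a YAML loader would return it (descriptions omitted).
UFOV = {
    "name": "UFOV",
    "parameters": [
        {
            "name": "subtest",
            "type": "enum",
            "values": ["processing_speed", "divided_attention", "selective_attention"],
        }
    ],
}


def check_parameter(instrument: dict, name: str, value: str) -> None:
    """Raise ValueError if `value` is not allowed for parameter `name`.

    Only the enum type is checked; values of other types are accepted as-is.
    """
    for param in instrument["parameters"]:
        if param["name"] == name:
            if param.get("type") == "enum" and value not in param["values"]:
                raise ValueError(
                    f"{value!r} is not a valid {name} for {instrument['name']}"
                )
            return
    raise ValueError(f"{instrument['name']} has no parameter {name!r}")
```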