Getting started
Understand what BDM is and how to use it
What is BDM?
Behaverse Data Model (BDM) is an interoperable dataset architecture for a wide range of cognitive tests and questionnaires. It provides a set of conventions to organize behavioral data in a way that is easy to understand by both domain experts and technical experts. It is opinionated but also flexible, so you can adapt it to your specific needs.
In the following sections, you will learn about the structure of a BDM dataset and how to create one. You will also learn about the different types of data in BDM and how to describe them with metadata.
What does a BDM dataset look like?
A BDM dataset is a collection of files and directories that follow a specific structure. Here is an example of a simple BDM dataset:
dataset/
├── README.md        (1)
├── agents.csv       (2)
├── instruments/     (3)
└── data/            (4)
    ├── agent_1/
    ├── agent_2/
    └── agent_3/
- (1) The README.md file contains both a human-readable description of the dataset (as Markdown) and machine-readable metadata (as YAML front matter).
- (2) The agents.csv file contains information about the human subjects and artificial agents in the dataset. It is a CSV file with one row per agent and columns for the agents' attributes.
- (3) The instruments/ folder contains parameters for the tasks, instructions, and questionnaires. Each instrument is represented as a YAML file named after the instrument, e.g., instruments/DigitSpan.yaml, that describes the instrument and its parameters.
- (4) The data/ folder contains the data files. Within this folder, there is one folder per agent (a human or machine participant of the study). This directory-based partitioning makes it easier to access and manage the data.
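The layout above can be scaffolded with a few lines of Python. This is only a sketch: the `create_skeleton` helper, the `name` front matter key, and the `agent_id` column are hypothetical placeholders and not part of BDM itself.

```python
import tempfile
from pathlib import Path


def create_skeleton(root: Path, agent_ids: list[str]) -> None:
    """Create the top-level layout described above with placeholder content."""
    (root / "instruments").mkdir(parents=True, exist_ok=True)
    # README.md: YAML front matter followed by a Markdown description.
    # The `name` key is a placeholder, not a documented BDM field.
    (root / "README.md").write_text(
        "---\nname: example\n---\n\n# Example dataset\n", encoding="utf-8"
    )
    # agents.csv: one row per agent; `agent_id` is an assumed column name.
    rows = ["agent_id", *agent_ids]
    (root / "agents.csv").write_text("\n".join(rows) + "\n", encoding="utf-8")
    # One folder per agent under data/.
    for agent_id in agent_ids:
        (root / "data" / f"agent_{agent_id}").mkdir(parents=True, exist_ok=True)


# Build the skeleton in a temporary directory and list what was created.
with tempfile.TemporaryDirectory() as tmp:
    dataset = Path(tmp) / "dataset"
    create_skeleton(dataset, ["1", "2", "3"])
    for path in sorted(dataset.rglob("*")):
        print(path.relative_to(dataset).as_posix())
```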
Data files
The data/ folder contains the data files. Within this folder, data files are organized in a hierarchical structure: <AGENT>/<SESSION>/<ACTIVITY>/<DATA_FILE>. This is commonly called directory-based partitioning and allows for easy access to the data files.
- The data/<AGENT>/ folder contains the data for a single human subject or computer agent. It is organized into one folder per session of the study. It may also contain other agent-specific data, such as studyflow.csv, which describes the order of the activities for the agent.
- The data/<AGENT>/<SESSION>/ folder contains the data for a single session. Each session folder can contain multiple activities.
- The data/<AGENT>/<SESSION>/<ACTIVITY>/ folders are activity folders. Activities are the cognitive tests, questionnaires, or other data collection instruments used in the study. Each activity folder contains data files for one or more attempts of the same activity.
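Because the partition keys are encoded in the path, they can be recovered from a file path alone. The following sketch assumes POSIX-style relative paths that start at data/; the `DataFileLocation` type and the `parse_data_path` function are hypothetical names used for illustration.

```python
from pathlib import PurePosixPath
from typing import NamedTuple


class DataFileLocation(NamedTuple):
    agent: str
    session: str
    activity: str
    data_file: str


def parse_data_path(path: str) -> DataFileLocation:
    """Split 'data/<AGENT>/<SESSION>/<ACTIVITY>/<DATA_FILE>' into its parts."""
    parts = PurePosixPath(path).parts
    # Agent-level files such as data/<AGENT>/studyflow.csv are rejected here.
    if len(parts) != 5 or parts[0] != "data":
        raise ValueError(f"not an activity data file path: {path!r}")
    return DataFileLocation(*parts[1:])


location = parse_data_path("data/agent_1/session_2/DigitSpan/trial_1.csv")
print(location.agent, location.session, location.activity, location.data_file)
# prints: agent_1 session_2 DigitSpan trial_1.csv
```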
Here is an example data/ folder of a dataset with three subjects, each with one or two sessions, and each session with two activities (UFOV and DigitSpan):
...
data/
├── agent_1/
│   ├── studyflow.csv
│   ├── session_1/
│   │   ├── UFOV/
│   │   │   └── ...
│   │   └── DigitSpan/
│   │       └── ...
│   └── session_2/
│       ├── UFOV/
│       │   └── ...
│       └── DigitSpan/
│           └── ...
├── agent_2/
│   ├── studyflow.csv
│   └── session_1/
│       ├── UFOV/
│       │   └── ...
│       └── DigitSpan/
│           └── ...
└── agent_3/
    ├── studyflow.csv
    ├── session_1/
    │   ├── UFOV/
    │   │   └── ...
    │   └── DigitSpan/
    │       └── ...
    └── session_2/
        ├── UFOV/
        │   └── ...
        └── DigitSpan/
            └── ...
Note that:
- Each agent has a folder named agent_<ID>, where <ID> is a unique identifier for the human subject or computer agent. <ID> can be a number or a string, but must be unique within the dataset. For convenience, if your dataset contains only human subjects, you may also use the data/subject_<ID>/ format; the root data folder remains data/. However, the agent_<ID> naming is recommended for all datasets, as it is more flexible and extensible.
- The studyflow.csv file contains the order and metadata of all the activities for the agent.
- Each session has a folder named session_<ID>, where <ID> is a unique identifier for the session. Session identifiers have the same meaning across the whole dataset, so session_01 for agent_01 has the same session structure as session_01 for agent_02.
- Each activity has a folder named <AGENT>/<SESSION>/<ACTIVITY>/, where <ACTIVITY> is the name of the instrument used in the activity. In the example, the activities are UFOV and DigitSpan. For each activity, there is an instrument file with the same name in the instruments/ folder.
- The general idea of using file paths to partition data is similar to dataset partitioning in Apache Arrow or DuckDB.
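The rule that every activity folder has a matching instrument file can be checked mechanically. The sketch below assumes instrument files use the .yaml extension, as in the instruments/DigitSpan.yaml example; the `missing_instruments` function is a hypothetical helper, not part of any BDM tooling.

```python
from pathlib import Path


def missing_instruments(root: Path) -> set[str]:
    """Return activity names under data/ that have no instruments/<name>.yaml."""
    # Activity folders sit three levels below data/: <AGENT>/<SESSION>/<ACTIVITY>/
    activities = {p.name for p in (root / "data").glob("*/*/*") if p.is_dir()}
    # Instrument files are named after the instrument, e.g. DigitSpan.yaml.
    described = {p.stem for p in (root / "instruments").glob("*.yaml")}
    return activities - described
```

Calling `missing_instruments(Path("dataset"))` on a complete dataset should return an empty set; any names it returns are activities that lack an instrument description.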
Levels of data
BDM organizes data in three levels: events, trials, and models. Each level of data is represented by a different type of file.
- Events: Events are the lowest level of data in BDM. They represent the raw data collected during an activity. For example, in a DigitSpan activity, an event might be a single digit that the subject has to remember and the timestamp when the digit was presented. Events are stored in the events.csv file within the activity folder.
- Trials: Trials are a higher level of data that represent a single attempt at an activity accompanied by the subject’s response and the experimenter’s interpretation of the response. For example, in a DigitSpan activity, a trial might be a sequence of digits that the subject has to remember. Trials are stored in the trial_<ATTEMPT>.csv file within the activity folder.
- Statistics & Models: Models are the highest level of data and represent the analysis and interpretation of the trials. For example, in a DigitSpan activity, a model might be the subject’s working memory capacity or something more complex, like a deep learning model that predicts the subject’s performance.

The main data in behavioral data analysis are the trials. Events are the raw data collected during an activity, and statistics and models are interpretations of the trials. The following sections assume trials as the main data.
Activities data
See the Trial table for more information. Trials are stored in the trial_<ATTEMPT>.csv file within the activity folder and can be accompanied by optional supporting tables for stimuli, options, etc.
Within the activity folders, you will find the data files collected during the activity. Here is an example of a DigitSpan activity folder:
...
DigitSpan/
├── trial_1.csv
├── trial_2.csv
├── stimulus_1.csv
├── stimulus_2.csv
├── option_1.csv
└── option_2.csv
- The trial_<ATTEMPT>.csv file contains the data collected during the DigitSpan activity, where <ATTEMPT> is a 1-indexed suffix that partitions the data by how many times the agent has initiated the activity (first attempt, second attempt, etc.). If there was only one attempt to complete the activity, the file can be named trial_1.csv. If there were multiple attempts, the files can be named trial_1.csv, trial_2.csv, etc.
- In the example above, there are two attempts for the DigitSpan activity, so there are two trial tables: trial_1.csv and trial_2.csv.
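The attempt suffix can be read back from the file names to process attempts in order. This sketch assumes the suffix is a plain decimal integer; the `TRIAL_FILE` pattern and the `trial_files_by_attempt` function are hypothetical names used for illustration.

```python
import re
from pathlib import Path

# Matches trial_<ATTEMPT>.csv, where <ATTEMPT> is assumed to be a decimal integer.
TRIAL_FILE = re.compile(r"^trial_(\d+)\.csv$")


def trial_files_by_attempt(activity_dir: Path) -> dict[int, Path]:
    """Map each 1-indexed attempt number to its trial table in an activity folder."""
    attempts = {}
    for path in activity_dir.iterdir():
        match = TRIAL_FILE.match(path.name)
        if match:  # skips supporting tables such as stimulus_1.csv
            attempts[int(match.group(1))] = path
    # Sort numerically so that trial_10.csv comes after trial_2.csv.
    return dict(sorted(attempts.items()))
```

Sorting on the parsed integer rather than on the file name avoids the lexicographic ordering problem once an activity has ten or more attempts.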
Instruments
The instruments/ folder contains the instrument files. Each instrument file is a YAML file that describes the instrument and its parameters. Here is an example of a UFOV instrument file:
name: UFOV
description: |
  The Useful Field of View (UFOV) test is a measure of visual attention. The test consists of three subtests: processing speed, divided attention, and selective attention.
parameters:
  - name: subtest
    description: The subtest of the UFOV test
    type: enum
    values:
      - processing_speed
      - divided_attention
      - selective_attention
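One possible use of an instrument description is to validate parameter values before running or analyzing an activity. The sketch below works on a hand-written Python dict that mirrors the UFOV example; in practice the dict would come from parsing the YAML file with a YAML library. The `is_valid_parameter` function is a hypothetical helper, and how BDM tooling actually consumes these files is not specified here.

```python
# Hand-written mirror of the UFOV instrument example (description fields omitted).
UFOV = {
    "name": "UFOV",
    "parameters": [
        {
            "name": "subtest",
            "type": "enum",
            "values": ["processing_speed", "divided_attention", "selective_attention"],
        }
    ],
}


def is_valid_parameter(instrument: dict, name: str, value: str) -> bool:
    """Check a value against an enum parameter declared by the instrument."""
    for parameter in instrument.get("parameters", []):
        if parameter["name"] == name and parameter.get("type") == "enum":
            return value in parameter["values"]
    return False  # unknown parameter, or not an enum


print(is_valid_parameter(UFOV, "subtest", "divided_attention"))  # True
print(is_valid_parameter(UFOV, "subtest", "memory_span"))  # False
```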