Getting started

Understand what BDM is and how to use it

What is BDM?

Behaverse Data Model (BDM for short) is a dataset architecture that aims to provide interoperable data for modern cognitive projects. It is a set of conventions that help you organize your data in a way that is easy to understand for both domain experts and technical experts. It is designed to work with a wide range of cognitive tasks and questionnaires.

Whether you’re an individual scientist or part of a larger lab, BDM can help you build a dataset that is easy to understand, maintain, and share with others. BDM is opinionated but designed to be flexible and extensible, so you can adapt it to your specific needs.

What is a dataset and how to organize it using BDM

What does a BDM dataset look like?

A BDM dataset is a collection of files and directories that follow a specific structure. Here is an example of a simple BDM dataset:

dataset/
├── README.md
├── agents.csv
├── instruments/
└── data/

The README.md file contains both a human-readable description of the dataset (as Markdown) and machine-readable metadata (as YAML front matter).
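To illustrate how the two parts of README.md can be separated, here is a minimal sketch. It assumes the common convention that front matter is delimited by lines containing only `---`; the `name` key in the example is hypothetical and not taken from the BDM specification. The function returns the raw YAML text, which could then be handed to any YAML parser.

```python
def split_front_matter(text: str) -> tuple[str, str]:
    """Split README text into (yaml_front_matter, markdown_body).

    Assumption: front matter is delimited by lines containing only '---'.
    If no such block is found, the front matter is returned as empty.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return "", text
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            front = "\n".join(lines[1:i])
            body = "\n".join(lines[i + 1:])
            return front, body
    return "", text  # no closing delimiter: treat everything as body


# Hypothetical README content, used only for illustration.
example = """---
name: my-dataset
---
# My dataset
"""
front, body = split_front_matter(example)
```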

The agents.csv file contains information about the human subjects and artificial agents in the dataset. It is a CSV file with one row per agent and columns for attributes.
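As a rough sketch of how such a table might be consumed, the snippet below indexes agent rows by identifier using Python's standard `csv` module. The column names `agent_id` and `kind` are assumptions made for this example; the actual attribute columns depend on your dataset.

```python
import csv
import io

# Hypothetical agents.csv content; the column names are assumptions.
AGENTS_CSV = """agent_id,kind
1,human
2,human
3,artificial
"""


def load_agents(text: str) -> dict[str, dict[str, str]]:
    """Index agent rows by their identifier column (assumed 'agent_id')."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["agent_id"]: row for row in reader}


agents = load_agents(AGENTS_CSV)
```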

The instruments/ folder contains parameters for the tasks, instructions, and questionnaires. Each instrument is represented as a YAML file that describes the instrument and its parameters. Instrument files are named after the instrument, e.g., instruments/UFOV.yaml.

Data files

The data/ folder contains the data files. Within this folder, data files are organized as <AGENT>/<SESSION>/<ACTIVITY>/<DATA_FILE>.
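The path convention can be made concrete with a small parser. This is a sketch only: `DataFileLocation` and `parse_data_path` are hypothetical names, and paths are taken relative to the data/ folder. Paths that do not have exactly four components (such as an agent-level studyflow.csv) are rejected rather than interpreted.

```python
from pathlib import PurePosixPath
from typing import NamedTuple


class DataFileLocation(NamedTuple):
    agent: str
    session: str
    activity: str
    data_file: str


def parse_data_path(path: str) -> DataFileLocation:
    """Parse '<AGENT>/<SESSION>/<ACTIVITY>/<DATA_FILE>' relative to data/."""
    parts = PurePosixPath(path).parts
    if len(parts) != 4:
        raise ValueError(f"expected 4 path components, got {len(parts)}: {path}")
    return DataFileLocation(*parts)


loc = parse_data_path("agent_1/session_1/DigitSpan/trials_1.csv")
```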

This folder also contains additional data for the studyflow, stimuli, and other data types. For example, the data/ folder for a dataset of three subjects, each with two sessions, and each session with two activities might look like this:

...
data/
├── agent_1/
│   ├── studyflow.csv
│   ├── session_1/
│   │   ├── UFOV/
│   │   │   └── ...
│   │   └── DigitSpan/
│   │       └── ...
│   └── session_2/
│       ├── UFOV/
│       │   └── ...
│       └── DigitSpan/
│           └── ...
├── agent_2/
│   ├── studyflow.csv
│   └── session_1/
│       ├── UFOV/
│       │   └── ...
│       └── DigitSpan/
│           └── ...
└── agent_3/
    ├── studyflow.csv
    ├── session_1/
    │   ├── UFOV/
    │   │   └── ...
    │   └── DigitSpan/
    │       └── ...
    └── session_2/
        ├── UFOV/
        │   └── ...
        └── DigitSpan/
            └── ...

Note that:

  • Each agent has a folder named agent_<ID>, where <ID> is a unique identifier for the human subject or computer agent. <ID> can be a number or a string, but must be unique within the dataset. If your dataset contains only human subjects, you may instead name the folders data/subject_<ID>/; the root data folder remains data/. However, the agent_<ID> format is recommended for all datasets, as it is more flexible and extensible.
  • The studyflow.csv file contains the order and metadata of all the activities for the agent.
  • Each session has a folder named session_<ID>, where <ID> is a unique identifier for the session. Session identifiers are unique within the dataset, so session_1 under agent_1 refers to the same session as session_1 under agent_2.
  • Each activity has a folder named <AGENT>/<SESSION>/<ACTIVITY> where <ACTIVITY> is the name of the activity. In this example, the activities are UFOV and DigitSpan. For each activity, there is an instrument file in the instruments/ folder with the same name.
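The last rule above (every activity folder has a matching instrument file) lends itself to an automated check. The following is a minimal sketch, not an official BDM validator: `missing_instruments` is a hypothetical helper, and it assumes instrument files use the .yaml extension as in instruments/UFOV.yaml. The demo builds a throwaway dataset in a temporary directory.

```python
import tempfile
from pathlib import Path


def missing_instruments(dataset_root: Path) -> set[str]:
    """Return activity names that have no instruments/<name>.yaml file."""
    instruments = {p.stem for p in (dataset_root / "instruments").glob("*.yaml")}
    # Activity folders live three levels below data/: agent/session/activity.
    activities = {
        p.name for p in (dataset_root / "data").glob("*/*/*") if p.is_dir()
    }
    return activities - instruments


# Demo on a throwaway dataset where DigitSpan lacks an instrument file.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "instruments").mkdir()
    (root / "instruments" / "UFOV.yaml").write_text("name: UFOV\n")
    for activity in ("UFOV", "DigitSpan"):
        (root / "data" / "agent_1" / "session_1" / activity).mkdir(parents=True)
    missing = missing_instruments(root)
```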

Levels of data

The BDM dataset is organized into three levels of data:

Events
Events are the lowest level of data in BDM. They represent the raw data collected during an activity. For example, in a DigitSpan activity, an event might be a single digit that the subject has to remember and the timestamp when the digit was presented. Events are stored in the events.csv file within the activity folder.
Trials
Trials are a higher level of data that represent a single attempt at an activity accompanied by the subject’s response and experimenter’s interpretation of the response. For example, in a DigitSpan activity, a trial might be a sequence of digits that the subject has to remember. Trials are stored in the trials_*.csv file within the activity folder.
Models
Models are the highest level of data and represent the analysis and interpretation of the trials. For example, in a DigitSpan activity, a model might be the subject’s working memory capacity or a more complex one, such as a deep learning model that predicts the subject’s performance.

Trials are the main data in behavioral data analysis. Events are the raw data collected during an activity, and models are the analysis and interpretation of the trials. The following sections assume trials as the main data.

Activities data

See the Trial table for more information. Trials are stored in the trials_*.csv file within the activity folder and can be accompanied by optional supporting tables for stimuli, options, etc.

Within the activity folders, you will find the data files collected during the activity. The structure of the activity folder is defined in the BDM Trials schema. Here is an example of a DigitSpan activity folder:

...
DigitSpan/
├── trials_1.csv
├── stimuli_1.csv
└── options_1.csv

  • The trials_<ATTEMPT>.csv file contains the data collected during the activity, where <ATTEMPT> is a 1-indexed suffix that partitions the data by how many times the agent has initiated the activity (first attempt, second attempt, etc.). The idea is similar to partitioning in Apache Arrow datasets.
    If there was only one attempt to complete the activity, the file is named trials_1.csv. If there were multiple attempts, the files are named trials_1.csv, trials_2.csv, etc.
    For example, if agent_1 made two attempts at the DigitSpan activity in session 1, the folder would contain two files: trials_1.csv and trials_2.csv.
  • The data/ example shown earlier is incomplete in the sense that not all agents have the same data. Data for the second subject (agent_2) does not include a second session, so there is no session_2 folder for this subject.
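The attempt suffix can be recovered from the file name with a short helper. This is a sketch under the naming rule described above (trials_<ATTEMPT>.csv with a numeric suffix); the function names are hypothetical. Sorting is done on the numeric attempt so that trials_10.csv comes after trials_2.csv.

```python
import re
from typing import Iterable, Optional

TRIALS_PATTERN = re.compile(r"^trials_(\d+)\.csv$")


def attempt_number(filename: str) -> Optional[int]:
    """Return the attempt number of a trials file, or None for other files."""
    match = TRIALS_PATTERN.match(filename)
    return int(match.group(1)) if match else None


def trials_in_attempt_order(filenames: Iterable[str]) -> list[str]:
    """Keep only trials files and sort them by numeric attempt."""
    numbered = [(attempt_number(name), name) for name in filenames]
    return [name for _, name in sorted((k, n) for k, n in numbered if k is not None)]


ordered = trials_in_attempt_order(
    ["trials_10.csv", "stimuli_1.csv", "trials_2.csv", "trials_1.csv"]
)
```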

Instruments

The instruments/ folder contains the instrument files. Each instrument file is a YAML file that describes the instrument and its parameters. Here is an example of a UFOV instrument file:

name: UFOV
description: |
  The Useful Field of View (UFOV) test is a measure of visual attention. The test consists of three subtests: processing speed, divided attention, and selective attention.

parameters:
  - name: subtest
    description: The subtest of the UFOV test
    type: enum
    values:
      - processing_speed
      - divided_attention
      - selective_attention
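To show how such a definition might be used, the sketch below checks parameter values against the declared enum. It is an illustration only: the instrument is written as a Python dictionary mirroring the YAML above (so no YAML library is needed), and `validate_parameters` is a hypothetical helper, not part of BDM.

```python
# Python mirror of the UFOV instrument file shown above (description omitted).
UFOV_INSTRUMENT = {
    "name": "UFOV",
    "parameters": [
        {
            "name": "subtest",
            "type": "enum",
            "values": ["processing_speed", "divided_attention", "selective_attention"],
        },
    ],
}


def validate_parameters(instrument: dict, values: dict) -> list[str]:
    """Return a list of problems found in the given parameter values."""
    errors = []
    declared = {p["name"]: p for p in instrument["parameters"]}
    for name, value in values.items():
        spec = declared.get(name)
        if spec is None:
            errors.append(f"unknown parameter: {name}")
        elif spec["type"] == "enum" and value not in spec["values"]:
            errors.append(f"invalid value for {name}: {value!r}")
    return errors


ok = validate_parameters(UFOV_INSTRUMENT, {"subtest": "divided_attention"})
bad = validate_parameters(UFOV_INSTRUMENT, {"subtest": "peripheral", "speed": 1})
```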