Glossary
Controlled vocabulary for behavioral data terms and BDM concepts
This glossary is generated from the published behaverse/schemas artifacts: cross-cutting terms from the vocabulary, plus the field terms each schema defines.
-
accuracy
float -
Refers to a measure of performance. In many behavioral tasks, it reflects the percentage (0-100%) or fraction (0-1) of correct responses. Always use accuracy to refer to a performance measure that is a real number (float) bounded to the [0, 1] range, i.e., the fraction rather than the percentage.
Range
0 to 1 (inclusive)
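As an illustration, accuracy can be derived from per-trial correct values. The following is a minimal sketch, not a reference implementation; the function name and the use of None to mark a missing response are assumptions made for this example. A missing response is counted as incorrect, in line with the correct entry of this glossary.

```python
from typing import Optional, Sequence


def accuracy(responses: Sequence[Optional[bool]]) -> float:
    """Return the fraction of correct trials, bounded to [0, 1].

    Each item is True (correct), False (incorrect) or None (no response,
    e.g., a timeout). None is counted as incorrect rather than skipped.
    """
    if not responses:
        raise ValueError("accuracy is undefined for zero trials")
    n_correct = sum(1 for r in responses if r is True)
    return n_correct / len(responses)


# Hypothetical block of four trials: two correct, one timeout, one error.
print(accuracy([True, True, None, False]))  # 0.5
```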
-
correct
boolean -
A boolean which indicates whether a response in a given trial was correct or not. When no response was given although one was expected (i.e., timeout), correct evaluates to FALSE rather than N/A. This avoids the case where subjects would be given a high performance score when in fact they avoided all difficult trials and responded correctly only to easy trials.
-
response_time
float -
The meaning of response time or reaction time (and its unit) is not consistent across studies. In BDM, response_time is the duration in seconds between a) the moment the subject fully completed their response on a given trial, and b) the moment at which the earliest possible correct response could have been completed by a hypothetical agent with perfect knowledge of the task and the ability to execute the response instantaneously.
Range
In seconds
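The definition above amounts to a simple difference of two time points. A minimal sketch, assuming both moments are available as seconds since the start of the trial (the function and parameter names are hypothetical):

```python
def response_time(response_completed_at: float, earliest_possible_at: float) -> float:
    """Seconds between the earliest possible correct response and the actual one."""
    return response_completed_at - earliest_possible_at


# Hypothetical trial: a correct response becomes possible 1.5 s into the trial,
# and the subject finishes responding 2.25 s into the trial.
print(response_time(2.25, 1.5))  # 0.75
```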
-
timed_out
boolean -
Indicates whether the subject failed to respond within the allocated time period.
-
age
float -
Age is typically expressed in years. However, we don’t recommend rounding “age” to get integer values, as rounding implies losing data. It is better to leave such variables as real numbers (floats) and let the data analysts decide whether or not rounding is necessary for their specific use case.
-
gender/sex
enum -
Gender and sex are not exactly the same. Sex refers to biological sex while gender is a more complex construct. For example, a person may have a male biological sex but identify as a woman. Depending on the question asked, the variable should therefore be either sex or gender. For example, “What sex were you assigned at birth, such as on an original birth certificate?” is a question about biological sex and should be coded as sex. The possible values for sex are:
Range
- female: female (girl, woman)
- male: male (boy, man)
- other: other non-binary
- skip: prefer not to say
-
length
float -
Refers to the length in centimeters of a physical object. When possible use a more specific word (e.g., height, width, distance).
-
height
float -
Refers to the height of a physical object in centimeters.
-
width
float -
Refers to the width of a physical object in centimeters.
-
weight
float -
Refers to the weight of a physical object in kilograms.
-
*_count
integer -
Refers to the cardinality of that entity. A variable named page_count indicates the number of pages. Likewise, if an observation/row has car_count = 5, this particular observation involves a total count of 5 cars; this 5 is unrelated to other rows in the table.
-
type
enum -
Type is always an enum with known values. The meaning of the particular enum value needs to be explained in a codebook.
-
description
string -
Description is always a text (string) for human consumption. While it is not strictly necessary, a textual description can greatly facilitate the understanding and processing of the data by humans.
-
mean
float -
The average of a numeric variable.
-
median
float -
The median of the variable.
- mode
-
The mode of a variable.
- min
-
The minimum value of a variable.
- max
-
The maximum value of a variable.
- sd
-
The standard deviation of a variable.
- var
-
The variance of a variable.
- iqr
-
The interquartile range of a variable.
-
sum
float -
The sum of all values of a variable (e.g., item_price_sum = sum(item_price)).
-
quantile*
float -
Quantile is similar to percentile, as both refer to the value of a parameter Q that splits the data such that a given fraction of the data is smaller than Q. Quantile expresses that fraction as a number between 0 and 1 while percentiles express it as a percentage (between 0 and 100).
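The relation between the two can be sketched as follows. This is an illustration only: several interpolation conventions exist, and linear interpolation between sorted values is an assumption of this example, as are the function names.

```python
from typing import Sequence


def quantile(values: Sequence[float], q: float) -> float:
    """Value Q for a fraction q between 0 and 1, using linear interpolation."""
    if not values:
        raise ValueError("quantile is undefined for an empty sequence")
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    ordered = sorted(values)
    position = q * (len(ordered) - 1)
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    weight = position - lower
    return ordered[lower] * (1 - weight) + ordered[upper] * weight


def percentile(values: Sequence[float], p: float) -> float:
    """Same quantity, with the fraction expressed between 0 and 100."""
    return quantile(values, p / 100)


data = [10, 20, 30, 40, 50]
print(quantile(data, 0.5), percentile(data, 50))  # 30.0 30.0
```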
-
rank
integer -
Rank of a value in a set (ascending or first to last).
-
log
float -
Natural log.
-
log2
float -
Log of base 2.
-
log10
float -
Log of base 10.
- sqrt
-
Square root.
- pow2
-
Power of 2.
-
floor
integer -
Flooring of a number (e.g., 3.6 becomes 3).
-
ceil
integer -
Ceiling of a number (e.g., 3.6 becomes 4).
-
round
integer -
Rounding of a number to the closest integer (e.g., 3.6 becomes 4).
-
*_id
string or integer -
If a column or variable name is suffixed with _id (e.g., participant_id, task_id), it is expected that there exists a supplementary table which has the same name (“participant”, “task”), with a primary key named id, such that a value in the first (participant_id = 215) refers to an entry in the second (a row in the participant table where id = 215). It is expected that the values in a variable suffixed with _id are unique within a “local scope” of the source table; however, it is not expected that they are unique globally. For such purposes one should use _uuid.
Range
Unique within a table or within an explicit context
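The _id convention can be illustrated with two small in-memory tables. This is a sketch under assumptions: the rows, the group_name values and the helper name resolve are hypothetical and only show how participant_id points at id in the participant table.

```python
# Hypothetical "participant" table with a primary key named id.
participant = [
    {"id": 215, "group_name": "control"},
    {"id": 216, "group_name": "training"},
]

# Hypothetical table whose rows reference participants through participant_id.
response = [
    {"participant_id": 215, "correct": True},
    {"participant_id": 216, "correct": False},
    {"participant_id": 215, "correct": True},
]


def resolve(rows, table, column):
    """Attach to each row the entry of `table` whose id equals row[column]."""
    by_id = {entry["id"]: entry for entry in table}
    if len(by_id) != len(table):
        raise ValueError("id must be unique within the referenced table")
    key = column.removesuffix("_id")  # "participant_id" -> "participant"
    return [{**row, key: by_id[row[column]]} for row in rows]


resolved = resolve(response, participant, "participant_id")
print(resolved[0]["participant"]["group_name"])  # control
```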
-
*_name
string -
Sometimes “name” is used in a way that is similar to a unique id (e.g., study_name or task_name). The difference between “id” and “name” is that “name” is expected to be a readable text (e.g., n-back versus f346-r23v). As with “id”, it is expected that it refers to other tables and that it is unique within a certain context (contrary to, for example, “label”).
-
*_uuid
string -
A Universally Unique Identifier (UUID) is a 128-bit label, usually written as 32 hexadecimal digits, that can be generated on the fly and will most likely be unique across computer systems. A UUID can be used to assign a record a unique identifier without having to ensure that the value is not yet used by some other records or tables.
Range
UUIDv7 or later
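For illustration, a version 7 UUID combines a millisecond Unix timestamp with random bits. The sketch below builds one by hand from the published bit layout (48-bit timestamp, 4-bit version, 12 random bits, 2-bit variant, 62 random bits); the function name uuid7_sketch is hypothetical, and a maintained library should be preferred in production code.

```python
import os
import time
import uuid


def uuid7_sketch() -> uuid.UUID:
    """Build a version 7 UUID: 48-bit Unix time in ms followed by random bits."""
    unix_ts_ms = (time.time_ns() // 1_000_000) & ((1 << 48) - 1)
    random_bits = int.from_bytes(os.urandom(10), "big")  # 80 bits, 74 are used
    rand_a = random_bits >> 68              # top 12 random bits
    rand_b = random_bits & ((1 << 62) - 1)  # low 62 random bits
    value = (
        (unix_ts_ms << 80)
        | (0x7 << 76)    # version field
        | (rand_a << 64)
        | (0b10 << 62)   # variant field
        | rand_b
    )
    return uuid.UUID(int=value)


record_uuid = uuid7_sketch()
print(record_uuid, len(record_uuid.hex))
```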
-
*_hash
string -
It is sometimes useful to create reproducible keys based on some data. A hash is not strictly necessary, as it can be recreated from the underlying data, but it can be convenient for data processing.
-
*_index
integer -
Indices should be favored over labels and ids when a variable is used for referencing and when order is important (often, but not always, the chronological order). For example, a variable named stimulus_position_index implies its value points to an entry in a list of possible stimulus positions.
Range
1-based indices
-
*_repetition
integer -
Repetition counts the number of times the same “thing” occurred, e.g., a participant completes the same test twice, the same stimulus appears multiple times.
Range
0-based
-
*_label
string -
A text that is attached to a variable and identifies it. It is expected to be human readable, but not always unique.
-
activity_index
index -
When subjects complete multiple activities (e.g., a cognitive test followed by a questionnaire), this variable indicates the order of each activity (i.e., the first activity completed by the subject has activity_index = 1, the second activity has activity_index = 2, even if the second activity is an exact repetition of the first one).
Range
1-based index of activity within the subject-level data.
-
adaptive_method_config
string -
More detailed configuration for the adaptive method, including initial values, step sizes, and termination criteria.
-
adaptive_method_name
enum -
Specifies the adaptive procedure used to modify instrument parameters in response to subject performance (e.g., staircase).
-
adaptive_parameter_name
string -
Specific instrument parameter that is dynamically modified based on the subject’s performance.
-
adaptive_parameter_value
any -
The specific value of the instrument parameter that was used for this trial. This value is updated as the adaptive algorithm adjusts the parameter based on the subject’s responses.
Range
Data type depends on the type of the adaptive parameter (as defined in adaptive_parameter_name).
-
adaptive_parameter_value_next
any -
The next value of the adaptive parameter that will be used in the subsequent trial, as determined by the adaptive algorithm.
Range
Data type depends on the type of the adaptive parameter (as defined in adaptive_parameter_name).
-
additional_measures
string -
Indicates whether additional measures have been recorded during this trial and if so what kind of measures they are. Possible values include (non-exhaustive):
Range
- mouse_trajectories
- fmri
- eye_tracking
- heart_rate
-
agent_id
string -
A unique identifier assigned to the agent (typically a person) generating the responses. This ID tracks their participation and responses throughout the study. See the Agent table.
-
animation
string -
Describes the animation used to display a specific stimulus in a human-readable format. For example, “fadeIn 3s” indicates a 3-second fade-in animation.
-
block_id
string -
Specific parameterization of the instrument for a single block of trials (e.g., “DS_FORWARD_PRACTICE” and “DS_FORWARD_TEST”). Block-level parameters override timeline-level parameters.
-
block_index
index -
Refers to the order in which this block has been experienced by the subject. When there are multiple blocks, this variable indicates the order of each block (i.e., the first block completed by the subject has block_index=1, the second block has block_index=2, even if the second block is an exact repetition of the first one).
Range
1-based index of the block within the timeline
-
block_name
string -
The name of a particular block in a timeline. If the same block is completed twice in a row, the two completions have different block_index values (1 and 2, respectively) but the same block_name (e.g., “NB_timeline1_block1”). More details about the block_name are available in the Instrument table.
-
block_type
enum -
Specifies the experimental role of the block (e.g., tutorial, practice, test, instruction).
Range
- tutorial: A simplified version of the test designed to teach participants how the test works.
- practice: Typically identical to the main test blocks, but used to get subjects accustomed to the task in a no-stakes environment.
- test: Primary blocks used to measure the desired behaviors.
- instruction: Presents written and/or visual instructions to the subject.
-
color_hex
string -
The hexadecimal RGB color code of the component (e.g., #FF0000 for red) with optional alpha channel for transparency.
Range
#000000 to #FFFFFF
-
color_name
string -
The human-readable name of the component color, e.g., red.
-
duration
float -
Describes for how long this stimulus was displayed after its onset, in seconds.
Range
In seconds
-
episode_index
index -
Episodes are temporally distinct bins of time (no overlap and discrete). The binning of the time into successive episodes depends on the task; it is mostly used and necessary to group data in cases where two distinct trials occurred at the same time (e.g., dual N-back).
Range
1-based integer index.
-
evaluation_label
enum -
There are several labels that can be assigned to a given response to specify what that response means in terms of evaluation within a task. The most general terms are “correct” and “error” (which are already given by the correct variable). There are however more specific sets of terms that may apply in different contexts. For example, in a signal detection task, it is common to use labels from the signal detection theory framework (i.e., “hit”, “miss”, “false alarm”, “correct rejection”). In other contexts, researchers might use terms like “omission” or “commission” errors, or even “perseveration” errors (e.g., in the Wisconsin Card Sorting Test). Note that these terms are not always well defined or exclusive. For example, a “hit” is also a “correct” response and a “false alarm” may be synonymous with “commission error”. Whenever possible use the more specific terms (i.e., always use “hit” rather than “correct” when applicable). Here are a few evaluation labels that are commonly used:
Range
- correct: The response is correct.
- error: The response is incorrect.
- hit: The stimulus was present and the subject correctly responded present.
- miss: The stimulus was present and the subject incorrectly responded absent.
- fa: The stimulus was absent and the subject incorrectly responded present.
- cr: The stimulus was absent and the subject correctly responded absent.
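The four signal detection labels follow from two booleans. A minimal sketch (the function and parameter names are hypothetical):

```python
def evaluation_label(stimulus_present: bool, responded_present: bool) -> str:
    """Map a present/absent trial and a present/absent response to an SDT label."""
    if stimulus_present:
        return "hit" if responded_present else "miss"
    return "fa" if responded_present else "cr"


for stimulus_present in (True, False):
    for responded_present in (True, False):
        label = evaluation_label(stimulus_present, responded_present)
        print(stimulus_present, responded_present, label)
```

Hits and correct rejections are the cases where correct is TRUE; misses and false alarms are errors.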
-
expected_response_description
string -
A description of the expected response using the same convention as response_description.
-
expected_response_option_index
integer -
The index of the option the subject is expected to choose from the set of options.
-
feedback_description
string -
Lists the different kinds of feedback that were shown on a given trial. When multiple types of feedback were used, feedback_description lists them using ; as a separator. If a given type of feedback was shown multiple times during a trial, that feedback type is listed only once (i.e., feedback_description does NOT represent the sequence of feedbacks). The possible values for feedback are:
Range
- none: No feedback was shown.
- expected_response: Feedback indicated what the correct response would have been.
- explanation: Feedback explains why a certain option is the correct one.
- correctness_on_option: Feedback indicates (on the option itself) whether the option chosen by the participant was the correct one (e.g., in green) or not (e.g., in red).
- correctness_on_screen: Feedback displayed at the center of the screen indicates whether the response to the current trial was correct or not (e.g., using a green check or a red cross).
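As a sketch of how such a value could be assembled from the feedback shown during a trial: the function name and the choice to keep the order of first appearance are assumptions of this example, since the field only requires each type to be listed once.

```python
def build_feedback_description(feedback_events):
    """Join the distinct feedback types of a trial with ';' (each listed once)."""
    if not feedback_events:
        return "none"
    distinct = list(dict.fromkeys(feedback_events))  # drops repeats, keeps order
    return ";".join(distinct)


shown = ["correctness_on_screen", "expected_response", "correctness_on_screen"]
print(build_feedback_description(shown))  # correctness_on_screen;expected_response
```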
-
group_name
string -
Subjects may be assigned to different groups. Typically, different groups will have different experiences within a study.
-
index
index -
A 1-based index indicating the stacking order of stimulus components. A stimulus component with a higher index is displayed on top of those with lower values, similar to the CSS z-index property.
Range
1-based indices
-
index_in_source
integer -
When a stimulus is picked from a particular set (e.g., “digits1to9”), this index refers to the index within that set.
-
index_in_trial
integer -
Refers to individual stimuli within the sequence or set of stimuli shown during a trial.
-
input_action_type
enum -
Refers to the type of action the subject performs to give a response. Possible values include (non-exhaustive):
Range
- “mouse-click”
- “mouse-release”
- “key-press”
- “key-release”
- “mouse-drag”
- “touch”
- “swipe”
-
input_count
integer -
The number of inputs (i.e., actions) the user made during the trial.
-
input_id
PRIMARY KEY -
Primary key; each input or click has its own identifier value that is unique within the table.
-
input_interface_type
enum -
Refers to the type of interface subjects used to input actions. Possible values include (non-exhaustive):
Range
- keyboard: A keyboard is displayed on the screen.
- buttons: Dedicated buttons on the screen.
- stimulus-button: Stimuli serve as buttons.
- text-field: A text field is displayed on the screen.
- slider: A slider is displayed on the screen.
-
instrument_id
string -
The unique identifier of the instrument used for collecting data (e.g., the name of the computer script used to run the test). Unique in the Instrument table and corresponding files in the instruments/ folder.
Range
id in the Instrument table.
-
instrument_repetition
integer -
The number of times this particular instrument has already been completed at any time in the past by this particular subject in this study. This variable has a value of 0 the first time an instrument is used.
Range
0-based
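The difference between a 1-based index and a 0-based repetition counter can be sketched as follows, assuming a chronological list of the instruments one subject completed (the helper name and the instrument ids are hypothetical):

```python
from collections import Counter


def index_and_repetition(instrument_ids):
    """Derive activity_index (1-based) and instrument_repetition (0-based)."""
    completed_so_far = Counter()
    rows = []
    for position, instrument_id in enumerate(instrument_ids, start=1):
        rows.append({
            "activity_index": position,
            "instrument_id": instrument_id,
            "instrument_repetition": completed_so_far[instrument_id],
        })
        completed_so_far[instrument_id] += 1
    return rows


# Hypothetical subject who completes "DS", then "NB", then "DS" again.
for row in index_and_repetition(["DS", "NB", "DS"]):
    print(row)
```

The third row has activity_index 3 but instrument_repetition 1, because "DS" had been completed once before.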
-
is_object_enabled
boolean -
Indicates whether the object that was clicked on was enabled (clickable) or not.
-
job_description
string -
The more specific description of a job, which gives more information about what the participant sees and has to do. Whereas the job_type typically uses only verbs and adjectives, the job_description also contains nouns (e.g., “recall-digits-forward”, “recall-letters-backward”).
-
job_repeat
enum -
Indicates whether this trial’s job is new, repeats the job of the previous trial, or switches back to a job already seen earlier in this timeline (i.e., specific version of the instrument).
Range
- new: The job has never been seen before by this subject in the current study.
- repeat: The job is the same as the previous trial.
- switch: The job is different from the previous trial but has been seen prior in the timeline.
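A minimal sketch of how these three values could be derived from the sequence of jobs within one timeline. The function name is hypothetical, and the sketch only looks at the jobs passed in, so jobs seen elsewhere in the study are outside its scope.

```python
def job_repeat_labels(jobs):
    """Label each trial's job as "new", "repeat" or "switch"."""
    labels = []
    seen = set()
    previous = None
    for job in jobs:
        if job not in seen:
            labels.append("new")
        elif job == previous:
            labels.append("repeat")
        else:
            labels.append("switch")
        seen.add(job)
        previous = job
    return labels


jobs = ["recall-forward", "recall-forward", "recall-backward", "recall-forward"]
print(job_repeat_labels(jobs))  # ['new', 'repeat', 'new', 'switch']
```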
-
job_type
string -
The general type of operation the subject needs to perform. The job typically is expressed as a verb (e.g., “recall”, “sort”) and can be the same for different instruments (e.g., Digit Span test and Spatial Span test both have a job of type “recall-forward”).
-
language
string -
The language the task was completed in, expressed as a two-letter code from the ISO 639-1 standard.
-
link
url -
External link, if any, that provides more information about the instrument, e.g., on Cognitive Atlas.
-
measurement_type
enum -
Describes the type of measurement implied by Option, which in turn has implications for how that data should be processed during analysis; takes a value in:
Range
- “nominal”: Set of unordered labels (e.g., {“Luxembourg”, “France”, “Germany”}).
- “ordinal”: Ordered values without clear distance (e.g., {“a lot”, “a bit”, “not at all”}).
- “interval”: Ordered values with clear distances but no absolute zero (e.g., 10 versus 20 degrees Celsius).
- “ratio”: Values with clear distance metrics and absolute zero (e.g., length in cm).
-
multitask_type
enum -
Subjects may be required to perform multiple tasks at the same time. This variable indicates the type of multitasking required.
Range
- Empty: No multitasking, i.e., single-tasking.
- concurrent: There are two independent tasks that need to be completed in parallel.
- compound: The task requires multiple successive stages or involves tasks that are dependent/coupled.
-
name
string -
Name of the toolkit (scene, code, or configuration) that is used to collect the data, e.g., “DS” for software that runs the digit span task in forward OR backward order. The specific parameterization of the instrument is defined by the “Timeline” (e.g., a variant of the instrument called “DS_FORWARD”).
-
object_id
string -
A stimulus is defined by a set of features. This variable is used to identify each time the same stimulus features were used.
-
object_name
string -
The human-readable name of the object that was clicked on (e.g., “sos_box_1_3”).
-
object_state
string -
Describes the state the object was in before it was clicked on. The meaning of “state” depends on the particular task (e.g., “new empty”).
-
object_type
enum -
Describes the type of object that was clicked on (e.g., “button”).
-
onset
float -
Duration between the start of the trial and the appearance of the stimulus, in seconds.
Range
In seconds
-
option_count
integer -
The number of options the participant can choose from on a given trial.
-
option_data_type
enum -
Describes the type of data this option entails. Possible values include:
Range
- “nominal”: Set of unordered labels (e.g., {“Luxembourg”, “France”, “Germany”}).
- “ordinal”: Ordered values without clear distance (e.g., {“a lot”, “a bit”, “not at all”}).
- “interval”: Ordered values with clear distances but no absolute zero (e.g., 10 versus 20 degrees Celsius).
- “ratio”: Values with clear distance metrics and absolute zero (e.g., length in cm).
-
option_id
integer -
Is a unique identifier for the option (set or generator) used on a given trial.
-
option_source
string -
Refers to the specific generator or set that determined the options on a given trial. Options that stem from the same source have the same data scheme and could thus be described in a table named after option_source (i.e., option_source indicates which table contains the full information about the option set).
-
option_source_type
enum -
A set of options is typically created using a particular procedure/algorithm (“generator”) or is sampled from a particular set (“set”). This variable indicates which of these two applies for the current options.
-
orientation
enum -
Indicates the symbol orientation.
Range
- north: bottom to top.
- north_east
- east: left to right.
- south_east
- south: top to bottom.
- south_west
- west: right to left.
- north_west
- free: no specific orientation.
-
outcome_description
string -
Describes the observable consequences of the subject’s response (e.g., “the opened box is empty”).
-
outcome_numeric
float -
A numeric value describing the observable consequences of the subject’s response (e.g., +3 points).
-
panel_id
string -
Identifier of the panel this stimulus is displayed over.
-
presentation_id
string -
In a multitasking setting, a particular instance of a stimulus (e.g., the current letter “A”) may be used by multiple tasks at the same time (e.g., in the dual N-back task). Because these are different trials, they will have different trial_id values and hence will have different rows in the Stimulus table. We use presentation_id to indicate that a given stimulus is in fact the same instance across those trials.
-
response_count
integer -
Each trial contains by definition only one response. However, when response_structure is other than unitary, a response comprises multiple pieces of information (e.g., “3-5-7” could be one response in the digit span task and this response contains three components, namely “3”, “5” and “7”). response_count refers to the number of components that make up a response (not the number of responses within a trial).
-
response_datetime
datetime -
The datetime corresponding to the completion of the response.
-
response_description
string -
A description of participant’s response; typically the description of the option that was chosen.
-
response_element_index
index -
Indicates which of the clicks are used, and in what order, to form the actual response in the Response table when response_structure is “sequence” or “set”.
Range
1 to input_count in the corresponding row of the Response table.
-
response_id
PRIMARY KEY -
A unique identifier assigned to responses in temporal order, meaning that larger IDs correspond to more recent responses that occurred later in time. This ID is unique within this table; no two rows share the same value.
-
response_initiation_time
float -
In some cases (e.g., a reaching movement) it might be useful to encode when a response was initiated.
-
response_numeric
float -
A numeric value associated with a particular response; this could be a numeric value entered directly by the subject or the numeric meaning of a selected option (for example, the choice of option “Never” may be associated with the numeric value of 0). Note that this variable describes the subject’s response; it does not describe the value (e.g., correctness or goodness) that is associated with that response.
-
response_option_index
integer -
The index of the option the participant chose, starting from 1.
-
response_skipped
boolean -
In some cases (e.g., in some questionnaires), subjects have the option to skip a question.
-
response_structure
string -
The structure of the response required from the subject; can take values in:
Range
- unitary: The subject provides a single input (e.g., chooses option same).
- set: The subject provides a set of information, and the order does not matter (e.g., list words that start with the letter A).
- sequence: The subject provides a sequence of information, and the order matters (e.g., a sequence of memorized digits in their order of appearance).
-
response_validation_time
float -
In some cases, subjects may need to press an extra key to validate previous responses. When relevant, this variable may encode this duration.
-
role
enum -
Describes the role that the stimulus plays in the trial, e.g., “target”.
Range
- target: A stimulus the agent must process and which should trigger the completion of the response (e.g., classify, reach, memorize) if the agent is doing the task as intended. In some cases (e.g., in a go/no-go task) the correct response to a stimulus is to NOT click the button. In this case, the stimulus that triggered the decision to NOT click the button is still a target.
- non_target: A stimulus the agent must process but which does not trigger the completion of the response (e.g., the first two stimuli in a 2-back test).
- distractor: A stimulus the agent should not process at all (i.e., ignore) and which is unrelated to the correct execution of the task.
- location_cue: A stimulus giving spatial location information that agents could use to improve their performance.
- job_specifier: A stimulus specifying which job the agent should perform.
- stop_signal: A stimulus signaling to the agents that they should abort the current action.
- probe: A stimulus indicating which stimulus to respond to.
-
score
float -
A numeric value associated with a particular response in a given context. This variable may be used to compute a performance metric or a questionnaire level index (e.g., a well-being score).
-
session_id
integer -
When there are multiple sessions, this variable indicates the order of each session (i.e., the first session completed by the subject has session_index = 1, the second session has session_index = 2, even if the second session is an exact repetition of the first one).
Range
index of session within subject.
-
source
string -
Refers to the specific generator or set the stimulus belongs to.
-
source_type
enum -
A stimulus is typically created using a particular procedure/algorithm (“generator”) or is sampled from a particular set (“set”). This variable indicates which of these two applies for the current stimulus.
Range
- set: stimulus is sampled from a fixed set of stimuli.
- generator: stimulus is created using a procedure/algorithm.
-
stimulus_count
integer -
The number of stimuli shown to the participant during the trial.
Range
This should match the number of stimuli referred to by stimulus_id.
-
stimulus_description
string -
A human readable, compact description of the main aspects of the stimulus. The description for a given stimulus depends on the task but follows a specific template for a given task. Because of this, it looks like the stimulus_description could be “parsed” and “tidied”—however, this is not the intention; parsed/tidied data will be available in other tables; description is for human readability and facilitates the understanding of the data.
-
stimulus_id
integer -
Is a unique identifier for the (unitary, set or sequence of) stimuli presented during a trial; if those exact same stimuli are repeated in a different trial, that trial would have the same value for stimulus_id. stimulus_id may also be used to refer to a specific message or question in a questionnaire.
-
stimulus_index
list[index] -
Indexes in chronological (or spatial) order the stimuli shown within an instrument (counting one stimulus per response). stimulus_index may for instance be used to refer to the nth question asked within a questionnaire.
-
stimulus_index_in_source
integer -
Index of the stimulus within the table referred to by stimulus_source. For example, if stimulus_source == “digit1to9”, stimulus_index_in_source = 1 refers to “1”, while for stimulus_source == “LettersAtoD”, stimulus_index_in_source = 1 refers to “A”.
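The lookup described above can be sketched with in-memory sources. The SOURCES mapping and the function name are hypothetical; the sketch assumes the index is 1-based, as in the example values given for this field.

```python
# Hypothetical stimulus sources, keyed by stimulus_source.
SOURCES = {
    "digit1to9": [str(digit) for digit in range(1, 10)],
    "LettersAtoD": ["A", "B", "C", "D"],
}


def stimulus_from_source(stimulus_source: str, stimulus_index_in_source: int) -> str:
    """Return the stimulus at a 1-based position within its source."""
    items = SOURCES[stimulus_source]
    if not 1 <= stimulus_index_in_source <= len(items):
        raise IndexError("stimulus_index_in_source is out of range for this source")
    return items[stimulus_index_in_source - 1]


print(stimulus_from_source("digit1to9", 1))    # 1
print(stimulus_from_source("LettersAtoD", 1))  # A
```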
-
stimulus_onset
float -
Duration between the start of the trial and the appearance of the stimulus, in seconds.
Range
In seconds
-
stimulus_panel_count
integer -
The number of panels or screen areas stimuli may appear on during the trial. For example, in a task where stimuli to be compared are presented on the left and right side of the screen, stimulus_panel_count = 2.
-
stimulus_position_index
integer -
Refers to discrete positions on the screen the stimulus may appear on. The set and ordering of possible positions depends on the test. Whenever possible, it follows a natural order (left to right, top to bottom), but in free-form layouts, indices are arbitrary.
-
stimulus_role
enum -
A stimulus may play different roles within a trial. Below is a list of some possible roles:
Range
- “target”: A stimulus the subject must process and which should trigger the completion of the response (e.g., classify, reach, memorize) if the subject is doing the task as intended. Note that in some cases (e.g., in a go/no-go task) the correct response to a stimulus is to NOT click the button. In this case, the stimulus that triggered the decision to NOT click the button is still a target.
- “non_target”: A stimulus the subject must process but which does not trigger the completion of the response (e.g., the first two stimuli in a 2-back test).
- “distractor”: A stimulus the subject should not process at all (i.e., ignore) and which is unrelated to the correct execution of the task.
- “location_cue”: A stimulus giving spatial location information that subjects could use to improve their performance.
- “job_specifier”: A stimulus specifying which job the subject should perform.
- “stop_signal”: A stimulus signaling to the subject that they should abort the current action.
- “probe”: A stimulus indicating which stimulus to respond to.
-
stimulus_set_size
integer -
The number of different values each presented stimulus could have taken. This value gives an indication of the complexity of the stimulus space. When this number is large, we set this variable to infinity; when for any reason it was not computed, it has a value of NA.
-
stimulus_source
string -
Refers to the specific generator or set the stimulus belongs to. Stimuli that stem from the same source have the same data scheme and could thus be described in a table named after stimulus_source (i.e., stimulus_source indicates which table contains the full information about the stimulus; e.g., “digit1to9”).
-
stimulus_source_type
enum -
A stimulus is typically created using a particular procedure/algorithm (“generator”) or is sampled from a particular set (“set”). This variable indicates which of these two applies for the current stimulus.
Range
- “set”: stimulus is sampled from a fixed set of stimuli.
- “generator”: stimulus is created using a procedure/algorithm.
-
stimulus_structure
enum -
We distinguish three stimulus structures: unitary, set, sequence.
Range
- unitary: Only one stimulus is shown, alone.
- set: Many stimuli are shown, either at the same time or not; order does not matter.
- sequence: Multiple stimuli are shown, either at the same time or not; order does matter (order may be indicated by the order of presentation or by a digit, for example).
-
stimulus_structure_source
string -
Refers to the specific generator used to produce the stimulus_structure (e.g., sequence of digits in a digit span test). When no generator was used, this variable has a value of none.
-
stimulus_structure_source_type
enum -
Indicates the type of method used to generate the stimulus_structure (this is relevant when a trial displays a sequence or set of stimuli): none, preset, generator.
Range
- none: Used when stimulus_structure == unitary.
- preset: The structure of stimuli is hard coded in a file.
- generator: A procedure was used to generate the stimulus_structure.
-
stimulus_type
enum -
BDM distinguishes the following stimulus types: messages and questions.
Range
- message: The stimulus is a message shown to subjects (e.g., task instructions).
- question: The stimulus may consist of text, images and/or sounds; it requires subjects to make a decision based on the content of the stimulus.
-
study_name
string -
The name of the study or experiment.
-
symbol_count
integer -
The number of symbols represented in this component.
-
symbol_layout
enum -
How the symbols are laid out.
Range
- vertical: along the Y axis.
- horizontal: along the X axis.
- diagonal_top_left
- diagonal_top_right
- square
- ring
- cross
- two_columns
-
symbol_name
string -
The human-readable name of the displayed symbol.
-
task_index
index -
When multitask_type is not empty, task_index refers to each of the individual tasks. For example, for auditory-visual dual N-back, task_index=1 is the auditory task and task_index=2 is the visual task.
Range
1-based index.
-
timeline_id
string -
Timelines are specific parameterizations of an instrument, and their identifiers are unique within the corresponding table for the instrument in the instruments/ folder.
-
timeline_repetition
integer -
The number of times this particular timeline has already been completed anytime in the past by this particular subject in this study. This variable has a value 0 the first time a timeline is completed.
Range
0-based
-
transformation_name
string -
Refers to the specific events-to-trials function used to construct rows of this table from raw events. The transformation (or projection in DDD terminology) embodies the definition of a trial for a particular task. The transformation name refers to code in the form of a function f(trial_state, event) => trial_state, where event is an event that occurred during the performance of the task, and trial_state is the data stored for the trial. The final state of a trial is thus the result of applying a sequence of projections such that trial = f(f(f(initial(), e1), e2), e3). The transformation/projection encapsulates the domain rules that define what constitutes a “trial” for a given task.
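The projection described above is a left fold over the events of a trial. The following is a minimal sketch of that idea, not BDM's actual transformation code: the event types, the fields of the trial state, and the names `initial` and `project` are all hypothetical and chosen only for illustration.

```python
from functools import reduce


def initial():
    """Return the (hypothetical) empty trial state before any event is applied."""
    return {"event_count": 0, "response": None, "timed_out": False}


def project(trial_state, event):
    """Hypothetical f(trial_state, event) => trial_state for a simple task."""
    new_state = dict(trial_state)  # copy, so every step returns a new state
    new_state["event_count"] += 1
    if event["type"] == "response":
        new_state["response"] = event["value"]
    elif event["type"] == "timeout":
        new_state["timed_out"] = True
    return new_state


# Hypothetical raw events for one trial, in the order they occurred.
events = [
    {"type": "stimulus_shown"},
    {"type": "response", "value": "left"},
    {"type": "trial_ended"},
]

# Equivalent to project(project(project(initial(), e1), e2), e3).
trial = reduce(project, events, initial())
print(trial)
```

Because the final state depends only on the initial state and the ordered events, re-running the fold after new events arrive recomputes the trial deterministically.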
-
trial_id
id -
Refers to the trial_index in the Response table and indicates in which trial this stimulus was shown.
Range
trial_id in the Response table of the same agent/session/activity/attempt
-
trial_index
id -
Sequential identifier representing the number of times the transformation rule has been applied to the events. It increases with each re-computation of the trial based on updated or newly received events.
Range
Preferably 1-based integer index.
-
trial_seed
integer or string -
Random seed used in the trial (if any).
-
trial_start_datetime
datetime -
When the first event of the trial occurred.
-
value
float -
A numeric value associated with a particular response option; typically indicating the “worth” of a response (e.g., value=1 for the correct response).
-
version
string -
Refers to the specific version/build of a particular instrument. We will use a calendar-based versioning system (calver.org; e.g., “v2024.01”).
-
x_screen
integer -
X coordinates of the stimulus on the screen in pixels.
-
x_viewport
float -
X coordinates of the stimulus on the screen expressed as a fraction of the screen width.
Range
0 to 1 (inclusive)
-
y_screen
integer -
Y coordinates of the stimulus on the screen in pixels.
-
y_viewport
float -
Y coordinates of the stimulus on the screen expressed as a fraction of the screen height.
Range
0 to 1 (inclusive)
-
actor
object -
Who or what performed/experienced the event: {objectType, id, name?}, where objectType is one of the actor types below. (Renamed from agent; an Agent is one type of actor, per BDM deviation D5.)
-
attachments
array -
References to additional files/data associated with the event (stimulus blobs, recording files, timeseries), each with its own metadata. Payloads are not inlined.
-
authority
object -
The authority that generated the event (e.g. the client app/developer). Populated by the LRS.
-
context
object -
Contextual information (study, studyflow, and the session→activity→runtime→block→trial scoping hierarchy) under context.extensions, keyed by bdm:* extension keys.
-
object
object -
What the action was performed on: {objectType, id, name?}, where objectType is one of the object types below.
-
result
object -
The outcome of the event (e.g. accuracy, response_time, score). Domain-specific payload lives under result.extensions, keyed by bdm:* extension keys.
-
stored
datetime (RFC 9557) -
When the event was stored in the LRS. Populated by the LRS.
-
timestamp
datetime (RFC 9557) -
When the event occurred, as an ISO 8601 / RFC 9557 datetime with timezone offset.
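As a minimal sketch of what such a value can look like, the snippet below formats a hypothetical event time with an explicit UTC offset using only Python's standard library. The date, the +01:00 offset, and the millisecond precision are assumptions made for illustration; RFC 9557 additionally allows an optional bracketed suffix such as a time zone name, which is not shown here.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical event time: 14:30:05.250 on 2024-01-15 in a UTC+01:00 zone.
local_zone = timezone(timedelta(hours=1))
occurred_at = datetime(2024, 1, 15, 14, 30, 5, 250000, tzinfo=local_zone)

# ISO 8601 string that keeps the timezone offset.
timestamp = occurred_at.isoformat(timespec="milliseconds")
print(timestamp)  # 2024-01-15T14:30:05.250+01:00

# Parsing the string back yields the same instant and the same offset.
parsed = datetime.fromisoformat(timestamp)
```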
-
updated
datetime (RFC 9557) -
When the event was last updated in the LRS. Populated by the LRS.
-
verb
string -
The action that occurred, drawn from the canonical verb vocabulary below.
-
version
string -
The associated BDM/schema version (e.g. v26.0608). Typically populated by the LRS.
General
Other measures of durations exist and may be useful to describe subjects’ responses. If such additional measures are needed, they should be specified explicitly; for example: response_onset, response_offset, or response_duration.
Units for response times are not consistent across papers and publicly available datasets. One can find them expressed in either seconds or milliseconds. BDM uses seconds as the default unit for response times to: - avoid “exception” by always using seconds as the temporal unit; - avoid additional computation by keeping the units as they currently are in our raw data and task specifications; - avoid the temptation to round times to integers when expressed in milliseconds; - take advantage of the fact that many popular packages to analyse response time seem to be using seconds as the default unit; - be consistent with what seems to be the default unit in fMRI data standards (e.g., BIDS or DICOMs).
It is tempting to abbreviate response_time as rt. However, there are several other variables prefixed response_ which do not have abbreviations. Spelling the names out, while making the name longer, makes the overall data structure more consistent and explicit.
Demographics
Generic Suffixes
Don’t use length to mean count or size. This is contrary to the terms used in arrays/lists in programming languages.
Avoid the use of size as this term is ambiguous; it could refer to the height of a person, the screen width × height dimensions, or a level within a Likert scale (e.g., “Medium”).
Note that “count” is different from “sum” (e.g., one can sum negative float values while count involves positive integers only) and from “index” (e.g., “this is the second” versus “there are two”).
Avoid the use of n to refer to counts. While using n to refer to counts is much shorter and might be standard in some circles, count is more explicit and less error-prone than n which may mean different things in other contexts (e.g., the length of the variable, an iterator).
It can be tempting to use synonyms of “type”, in particular when “type” is already used for something else. Such synonyms include things like “category”, “kind” or “set”. When those terms are not required, they should be avoided and replaced by “type”.
Aggregation Suffixes
Don’t use avg or average to refer to the mean value.
Don’t use med to refer to the median.
Don’t use std or SD to refer to the standard deviation.
Don’t use IQR to refer to the interquartile range.
Don’t use total to designate the result of a sum operation.
Use quantiles rather than percentiles because they allow naming the resulting variables in a simpler way. BDM uses the following convention to name the parameter X: - quantile(x, q = 0.23) -> quantile23 - quantile(x, q = 0.145) -> quantile145
Note that quantile(x, q = 1) can not be expressed using this convention. However, quantile(x, q = 1) is in fact equivalent to max(x) which is the preferred expression.
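The naming convention above can be sketched as a small helper. This is an illustration only: the function name `quantile_name` is hypothetical, the rule "append the digits after the decimal point" is inferred from the two examples given, and rejecting every q outside the open interval (0, 1) is an assumption (the text itself only rules out q = 1).

```python
from decimal import Decimal


def quantile_name(q):
    """Build a suffix such as 'quantile23' from a probability such as 0.23."""
    # Going through str() keeps the digits as written (0.145 -> '0.145')
    # instead of exposing binary floating-point noise.
    value = Decimal(str(q))
    if not (0 < value < 1):
        raise ValueError("q must lie strictly between 0 and 1; use max(x) for q = 1")
    digits = format(value, "f").split(".")[1]  # '0.23' -> '23'
    return "quantile" + digits


print(quantile_name(0.23))   # quantile23
print(quantile_name(0.145))  # quantile145
```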
Variables can be sorted (for example from the smallest to the largest values) and some values can be tied (in which case the rank may no longer be represented by integers). Also, it might not be clear if the ranks are descending or ascending (e.g., age_rank). If such confusion arises, it is preferred to use a more explicit name (e.g., youngest_to_oldest or youngest_first_rank).
Transformation Suffixes
Always specify the base when using the log except for the natural log.
Referencing Suffixes
Note that “id” typically implies a context, within which the “id” is unique. That context must be made explicit. For example, trial_id may identify trials within a trial table for one activity completed by one subject.
If there is a column named id (i.e., without prefix), it is expected to be a primary key and there exist other tables or files that refer to this column; if such a link between tables does not exist, use index or name instead.
The postfix _id does not imply a particular data type: both integers and strings are valid.
*_uuids are expected to be globally unique.
*_uuids are not expected to be human interpretable.
Avoid using _uid suffix to refer to a UUID variable.
Within BDM, string-formatted Version 7 UUIDs are preferred over older versions or corresponding 128-bit integers. For example: 01934efd-35d5-79db-9aca-fc29b0451cd1.
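A minimal sketch of how a pipeline might check that a value is a string-formatted Version 7 UUID, using Python's standard `uuid` module. The helper name `is_uuid_v7` is hypothetical, and requiring the canonical hyphenated form is an assumption about what "string-formatted" means here.

```python
import uuid


def is_uuid_v7(text):
    """Return True if text is a canonical, hyphenated Version 7 UUID string."""
    try:
        parsed = uuid.UUID(text)
    except ValueError:
        return False
    # uuid.UUID also accepts braces or missing hyphens; require the canonical form.
    if str(parsed) != text.lower():
        return False
    return parsed.variant == uuid.RFC_4122 and parsed.version == 7


print(is_uuid_v7("01934efd-35d5-79db-9aca-fc29b0451cd1"))  # True
print(is_uuid_v7("f47ac10b-58cc-4372-a567-0e02b2c3d479"))  # False (Version 4)
```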
There is no single widespread standard for hashing; rather, there are multiple algorithms that can be used depending on the use case. You can use either CRC32 (8 hexadecimal characters; e.g., “cbf43926” for the input “123456789”) or SHA256 (64 hexadecimal characters; e.g., “e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855” for empty input) depending on the probability of collision (i.e., two hashes for different data being identical). When that collision probability is deemed high, use SHA256.
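Both digests can be computed with Python's standard library. The sketch below is illustrative only; the helper names `crc32_hex` and `sha256_hex` are hypothetical. A CRC32 checksum is 32 bits, so it prints as 8 hexadecimal characters, while a SHA-256 digest is 256 bits, so it prints as 64 hexadecimal characters.

```python
import hashlib
import zlib


def crc32_hex(data: bytes) -> str:
    """CRC32 checksum as 8 lowercase hexadecimal characters."""
    return format(zlib.crc32(data), "08x")


def sha256_hex(data: bytes) -> str:
    """SHA-256 digest as 64 lowercase hexadecimal characters."""
    return hashlib.sha256(data).hexdigest()


print(crc32_hex(b"123456789"))  # cbf43926
print(sha256_hex(b""))          # e3b0c442...7852b855 (64 characters)
```

With only 32 bits, CRC32 collisions become likely once a few tens of thousands of distinct items are hashed, which is why SHA256 is the safer choice for larger collections.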
Note that “index” typically implies a context, within which the indexing occurs and that context must be made explicit. For example, trial_index may index trials within a block.
BDM follows the convention of 1-based indexing: always starting counting/indexing from 1 rather than 0.
Avoid the use of *_number because it is ambiguous.
As with index and id, repetition assumes a context which must be clarified when ambiguous.
Repetition is 0-based: it starts “counting” at 0 rather than 1; *_iteration instead of *_repetition would make it 1-based like indices, but it is less explicit and thus less preferred.
Trial fields
See Studyflow table for more details.
For example, 1up-2down. The 1-up, 2-down is a common adaptive staircase procedure used to estimate a subject’s threshold or sensitivity. After a correct response, the difficulty level is increased, and after two incorrect responses, the difficulty level is decreased.
Leave this field empty if no additional measures were recorded for this specific response.
To maintain clarity and consistency, BDM recommends using CSS-style naming conventions for common animations (e.g., “3s linear slide-in”).
Shown as defined in the Stimulus table; also a field of: Option.
In questionnaires, block_index may refer to distinct pages where each page may contain multiple questions.
Shown as defined in the StimulusComponent table; also a field of: OptionComponent.
To maintain clarity and consistency, BDM recommends using CSS-style naming conventions for colors (e.g., “lightgray”).
Shown as defined in the StimulusComponent table; also a field of: OptionComponent.
When the stimulus is shown using an animation, duration covers the complete period between the start of the animation and the end of the animation.
Shown as defined in the Stimulus table; also a field of: Option, Input.
When expected_response_index = 0, it means that the subject should not respond at all.
Sometimes stimuli serve both as stimuli and as response options as subjects have to click on a particular stimulus to give their response (e.g., spatial span, odd one out). It is convenient in those cases to use stimulus_position_index to order/index the options (i.e., option_index == stimulus_position_index) and consequently also the responses.
This list is not exhaustive; characterizing feedback in the future will involve more variables (e.g., separating the type of information shown, such as correctness, from how it is shown, such as “on_option” versus “on_screen”).
NOTE2: We do not consider as “feedback” here the kind of feedback used in UIs to confirm to users that a button has indeed been clicked.
If the presented options have no specific temporal or spatial order, leave this field empty or assign the same index to all options, e.g., index=1.
Shown as defined in the StimulusComponent table; also a field of: OptionComponent.
In addition to index_in_source, stimulus_id can also be used to look up further information about the stimulus in the source.
Shown as defined in the Stimulus table; also a field of: Option.
Shown as defined in the Stimulus table; also a field of: Option.
The type of input_action determines the structure of detailed response data (i.e., mouse-click data is different from key-press data).
For mouse-drag, it corresponds to the number of drag points that have been sampled during the drag-and-drop.
Shown as defined in the Response table; also a field of: Instrument.
Permanent links, e.g., DOIs, are preferred over particular websites.
If no multitasking was involved, leave this field blank.
This characterization of multitasking_type is rudimentary and will likely evolve in the future.
For example, if the same white digit “3” is shown in a digit span sequence, all those instances would have the same object_id although they would have different ids (as they appeared at different times).
Shown as defined in the Stimulus table; also a field of: Option.
Shown as defined in the Stimulus table; also a field of: Option, Input.
Shown as defined in the Response table; also a field of: Option, Input, OptionComponent.
While there is a stimulus_index_in_source to refer to the particular stimulus that was used, we don’t have an equivalent option_index_in_source since all options are displayed. Instead, we use response_index and expected_response_index to refer to a particular option within the set of options.
If none of the predefined orientations apply, leave this field empty or use a custom human-readable label. Make sure custom labels are clearly defined in the codebook.
Shown as defined in the StimulusComponent table; also a field of: OptionComponent.
Shown as defined in the Stimulus table; also a field of: Option, OptionComponent.
response_count is different from input_count; a subject may in some cases change their response multiple times before submitting the final response. In such cases, there would be many more inputs than there are components to the final response.
While we have stimulus_set_size we currently don’t have a response_set_size, but we do have option_count and response_count.
response_description can be directly compared to expected_response_description.
This needs to be here rather than in the Option table, because the same option can be clicked multiple times and may or may not contribute to the response depending on the order of the clicks. For example, in the Digit Span test we could have the response “3;5;7” on a particular trial. This might correspond to: - option.description = [“3”, “4”, “delete”, “delete”, “3”, “5”, “7”, “enter”] - click.response_element_index = [NA, NA, NA, NA, 1, 2, 3, NA]
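The Digit Span example can be reproduced with a short sketch. It assumes, purely for illustration, that “delete” removes the most recently entered element and that “enter” submits the response; the function name `response_element_indices` is hypothetical, and `None` stands in for NA.

```python
def response_element_indices(clicked):
    """Map each click to its 1-based position in the final response, or None."""
    kept = []  # click positions that are currently part of the response
    for position, label in enumerate(clicked):
        if label == "enter":    # assumed to submit the response
            break
        if label == "delete":   # assumed to remove the last entered element
            if kept:
                kept.pop()
        else:
            kept.append(position)
    indices = [None] * len(clicked)
    for element_index, position in enumerate(kept, start=1):
        indices[position] = element_index
    return indices


clicks = ["3", "4", "delete", "delete", "3", "5", "7", "enter"]
indices = response_element_indices(clicks)
response = ";".join(c for c, i in zip(clicks, indices) if i is not None)

print(indices)   # [None, None, None, None, 1, 2, 3, None]
print(response)  # 3;5;7
```

The first “3” receives no index because it was deleted before submission, which is why this information cannot live in the Option table alone.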
This identifier is used by other tables, for example Stimulus table which describes in greater detail the sequence of images shown during a trial, their timing, and visual properties. That table will refer to this id in order to link those descriptions (typically multiple lines in the Stimulus table) to a unique row in the Response table.
Shown as defined in the Response table; also a field of: Stimulus, Option, Input.
response_option_index = 0 means the subject chose none of the options (e.g., a “no-go” response in a Go/No-go task).
response_index can be directly compared to expected_response_index.
response_index refers to an entry in the Option table (i.e., there is no Response table).
Note that the distinction between set and sequence refers to the importance of order information when evaluating whether the response is correct; a response with a set structure may unfold over time (each piece of information is given in a particular temporal order) and it may be of scientific interest to take that order into account; however, the order itself is not important within the task. For example, in the MOT task one may ask subjects to point to all dots that hide a token. If subjects point to all such dots, they will be correct regardless of the order in which the dots were clicked.
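The practical consequence of the set versus sequence distinction shows up when scoring a response. The following sketch is a simplified illustration, not BDM's scoring rule: the function name `is_response_correct` and the example dot identifiers are hypothetical.

```python
def is_response_correct(response, expected, structure):
    """Compare a response with the expected response for a given structure."""
    if structure == "sequence":
        # Order matters: elements must match position by position.
        return list(response) == list(expected)
    if structure == "set":
        # Order is irrelevant: only membership matters.
        return set(response) == set(expected)
    raise ValueError(f"unsupported structure: {structure!r}")


# MOT-like trial: the dots hiding a token, clicked in a different order.
expected_dots = ["dot2", "dot5", "dot7"]
clicked_dots = ["dot7", "dot2", "dot5"]

print(is_response_correct(clicked_dots, expected_dots, "set"))       # True
print(is_response_correct(clicked_dots, expected_dots, "sequence"))  # False
```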
We currently don’t use session_name, session_id and session_repetition in this table.
Stimuli that come from the same source have the same data scheme and could thus be described in a table named after the stimulus_source. stimulus_source indicates which table contains the full information about the stimulus; e.g., “digits1to9”.
One could include a source_count variable here that indicates how many different stimuli there are in the set; but it’s better stored in the table that contains information about that stimulus source.
Shown as defined in the Stimulus table; also a field of: Option.
Shown as defined in the Stimulus table; also a field of: Option.
In some cases, when stimuli are too complex or can’t be precisely described, a summary of all stimuli is given instead.
Shown as defined in the Response table; also a field of: Stimulus, Input, StimulusComponent.
Use semicolon-separated indices if more than one stimulus was presented, e.g., 1;2;3.
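A minimal sketch of reading and writing such a field; the helper names `parse_indices` and `format_indices` are hypothetical, and treating an empty field as "no indices" is an assumption.

```python
from typing import List


def parse_indices(field: str) -> List[int]:
    """Turn a semicolon-separated field such as '1;2;3' into [1, 2, 3]."""
    if field.strip() == "":
        return []  # assumption: an empty field means no indices were recorded
    return [int(part) for part in field.split(";")]


def format_indices(indices: List[int]) -> str:
    """Turn [1, 2, 3] back into the semicolon-separated form '1;2;3'."""
    return ";".join(str(index) for index in indices)


print(parse_indices("1;2;3"))      # [1, 2, 3]
print(format_indices([1, 2, 3]))   # 1;2;3
```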
Using a particular stimulus_source in a given timeline does not mean that all possible stimuli of that source are displayed to the user. For example, the AX-CPT may use “upper-case-letters” but only use a subset of those letters (e.g., “A”, “B”, “X”, “Y”). Whenever possible, we specify the most relevant/specific set (e.g., “digit1to9” rather than “digit”).
To specify “infinity” in a CSV file we use +Inf and -Inf; these are correctly recognized in R (tidyverse) and Python (pandas) as being valid numbers rather than strings.
Shown as defined in the StimulusComponent table; also a field of: OptionComponent.
If none of the predefined layouts apply, leave this field empty or use a custom human-readable label. Make sure custom labels are clearly defined in the codebook.
Shown as defined in the StimulusComponent table; also a field of: OptionComponent.
Shown as defined in the StimulusComponent table; also a field of: OptionComponent.
Shown as defined in the Response table; also a field of: Instrument.
This field emphasizes the order of trials and the alignment with the projection-based definition of trials. A more complete name would be projection_index or projected_trial_index. For brevity, BDM uses trial_id instead.
Shown as defined in the Response table; also a field of: Option.
In BDM, the preferred position is the center of the object. However, specific implementations of the tasks may use other locations such as the top-left corner. If this is the case, it should be explicitly stated in the codebook.
Shown as defined in the Stimulus table; also a field of: Option, Input, StimulusComponent, OptionComponent.
Shown as defined in the Stimulus table; also a field of: Option, Input, StimulusComponent, OptionComponent.
Shown as defined in the Stimulus table; also a field of: Option, Input, StimulusComponent, OptionComponent.
Shown as defined in the Stimulus table; also a field of: Option, Input, StimulusComponent, OptionComponent.