
vocabulary

Version: v26.0611 Namespace: https://behaverse.org/schemas/vocabulary

Cross-cutting controlled terminology for behaverse data: general terms, demographics, and the suffix conventions used in variable names. Terms that one schema owns (e.g. trial table fields) live in that schema; this resource holds the terms no single schema owns.

This controlled vocabulary defines 39 terms across 6 concept schemes.

General

General vocabulary

| Term | Type | Definition |
| --- | --- | --- |
| accuracy | float | A measure of performance. In many behavioral tasks, it reflects the percentage (0-100%) or fraction (0-1) of correct responses. Always use accuracy to refer to a performance measure that is a real number (float) bounded to the [0-1] range. |
| correct | boolean | Indicates whether the response in a given trial was correct. When no response was given although one was required (i.e., a timeout), correct evaluates to FALSE rather than N/A. This prevents subjects from receiving a high performance score when they in fact avoided all difficult trials and responded correctly only to easy ones. |
| response_time | float | The meaning of response time or reaction time (and its unit) is not consistent across studies. In BDM, response_time is the duration in seconds between a) the moment the subject fully completed their response on a given trial, and b) the moment the earliest possible correct response could have been completed by a hypothetical agent with perfect knowledge of the task and the ability to execute the response instantaneously. |
| timed_out | boolean | Indicates whether the subject failed to respond within the allocated time period. |
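
As a minimal sketch of how correct, timed_out, and accuracy relate, the Python below computes accuracy as the fraction of correct trials while keeping timed-out trials in the denominator. The trial records and the helper name `compute_accuracy` are hypothetical and only illustrate the definitions above; they are not part of the vocabulary.

```python
# Hypothetical trial records; field names follow the vocabulary above.
trials = [
    {"correct": True, "timed_out": False},
    {"correct": False, "timed_out": False},
    {"correct": False, "timed_out": True},  # no response: correct is False, not N/A
    {"correct": True, "timed_out": False},
]


def compute_accuracy(trials):
    """Return the fraction of correct trials as a float in [0, 1]."""
    if not trials:
        raise ValueError("accuracy is undefined for an empty list of trials")
    # Timed-out trials carry correct = False, so they lower accuracy
    # instead of being dropped from the denominator.
    return sum(1 for t in trials if t["correct"]) / len(trials)


accuracy = compute_accuracy(trials)
print(accuracy)  # 0.5
```

Dropping the timed-out trial instead would give 2/3, which is the inflation the definition of correct is meant to avoid.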

Demographics

Demographics vocabulary

| Term | Type | Definition |
| --- | --- | --- |
| age | float | Age is typically expressed in years. We do not recommend rounding age to integer values, because rounding loses data. It is better to leave variables as real numbers (floats) when they are recorded as such and let data analysts decide whether rounding is necessary for their specific use case. |
| gender/sex | enum | Gender and sex are not exactly the same. Sex refers to biological sex, while gender is a more complex construct. For example, a person may have a male biological sex but identify as a woman. Depending on the question asked, the variable should therefore be either sex or gender. For example, "What sex were you assigned at birth, such as on an original birth certificate?" is a question about biological sex and should be coded as sex. The possible values for sex are: |

Generic Suffixes

Variables may describe a feature or property of an entity, using the format <entity>_<feature>. None of the names below should be used alone; they serve only as suffixes (e.g., "block_type", "stimulus_description").

| Term | Type | Definition |
| --- | --- | --- |
| length | float | Refers to the length of a physical object in centimeters. When possible, use a more specific word (e.g., height, width, distance). |
| height | float | Refers to the height of a physical object in centimeters. |
| width | float | Refers to the width of a physical object in centimeters. |
| weight | float | Refers to the weight of a physical object in kilograms. |
| *_count | integer | Refers to the cardinality of that entity. A variable named page_count indicates the number of pages. Similarly, if an observation (row) has car_count = 5, that observation involves a total of 5 cars; this 5 is unrelated to other rows in the table. |
| type | enum | Type is always an enum with known values. The meaning of each enum value needs to be explained in a codebook. |
| description | string | Description is always text (a string) for human consumption. While not strictly necessary, a textual description can greatly facilitate the understanding and processing of the data by humans. |
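
The <entity>_<feature> convention can be checked mechanically. The sketch below splits a variable name into its entity and generic suffix. The function name `split_variable_name` and the rule that the entity part must be non-empty are assumptions made for illustration, not part of the vocabulary.

```python
# Generic suffixes listed in the table above.
GENERIC_SUFFIXES = ("length", "height", "width", "weight", "count", "type", "description")


def split_variable_name(name):
    """Split '<entity>_<feature>' into (entity, feature).

    Returns None when the name does not end in '_<suffix>' for one of the
    generic suffixes, or when the entity part would be empty.
    """
    for suffix in GENERIC_SUFFIXES:
        tail = "_" + suffix
        if name.endswith(tail) and len(name) > len(tail):
            return name[: -len(tail)], suffix
    return None


print(split_variable_name("block_type"))            # ('block', 'type')
print(split_variable_name("stimulus_description"))  # ('stimulus', 'description')
print(split_variable_name("type"))                  # None: a suffix used alone
```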

Aggregation Suffixes

Use the following terms as suffixes to refer to particular operations in variable names. For example, the mean of a variable "age" would be called "age_mean" (and not, for example, "age_avg" or "age_m"). We typically use the same name as the corresponding aggregation function in Python or R. More specialized terms require explicit descriptions.

| Term | Type | Definition |
| --- | --- | --- |
| mean | float | The average of a numeric variable. |
| median | float | The median of the variable. |
| mode | | The mode of a variable. |
| min | | The minimum value of a variable. |
| max | | The maximum value of a variable. |
| sd | | The standard deviation of a variable. |
| var | | The variance of a variable. |
| iqr | | The interquartile range of a variable. |
| sum | float | The sum of all values of a variable (e.g., item_price_sum = sum(item_price)). |
| quantile* | float | Quantile is similar to percentile, as both refer to the value of a parameter Q that splits the data such that a given fraction of the data is smaller than Q. Quantile expresses that fraction as a number between 0 and 1 while percentiles express it as a percentage (between 0 and 100). |
| rank | integer | Rank of a value in a set (ascending or first to last). |
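
To illustrate the naming rule, the following sketch builds aggregate names of the form <variable>_<suffix> using Python's standard `statistics` module. The data and the helper `summarize` are hypothetical. The choice of sample (rather than population) standard deviation and variance is an assumption, since the table does not specify which is meant. The iqr, quantile, and rank suffixes are left out for brevity.

```python
import statistics

# Hypothetical ages in years, kept as floats.
ages = [21.5, 34.0, 28.25, 34.0, 45.75]


def summarize(name, values):
    """Return aggregates keyed as '<name>_<suffix>'."""
    return {
        f"{name}_mean": statistics.mean(values),
        f"{name}_median": statistics.median(values),
        f"{name}_mode": statistics.mode(values),
        f"{name}_min": min(values),
        f"{name}_max": max(values),
        f"{name}_sd": statistics.stdev(values),      # sample SD (assumption)
        f"{name}_var": statistics.variance(values),  # sample variance (assumption)
        f"{name}_sum": sum(values),
    }


summary = summarize("age", ages)
for key, value in summary.items():
    print(key, value)
```

Because the suffix matches the function name, a reader can usually tell how a column such as age_median was produced without consulting a codebook.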

Transformation Suffixes

Use the following terms as suffixes to refer to particular operations in variable names. In general, use the name of the function that performed the transformation. For example, the log of a variable age would be called age_log (and not, for example, log_age or age_in_log). We typically use the same name as the corresponding transformation function in Python or R.

| Term | Type | Definition |
| --- | --- | --- |
| log | float | Natural log. |
| log2 | float | Log of base 2. |
| log10 | float | Log of base 10. |
| sqrt | | Square root. |
| pow2 | | Power of 2. |
| floor | integer | Flooring of a number (e.g., 3.6 becomes 3). |
| ceil | integer | Ceiling of a number (e.g., 3.6 becomes 4). |
| round | integer | Rounding of a number to the closest integer (e.g., 3.6 becomes 4). |
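
As a sketch of the same idea for transformations, the mapping below pairs each suffix with a Python standard-library function of the same name and derives the new variable name from it. The `TRANSFORMS` table and the `transform` helper are hypothetical. pow2 is omitted because the table above does not spell out its exact definition.

```python
import math

# Suffix -> function of the same name in Python's standard library.
TRANSFORMS = {
    "log": math.log,      # natural log
    "log2": math.log2,
    "log10": math.log10,
    "sqrt": math.sqrt,
    "floor": math.floor,
    "ceil": math.ceil,
    "round": round,       # built-in; returns an int when called with one argument
}


def transform(name, value, suffix):
    """Return ('<name>_<suffix>', transformed value)."""
    return f"{name}_{suffix}", TRANSFORMS[suffix](value)


print(transform("age", 3.6, "floor"))  # ('age_floor', 3)
print(transform("age", 3.6, "ceil"))   # ('age_ceil', 4)
print(transform("age", 3.6, "round"))  # ('age_round', 4)
print(transform("age", 8.0, "log2"))   # ('age_log2', 3.0)
```

Note that Python's built-in `round` rounds exact halves to the nearest even integer (2.5 becomes 2), so the tool used for rounding is worth recording alongside the variable.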

Referencing Suffixes

Several variables are used to filter, identify or locate particular rows in a table or across multiple tables. Below is a list of the ones that are usually used in behavioral data and how they can be used.

| Term | Type | Definition |
| --- | --- | --- |
| *_id | string or integer | |
| *_name | string | Sometimes "name" is used in a way that is similar to a unique id (e.g., study_name or task_name). The difference between "id" and "name" is that "name" is expected to be readable text (e.g., n-back versus f346-r23v). As with "id", it is expected to refer to other tables and to be unique within a certain context (contrary to, for example, "label"). |
| *_uuid | string | A Universally Unique Identifier (UUID) is a random label of 32 hexadecimal digits that can be generated on the fly and will most likely be unique across computer systems. A UUID can be used to assign a record a unique identifier without having to check that the identifier is not already used by other records or tables. |
| *_hash | string | It is sometimes useful to create reproducible keys based on some data. A hash is not strictly necessary, as it can be recreated from other data, but it can be convenient for data processing. |
| *_index | integer | Indices should be favored over labels and ids when a variable is used for referencing and when order is important (often, but not always, the chronological order). For example, a variable named stimulus_position_index implies that its value points to an entry in a list of possible stimulus positions. |
| *_repetition | integer | Repetition counts the number of times the same "thing" occurred, e.g., a participant completes the same test twice, or the same stimulus appears multiple times. |
| *_label | string | A text that is attached to a variable and identifies it. It is expected to be human readable, but not always unique. |
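
The difference between *_uuid (random, generated on the fly) and *_hash (reproducible from data) can be sketched with Python's standard `uuid` and `hashlib` modules. The record fields, the helpers `new_uuid` and `make_hash`, the choice of SHA-256, and the field separator are all assumptions made for illustration; the vocabulary does not prescribe a particular algorithm.

```python
import hashlib
import uuid


def new_uuid():
    """Return a random UUID as 32 hexadecimal digits (no dashes)."""
    return uuid.uuid4().hex


def make_hash(*fields):
    """Return a reproducible key derived from the given field values."""
    # Join with a separator unlikely to occur in the data so that
    # ("ab", "c") and ("a", "bc") produce different keys.
    joined = "\x1f".join(str(field) for field in fields)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


# Hypothetical record using the referencing suffixes.
record = {
    "session_uuid": new_uuid(),                          # differs on every run
    "trial_hash": make_hash("subject-01", "n-back", 3),  # same inputs, same key
    "trial_index": 3,
}
print(record)
```

Calling `make_hash` again with the same arguments returns the same key, whereas each call to `new_uuid` returns a new value.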