Indices
To interpret a given observation (e.g., a response to a questionnaire) we need contextual information. In particular, we want to know when within the course of a study that particular observation took place. There are several kinds of variables one may use to provide such information. For example, datetime
variables maybe used to record the date and time of a particular observation. While datetime
is a critical piece of information, it is also typically insufficient or impractical. Instead, it may be desirable to use index
variables to encode for example, that a given response was the third response to a particular questionnaire on the second day of testing (which is common practice).
The behaverse data model makes use of several such index variables. For consistency it is critical to define what each of those variables means and how these index variables are nested within each other.
subject_index
: used to index subjects within a dataset (which typically corresponds to one study). It is advised but not required that subjects are ordered chronologically. Note that subject_index must be be unique within a dataset in the sense that it should not be possible for two distinct people to be assigned the same subject_index value within a dataset.
session_index
: some studies divide data collection into multiple sessions. A session refers to a temporally contiguous period of data collection. Two sessions are temporally separate and non-overlapping (one session can start only after the previous one ended). It is typically, but not always, the case that distinct sessions occur on distinct days.
Note we do NOT use the term “session” to refer to software related meanings (e.g., continuous period of connection to a server) as these may be of no use for behavioral data scientists to interpret the data (although they are likely useful for software engineers).
session_index is nested within subject_index. This means that when a subject starts a new session their session_index will be incremented; however, their session_index is unaffected by other subjects’ data completion journeys.
response_index
: subjects emitt responses. These may be complex sequences of inputs (e.g., typing an answer) that are interpreted within a study as a meaningful unit of observation.
A subject will emitt responses over the course of their participation in the study. response_index is nested within subjects, not within any lower level construct. This means that we increment our response counter starting from a subjects’ first response to their least response within a study without ever resetting that counter. The maximum value of response_index for a given subject reflects how many responses were recorded from that subject in the study.
episode_index
: refers to successive, mututually exclusive time periods within a specific subject’s experience within a study. While episode_index is often redundant with response_index, there are cases where they are not. For example, in some multitask settings a subject may emitt two distinct responses (approximatly) at the same time (or within the same “trial”); each of these responses would have its own index but because they occured at the same time they would be associted to the same episode_index value.
activity_index
: it is common for subjects to engage in multiple sequential activities during a study (e.g., first complete a demographics questionnaire, then a cognitive test, then a debreif questionnaire). In such cases it is useful to index each activity to keep track of their order. activity_index indexes activities within subject_index meaning that we increment each subject’s activity counter from their first activity until the last one without ever resetting the counter. The maximum value of activity_index for a given subject relfects the total number of activities completed by that subject (the repetition of an activity is counted as a separate activity).
The next group of indecis are tied to specific activities and are thus nested within activities (and subjects).
block_index
: an activity may consist of several blocks of groups of trials or questions. In a cognitive test for example, subject may be asked to complete 4 blocks of 50 “trials” where each block is separated by a short break. In a questionnaire, multiple questions may be displayed at once on the screen or page. block_index refers to such groups of responses; they are indexed within activity, meaning that block_index is reset to a value of 1 each time a subject starts a new activity.
task_index
: an activity may involve completing multiple tasks (for example, identifying shapes presented at the center of the screen and detect peripheral visual flashes). In such cases it is useful to have an index to refer to each of the tasks within an activity. Note that tasks may also be identified by their job_description
.
trial_index
: we use the term trial to mean instances of a particular statistical experiment. trial_index are nested within task_index (and activity_index and subject_index). When multiple tasks are performed within an activity, each task has its own trial counter. episode_index indicates which trials occured at the smae time. The first time a subject completes a trial from a specific task within an activity their trial_index for that task and activity has a value of one; at the end of the activity, the maximum value of trial_index reflects the total number of trials for a partricular task that the subject completed during an activity (i.e., trial_index is continuously incremented during an activity and is not reset each time a new block starts).
stimulus_index
: in some cases it may be useful to have an index to refer to the stimuli (e.g., images or questions in a questionnaire) ujsed within an activity. For example, a questionnaire may display a set of questions on the screen and subjects may be allowed to answer those questions in any order. The order of the questions on the page could be encoded by the stimulus_index variable while the order of the responses given by the subject would be encoded in the response_index.
The index variables presented above provide critical information to understand the temporal context of a specific observation. In addition, we also use index variables to refer to data stored in supplementary tables. For example, response_option_index
is a reference to a particular entry in the Option table which describes the specific option a subject chose on a specfic trial (see spec for more details.)