Specification

Formal definition and grammar

Studyflow is a domain-specific language for specifying scientific processes and their associated data. It extends the BPMN 2.0 standard to meet the specific needs of the experimental sciences.

Formal definition

A studyflow diagram is a tuple \(S = (N, E, T, \tau, \lambda)\), where \(N\) is a finite set of elements (nodes), \(E \subseteq N \times N\) is the set of sequence flows (directed edges), \(T\) is a set of pre-defined node types (events, activities, gateways, data), \(\tau: N \rightarrow T\) is a typing function that assigns a type to each node, and \(\lambda\) is a labeling function that assigns additional attributes to the nodes (e.g., metadata, triggers, gateway logic, implementation). Together, the elements \(N\) and the edges \(E\) form a directed graph that represents the flow of the study.

\(N\) can be further divided into subsets by element type: for a type \(t \in T\), \(N_t = \{ n \in N \mid \tau(n) = t \}\). For example, \(N_{E} \subseteq N\) is the set of events (e.g., start and end events), \(N_{A} \subseteq N\) the set of activities (e.g., tasks, sub-processes), and \(N_{G} \subseteq N\) the set of gateways (e.g., randomizers, decision points, parallel splits). \(N_{D} \subseteq N\) is the set of data objects that can be used to store and manipulate data within the studyflow. Each subset has its own attributes and behaviors, defined by the \(\lambda\) function.
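A minimal Python sketch of the tuple, assuming plain sets and dictionaries as the representation. The class name Studyflow, the lowercase type names, and the validate/subset helpers are illustrative only, not part of any studyflow tooling. subset returns the nodes whose type under \(\tau\) is the requested one, i.e. the subsets \(N_E\), \(N_A\), \(N_G\), \(N_D\) described above.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, FrozenSet, Set, Tuple

# Pre-defined node types T (events, activities, gateways, data).
NODE_TYPES: FrozenSet[str] = frozenset({"event", "activity", "gateway", "data"})


@dataclass
class Studyflow:
    """Illustrative encoding of S = (N, E, T, tau, lambda)."""

    nodes: Set[str]                    # N: finite set of elements
    edges: Set[Tuple[str, str]]        # E: subset of N x N (sequence flows)
    tau: Dict[str, str]                # tau: N -> T
    labels: Dict[str, Dict[str, Any]] = field(default_factory=dict)  # lambda
    types: FrozenSet[str] = NODE_TYPES  # T

    def validate(self) -> None:
        """Raise ValueError if E or tau violate the definition."""
        for src, dst in self.edges:
            if src not in self.nodes or dst not in self.nodes:
                raise ValueError(f"edge {src}->{dst} references an unknown node")
        for n in self.nodes:
            # tau must be total on N and map into T.
            if self.tau.get(n) not in self.types:
                raise ValueError(f"node {n} has no valid type in T")

    def subset(self, node_type: str) -> Set[str]:
        """Nodes whose type under tau is node_type (e.g. N_E for 'event')."""
        return {n for n in self.nodes if self.tau.get(n) == node_type}
```

For instance, with nodes s, qs, gw and e typed as event, activity, gateway and event, subset("event") returns the set {s, e}.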

The main components of the \(S\) tuple are described in the BPMN 2.0 specification; studyflow extends them with additional types and attributes to better suit the needs of experimental studies. More specifically:

  • \(N_A\) (activities) is extended with specific activity types relevant to experimental studies, such as cognitive tests, questionnaires, instructions, rest periods, video games, and standardized Behaverse tasks. The abstract DataOperationActivity augments any BPMN activity with the data-operation marker and input/output variable lists.
  • \(N_G\) (gateways) is extended with a random gateway type (RandomGateway) for random assignment, a stratified variant (StratifiedAllocationGateway) that balances allocation across covariate strata, and an eligibility decision gateway (EligibilityGateway) that encodes inclusion/exclusion criteria.
  • \(N_D\) (data objects) can be used to represent data collected during the study, such as participant responses, physiological measurements, or other relevant data. It also supports standard data formats (e.g., BIDS, BDM, Psych-DS, Kedro) and the related infrastructure types: DataCatalog, DataStorage, Dataset, Schema, Array, and Snapshot.
  • \(\lambda\) (attributes) is extended with attributes specific to experimental studies and data analysis, such as metadata (e.g., study name, version), event triggers (e.g., timers, errors), gateway logic (e.g., randomization probabilities, conditional logic), and implementation details (e.g., links to external scripts or software). The studyflow BaseElement augments every element with documentation (markdown) and a checklist.
  • \(E\) (edges) can include group assignments, indicating which paths participants should follow based on their assigned group. SequenceFlow carries an optional conditionExpression for gated branches.
  • \(S\) can also include design patterns commonly used in experimental studies, such as counterbalancing, recruitment, exception handling, and data quality checks.
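The allocation behavior of the random and stratified gateways can be sketched in Python. This is an assumption-laden illustration: random_gateway draws one outgoing flow uniformly (or with optional weights), and StratifiedAllocator stands in for StratifiedAllocationGateway using a simple rule (each participant goes to the least-filled arm of their stratum, with ties broken at random). The actual balancing algorithm is not specified in this section, and all names here are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List, Optional, Sequence


def random_gateway(outgoing: Sequence[str], rng: random.Random,
                   weights: Optional[Sequence[float]] = None) -> str:
    """Pick one outgoing flow: uniform by default, weighted if given."""
    return rng.choices(list(outgoing), weights=weights, k=1)[0]


class StratifiedAllocator:
    """Hypothetical stand-in for StratifiedAllocationGateway: each new
    participant goes to the least-filled arm within their stratum."""

    def __init__(self, arms: Sequence[str], rng: random.Random) -> None:
        self.arms: List[str] = list(arms)
        self.rng = rng
        # counts[stratum][arm] = participants allocated so far
        self.counts: Dict[str, Dict[str, int]] = defaultdict(
            lambda: {arm: 0 for arm in self.arms})

    def allocate(self, stratum: str) -> str:
        counts = self.counts[stratum]
        fewest = min(counts.values())
        # Ties between equally filled arms are broken at random.
        candidates = [arm for arm in self.arms if counts[arm] == fewest]
        arm = self.rng.choice(candidates)
        counts[arm] += 1
        return arm


# Usage: arms are labeled by the gateway's outgoing flow ids.
rng = random.Random(7)
print(random_gateway(["f3", "f4"], rng))
allocator = StratifiedAllocator(["f3", "f4"], rng)
for stratum in ["age<30", "age<30", "age>=30", "age<30"]:
    allocator.allocate(stratum)
print(dict(allocator.counts))
```

Under this rule the arm sizes within a stratum never differ by more than one participant, which is the balance that stratified allocation aims for; permuted-block randomization within strata is a common alternative.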

Grammar

The grammar below defines the structure of a studyflow diagram in EBNF notation (included for reference only).

Studyflow EBNF grammar

/* ========== Top level ========== */

Definitions       ::= Study*
Study             ::= 'Study' Identifier Attribute* (Element | SequenceFlow)*
SubProcess        ::= 'SubProcess' Identifier Attribute* (Element | SequenceFlow | DataAssociation)*
Element           ::= Event | Activity | Gateway | SubProcess
                    | DataObject | DataCatalog | DataStorage | Dataset
                    | Schema | Array | Snapshot

/* ========== Events ========== */

Event             ::= StartEvent | EndEvent
StartEvent        ::= 'StartEvent' Identifier Attribute*    /* may carry consentFormUri */
EndEvent          ::= 'EndEvent' Identifier Attribute*      /* may carry redirectTo, completionCodeType, completionCode */

/* ========== Activities ========== */

Activity              ::= 'Activity' Identifier ActivityAttributeList Choreography?
ActivityAttributeList ::= ActivityType ActivityAttribute*
ActivityType          ::= '@type' ('CognitiveTask' | 'Questionnaire' | 'Instruction' |
                                   'Rest' | 'VideoGame' | 'BehaverseTask' |
                                   'Script' | 'Manual')
ActivityAttribute     ::= Attribute | DataInput | DataOutput | DataOperation

/* ========== Data transformations ========== */

DataOperation      ::= OpClause+
OpClause           ::= '@op' (PrimitiveOp | CompositeOp) DataInput+ DataOutput+
PrimitiveOp        ::= 'Transform' | 'Map' | 'Filter' | 'FlatMap' | 'Reduce' | 'Group'
CompositeOp        ::= 'Compose' PrimitiveOp+

/* ========== Data elements ========== */

DataCatalog        ::= 'DataCatalog' Identifier Attribute*               /* url */
DataStorage        ::= 'DataStorage' Identifier Attribute*               /* persistent physical store */
Dataset            ::= 'Dataset' Identifier Attribute*                   /* catalog, storage, schema, format */
Schema             ::= 'Schema' Identifier Attribute*                    /* format, body */
Array              ::= 'Array' Identifier Attribute*                     /* dataset, schema */
Snapshot           ::= 'Snapshot' Identifier Attribute*                  /* source, version */
DataObject         ::= 'DataObject' Identifier Attribute*                /* may carry state */
DataInput          ::= '@in' NodeRef
DataOutput         ::= '@out' NodeRef

/* ========== Choreography ========== */

Choreography     ::= 'Choreography' Attribute* ParticipantRef+ InitiatingParticipant? MessageFlowList
ParticipantRef   ::= ProcessRef
InitiatingParticipant ::= ParticipantRef
MessageFlowList  ::= MessageFlow*
MessageFlow      ::= 'MessageFlow' Identifier Attribute* ParticipantRef '->' ParticipantRef

/* ========== Gateways and sequence flows ========== */

Gateway          ::= 'Gateway' Identifier GatewayAttribute*
GatewayAttribute ::= GatewayType | Attribute
GatewayType      ::= '@type' ('Random' | 'StratifiedAllocation' | 'Eligibility' |
                              'Exclusive' | 'Parallel' | 'Inclusive' | 'Complex')
SequenceFlow     ::= 'SequenceFlow' Identifier Attribute* NodeRef '->' NodeRef
                                                            /* optional conditionExpression */

/* ========== Common definitions ========== */

Attribute             ::= Identifier Value
ProcessRef            ::= Identifier
NodeRef               ::= Identifier

/* ========== Basic value types ========== */

Boolean    ::= 'true' | 'false'
Value      ::= String | Number | Boolean | Identifier
Number     ::= '-'? ( [0-9]+ ('.' [0-9]*)? | '.' [0-9]+ )
String     ::= '"' [^"]* '"'
Identifier ::= [A-Za-z] [A-Za-z0-9_]*
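The basic value types translate directly into Python regular expressions, which makes it easy to check individual tokens against the grammar. In this sketch classify_value is a hypothetical helper, and testing Boolean before Identifier is an assumption: the grammar lists both alternatives without saying which one wins for true/false.

```python
import re

# Direct translations of the "Basic value types" productions.
IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9_]*")
NUMBER = re.compile(r"-?(?:[0-9]+(?:\.[0-9]*)?|\.[0-9]+)")
STRING = re.compile(r'"[^"]*"')
BOOLEAN = re.compile(r"true|false")


def classify_value(token: str) -> str:
    """Return the Value alternative that the whole token matches, or raise."""
    # Boolean is tried before Identifier because 'true'/'false' match both.
    for name, pattern in (("String", STRING), ("Number", NUMBER),
                          ("Boolean", BOOLEAN), ("Identifier", IDENTIFIER)):
        if pattern.fullmatch(token):
            return name
    raise ValueError(f"{token!r} is not a valid Value")
```

For example, exampleStudy classifies as an Identifier and -1.5, .5 and 3. as Numbers. An unquoted value containing a hyphen, such as phq-9, matches none of the alternatives, so such values must be written as String literals.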

An example studyflow in this formalism is shown below:

Example studyflow
Study exampleStudy

  StartEvent s
    consentFormUri "https://example.org/consent.pdf"

  Activity qs
    @type Questionnaire
    instrument "phq-9"

  Gateway gw
    @type Random
    algorithm probabilistic
    probabilityFunction uniform

  Activity instr
    @type Instruction
    content "Follow carefully"

  Activity rest
    @type Rest
    configurations "duration: 5"

  EndEvent e
    redirectTo "https://app.prolific.com/submissions/complete?cc={COMPLETION_CODE}"
    completionCodeType static
    completionCode "ABCD1234"

  SequenceFlow f1 s -> qs
  SequenceFlow f2 qs -> gw
  SequenceFlow f3 gw -> instr
  SequenceFlow f4 gw -> e
  SequenceFlow f5 instr -> rest
  SequenceFlow f6 rest -> e
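For experimentation, an example like this can be read into nodes and flows with a short Python sketch. It is not an EBNF parser: it assumes the indentation used in the example (elements at two spaces, attributes at four), uses shlex.split to strip the quotes around String values, and does not handle attributes on SequenceFlow lines. parse_studyflow, reachable_from and the trimmed EXAMPLE text are hypothetical.

```python
import shlex
from typing import Dict, Optional, Set, Tuple

# Trimmed copy of the example above (hyphenated values quoted as Strings).
EXAMPLE = '''
Study exampleStudy
  StartEvent s
    consentFormUri "https://example.org/consent.pdf"
  Activity qs
    @type Questionnaire
    instrument "phq-9"
  Gateway gw
    @type Random
  Activity instr
    @type Instruction
  EndEvent e
  SequenceFlow f1 s -> qs
  SequenceFlow f2 qs -> gw
  SequenceFlow f3 gw -> instr
  SequenceFlow f4 gw -> e
  SequenceFlow f5 instr -> e
'''

Nodes = Dict[str, Dict[str, str]]
Flows = Dict[str, Tuple[str, str]]


def parse_studyflow(text: str) -> Tuple[Optional[str], Nodes, Flows]:
    """Read the indentation layout used in the example into
    (study id, nodes, flows); assumes 2-space elements, 4-space attributes."""
    study_id: Optional[str] = None
    nodes: Nodes = {}
    flows: Flows = {}
    current: Optional[Dict[str, str]] = None  # element receiving attributes
    for raw in text.splitlines():
        if not raw.strip():
            continue
        indent = len(raw) - len(raw.lstrip())
        tokens = shlex.split(raw)  # removes the quotes around String values
        if tokens[0] == "Study":
            study_id, current = tokens[1], None
        elif tokens[0] == "SequenceFlow":
            # SequenceFlow <id> <source> -> <target>; attributes unsupported.
            flow_id, src, arrow, dst = tokens[1:]
            if arrow != "->":
                raise ValueError(f"malformed sequence flow: {raw!r}")
            flows[flow_id] = (src, dst)
            current = None
        elif indent <= 2:
            # Element header such as "Activity qs" or "Gateway gw".
            current = nodes[tokens[1]] = {"kind": tokens[0]}
        elif current is not None and len(tokens) == 2:
            current[tokens[0]] = tokens[1]  # Attribute ::= Identifier Value
        else:
            raise ValueError(f"unexpected line: {raw!r}")
    return study_id, nodes, flows


def reachable_from(start: str, flows: Flows) -> Set[str]:
    """Depth-first search over the sequence flows from one element."""
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for src, dst in flows.values():
            if src == node and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen


study, nodes, flows = parse_studyflow(EXAMPLE)
# Every element should be reachable from the start event.
unreachable = set(nodes) - reachable_from("s", flows)
print(study, sorted(nodes), unreachable or "all reachable")
```

The reachability check is a simple sanity test on the directed graph formed by \(N\) and \(E\): an element that cannot be reached from the start event can never be executed.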

The example studyflow can be visualized as an extended BPMN diagram:

The same studyflow visualized

This diagram can also be represented in machine-readable formats. Cognitive elements (Questionnaire, Instruction, Rest, CognitiveTask, VideoGame, RandomGateway, …) live in the cognitive namespace. Data infrastructure and the Study/StartEvent/EndEvent extensions live in the core studyflow schema, whose namespace URI is http://behaverse.org/schemas/studyflow/v1. The version segment identifies the studyflow format version; the unversioned URI written by older releases is still accepted and is rewritten on load. Concrete cognitive activities are serialized as a standard BPMN task that carries the schema-specific extension element inside extensionElements.
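The rewrite-on-load rule for the core namespace can be sketched as a pure function over a prefix-to-URI map, such as the xmlns entries under definitions. The function name is hypothetical and the real loader may perform the rewrite at a different stage. The comparison must be exact: a prefix test such as startswith would also match the cognitive namespace, which shares the same stem.

```python
from typing import Dict

STUDYFLOW_NS_V1 = "http://behaverse.org/schemas/studyflow/v1"
# Unversioned URI written by older releases.
STUDYFLOW_NS_LEGACY = "http://behaverse.org/schemas/studyflow"


def normalize_namespaces(xmlns: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of a prefix -> URI map in which the unversioned core
    studyflow URI is replaced by the v1 URI. Every other URI, including
    the cognitive namespace, is left untouched (exact match only)."""
    return {
        prefix: STUDYFLOW_NS_V1 if uri == STUDYFLOW_NS_LEGACY else uri
        for prefix, uri in xmlns.items()
    }
```

Because the function is idempotent, applying it to a file that already uses the v1 URI leaves the map unchanged.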

The native .studyflow file format is YAML. The diagram id sits at the root (id), the remaining diagram metadata sits under definitions, and every other top-level entry is a BPMN root element keyed by its id. Element collections (flowElements, participants, lanes, …) are likewise mappings keyed by element id.

Diagram geometry (bounds, label, waypoint) is stored inline on the element it describes. A top-level diagram key appears only for diagram-interchange data that cannot be attached to an element; this example carries no layout at all.

Values equal to a schema default are omitted on save and re-applied on load. For example, the RandomGateway below carries no algorithm/probabilityFunction keys because this example uses the defaults (probabilistic, uniform). The incoming/outgoing lists may be omitted when authoring by hand; they are derived from the sequence flows on load.
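Omitting defaults on save and re-applying them on load can be sketched as a pair of inverse functions. The SCHEMA_DEFAULTS table, the function names, and the flat dictionary shape of an extension entry are assumptions for illustration; only the two RandomGateway defaults come from the text.

```python
from typing import Any, Dict

# Hypothetical defaults table; the two RandomGateway values are the
# defaults named in the text (probabilistic, uniform).
SCHEMA_DEFAULTS: Dict[str, Dict[str, Any]] = {
    "cognitive:RandomGateway": {
        "algorithm": "probabilistic",
        "probabilityFunction": "uniform",
    },
}


def strip_defaults(ext: Dict[str, Any]) -> Dict[str, Any]:
    """On save: drop every property whose value equals its schema default."""
    defaults = SCHEMA_DEFAULTS.get(ext["type"], {})
    return {k: v for k, v in ext.items()
            if not (k in defaults and defaults[k] == v)}


def apply_defaults(ext: Dict[str, Any]) -> Dict[str, Any]:
    """On load: fill in every default that the saved entry does not override."""
    defaults = SCHEMA_DEFAULTS.get(ext["type"], {})
    return {**defaults, **ext}


saved = strip_defaults({"type": "cognitive:RandomGateway",
                        "algorithm": "probabilistic",
                        "probabilityFunction": "uniform"})
print(saved)  # {'type': 'cognitive:RandomGateway'}
```

The design goal is a lossless round trip: for any fully populated entry x, apply_defaults(strip_defaults(x)) returns x, so omitting defaults only shortens the file.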

YAML serialization: the native .studyflow format
id: example-diagram
definitions:
  xmlns:bpmn: http://www.omg.org/spec/BPMN/20100524/MODEL
  xmlns:studyflow: http://behaverse.org/schemas/studyflow/v1
  xmlns:cognitive: http://behaverse.org/schemas/studyflow/cognitive
exampleStudy:
  type: bpmn:Process
  extensionElements:
    - type: studyflow:Study
  flowElements:
    s:
      type: bpmn:StartEvent
      name: s
      outgoing:
        - f1
      consentFormUri: https://example.org/consent.pdf
    qs:
      type: bpmn:Task
      extensionElements:
        - type: cognitive:Questionnaire
          instrument: phq-9
      name: qs
      incoming:
        - f1
      outgoing:
        - f2
    f1:
      type: bpmn:SequenceFlow
      sourceRef: s
      targetRef: qs
    gw:
      type: bpmn:ExclusiveGateway
      extensionElements:
        - type: cognitive:RandomGateway
      name: gw
      incoming:
        - f2
      outgoing:
        - f3
        - f4
    f2:
      type: bpmn:SequenceFlow
      sourceRef: qs
      targetRef: gw
    instr:
      type: bpmn:Task
      extensionElements:
        - type: cognitive:Instruction
          content: Follow carefully
      name: instr
      incoming:
        - f3
      outgoing:
        - f5
    f3:
      type: bpmn:SequenceFlow
      sourceRef: gw
      targetRef: instr
    rest:
      type: bpmn:Task
      extensionElements:
        - type: cognitive:Rest
          configurations:
            duration: 5
      name: rest
      incoming:
        - f5
      outgoing:
        - f6
    f5:
      type: bpmn:SequenceFlow
      sourceRef: instr
      targetRef: rest
    e:
      type: bpmn:EndEvent
      name: e
      incoming:
        - f6
        - f4
      redirectTo: https://app.prolific.com/submissions/complete?cc={COMPLETION_CODE}
      completionCodeType: static
      completionCode: ABCD1234
    f6:
      type: bpmn:SequenceFlow
      sourceRef: rest
      targetRef: e
    f4:
      type: bpmn:SequenceFlow
      sourceRef: gw
      targetRef: e
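Deriving incoming/outgoing from the sequence flows can be sketched as follows, using a plain Python dict that mirrors the flowElements mapping above with those lists (and the extension data) removed. derive_incoming_outgoing is hypothetical; it assumes the loader preserves key order and always recomputes the lists rather than merging them with hand-written ones.

```python
from typing import Any, Dict


def derive_incoming_outgoing(flow_elements: Dict[str, Dict[str, Any]]) -> None:
    """Recompute incoming/outgoing in place from the bpmn:SequenceFlow entries.
    Flows are visited in mapping order, so each list follows file order."""
    for elem in flow_elements.values():
        if elem["type"] != "bpmn:SequenceFlow":
            elem["incoming"], elem["outgoing"] = [], []
    for flow_id, elem in flow_elements.items():
        if elem["type"] == "bpmn:SequenceFlow":
            # A dangling sourceRef/targetRef raises KeyError here.
            flow_elements[elem["sourceRef"]]["outgoing"].append(flow_id)
            flow_elements[elem["targetRef"]]["incoming"].append(flow_id)


# The flowElements mapping above, without incoming/outgoing.
flow_elements: Dict[str, Dict[str, Any]] = {
    "s": {"type": "bpmn:StartEvent"},
    "qs": {"type": "bpmn:Task"},
    "f1": {"type": "bpmn:SequenceFlow", "sourceRef": "s", "targetRef": "qs"},
    "gw": {"type": "bpmn:ExclusiveGateway"},
    "f2": {"type": "bpmn:SequenceFlow", "sourceRef": "qs", "targetRef": "gw"},
    "instr": {"type": "bpmn:Task"},
    "f3": {"type": "bpmn:SequenceFlow", "sourceRef": "gw", "targetRef": "instr"},
    "rest": {"type": "bpmn:Task"},
    "f5": {"type": "bpmn:SequenceFlow", "sourceRef": "instr", "targetRef": "rest"},
    "e": {"type": "bpmn:EndEvent"},
    "f6": {"type": "bpmn:SequenceFlow", "sourceRef": "rest", "targetRef": "e"},
    "f4": {"type": "bpmn:SequenceFlow", "sourceRef": "gw", "targetRef": "e"},
}
derive_incoming_outgoing(flow_elements)
print(flow_elements["gw"]["outgoing"])  # ['f3', 'f4']
print(flow_elements["e"]["incoming"])   # ['f6', 'f4']
```

Because flows are visited in file order, the derived lists match the ones written out in the YAML. A sourceRef or targetRef that points at a missing element raises KeyError, which a fuller implementation should turn into a validation error.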

The same model also serializes to standard BPMN 2.0 XML (File → Save As → BPMN 2.0 XML in the modeler) for interoperability with other BPMN tools. Legacy .studyflow files use this XML form and still open in both the modeler and the runner. Note the two extension styles: new element types nest a wrapper element inside extensionElements (<cognitive:questionnaire>), while extensions of standard BPMN elements appear as namespaced attributes on the host element (studyflow:consentFormUri on <bpmn:startEvent>).

BPMN 2.0 XML serialization
<?xml version="1.0" encoding="UTF-8"?>
<bpmn:definitions xmlns:bpmn="http://www.omg.org/spec/BPMN/20100524/MODEL" xmlns:studyflow="http://behaverse.org/schemas/studyflow" xmlns:cognitive="http://behaverse.org/schemas/studyflow/cognitive" id="example-diagram">
  <bpmn:process id="exampleStudy">
    <bpmn:extensionElements>
      <studyflow:study />
    </bpmn:extensionElements>
    <bpmn:startEvent id="s" name="s" studyflow:consentFormUri="https://example.org/consent.pdf">
      <bpmn:outgoing>f1</bpmn:outgoing>
    </bpmn:startEvent>
    <bpmn:task id="qs" name="qs">
      <bpmn:extensionElements>
        <cognitive:questionnaire instrument="phq-9" />
      </bpmn:extensionElements>
      <bpmn:incoming>f1</bpmn:incoming>
      <bpmn:outgoing>f2</bpmn:outgoing>
    </bpmn:task>
    <bpmn:sequenceFlow id="f1" sourceRef="s" targetRef="qs" />
    <bpmn:exclusiveGateway id="gw" name="gw">
      <bpmn:extensionElements>
        <cognitive:randomGateway />
      </bpmn:extensionElements>
      <bpmn:incoming>f2</bpmn:incoming>
      <bpmn:outgoing>f3</bpmn:outgoing>
      <bpmn:outgoing>f4</bpmn:outgoing>
    </bpmn:exclusiveGateway>
    <bpmn:sequenceFlow id="f2" sourceRef="qs" targetRef="gw" />
    <bpmn:task id="instr" name="instr">
      <bpmn:extensionElements>
        <cognitive:instruction content="Follow carefully" />
      </bpmn:extensionElements>
      <bpmn:incoming>f3</bpmn:incoming>
      <bpmn:outgoing>f5</bpmn:outgoing>
    </bpmn:task>
    <bpmn:sequenceFlow id="f3" sourceRef="gw" targetRef="instr" />
    <bpmn:task id="rest" name="rest">
      <bpmn:extensionElements>
        <cognitive:rest>
          <cognitive:configurations>duration: 5</cognitive:configurations>
        </cognitive:rest>
      </bpmn:extensionElements>
      <bpmn:incoming>f5</bpmn:incoming>
      <bpmn:outgoing>f6</bpmn:outgoing>
    </bpmn:task>
    <bpmn:sequenceFlow id="f5" sourceRef="instr" targetRef="rest" />
    <bpmn:endEvent id="e" name="e" studyflow:redirectTo="https://app.prolific.com/submissions/complete?cc={COMPLETION_CODE}" studyflow:completionCodeType="static" studyflow:completionCode="ABCD1234">
      <bpmn:incoming>f6</bpmn:incoming>
      <bpmn:incoming>f4</bpmn:incoming>
    </bpmn:endEvent>
    <bpmn:sequenceFlow id="f6" sourceRef="rest" targetRef="e" />
    <bpmn:sequenceFlow id="f4" sourceRef="gw" targetRef="e" />
  </bpmn:process>
</bpmn:definitions>
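Both extension styles can be read with Python's standard xml.etree.ElementTree module. This sketch uses a trimmed copy of the XML above and matches the namespace URIs exactly as they appear there; a reader that should also accept the versioned studyflow URI would need to check both. read_extensions is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

NS = {
    "bpmn": "http://www.omg.org/spec/BPMN/20100524/MODEL",
    "studyflow": "http://behaverse.org/schemas/studyflow",
    "cognitive": "http://behaverse.org/schemas/studyflow/cognitive",
}

# Trimmed from the XML above: one extended standard element (startEvent)
# and one new element type (questionnaire) nested in extensionElements.
XML = """<bpmn:definitions xmlns:bpmn="http://www.omg.org/spec/BPMN/20100524/MODEL" xmlns:studyflow="http://behaverse.org/schemas/studyflow" xmlns:cognitive="http://behaverse.org/schemas/studyflow/cognitive" id="example-diagram">
  <bpmn:process id="exampleStudy">
    <bpmn:startEvent id="s" name="s" studyflow:consentFormUri="https://example.org/consent.pdf" />
    <bpmn:task id="qs" name="qs">
      <bpmn:extensionElements>
        <cognitive:questionnaire instrument="phq-9" />
      </bpmn:extensionElements>
    </bpmn:task>
  </bpmn:process>
</bpmn:definitions>"""


def read_extensions(xml_text: str) -> dict:
    root = ET.fromstring(xml_text)
    found = {}
    # Style 1: namespaced attribute on a standard BPMN element, exposed by
    # ElementTree under its expanded {namespace-uri}name key.
    for start in root.iterfind(".//bpmn:startEvent", NS):
        uri = start.get(f"{{{NS['studyflow']}}}consentFormUri")
        if uri is not None:
            found[start.get("id")] = {"consentFormUri": uri}
    # Style 2: wrapper element nested inside bpmn:extensionElements.
    for task in root.iterfind(".//bpmn:task", NS):
        ext = task.find("bpmn:extensionElements/cognitive:questionnaire", NS)
        if ext is not None:
            found[task.get("id")] = {"questionnaire": dict(ext.attrib)}
    return found


print(read_extensions(XML))
```

The namespaced attribute has to be looked up by its expanded name because ElementTree resolves prefixes on attributes as well as on tags; the prefix map passed to iterfind and find applies only to path expressions.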