Specification
Studyflow is a domain-specific language for specifying scientific processes and their associated data. It extends the BPMN 2.0 standard to fit the specific needs of experimental sciences.
Formal definition
A studyflow diagram is a tuple \(S = (N, E, T, \tau, \lambda)\), where \(N\) is a finite set of elements (nodes), \(E\subseteq N\times N\) represents sequence flows (edges), \(T\) is a set of pre-defined node types (events, activities, gateways, data), \(\tau: N \rightarrow T\) is a typing function that assigns a type to each node, and \(\lambda\) is a labeling function that assigns additional attributes to the nodes (e.g., metadata, triggers, gateway logic, implementation). The elements, \(N\), are connected by directed edges, \(E\), forming a directed graph that represents the flow of the study.
\(N\) can be further divided into subsets based on the type of elements (\(T\)). For example, \(N_{E} \subseteq N\) represents the set of events (e.g., start and end events), \(N_{A} \subseteq N\) represents the set of activities (e.g., tasks, sub-processes), and \(N_{G} \subseteq N\) represents the set of gateways (e.g., randomizer, decision points, parallel splits). \(N_{D} \subseteq N\) represents data objects that can be used to store and manipulate data within the studyflow. Each subset has its own specific attributes and behaviors defined by the \(\lambda\) function.
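As an illustration of the definition above, the tuple can be written down as a small data structure. This is a minimal sketch, not the studyflow implementation: the class name `Studyflow`, the field names, and the lowercase type labels are assumptions made for this example only.

```python
from dataclasses import dataclass, field

# The four pre-defined node types T named in the definition (labels are illustrative).
NODE_TYPES = frozenset({"event", "activity", "gateway", "data"})


@dataclass(frozen=True)
class Studyflow:
    """S = (N, E, T, tau, lambda) held in plain Python containers."""
    nodes: frozenset                         # N
    edges: frozenset                         # E, a set of (source, target) pairs
    tau: dict                                # typing function N -> T
    lam: dict = field(default_factory=dict)  # labeling function: node -> attributes

    def __post_init__(self):
        # E must be a subset of N x N.
        for src, dst in self.edges:
            if src not in self.nodes or dst not in self.nodes:
                raise ValueError(f"edge ({src}, {dst}) is not in N x N")
        # tau must be defined on every node and map into T.
        if set(self.tau) != set(self.nodes):
            raise ValueError("tau must be defined on all of N")
        unknown = set(self.tau.values()) - NODE_TYPES
        if unknown:
            raise ValueError(f"unknown node types: {sorted(unknown)}")

    def of_type(self, node_type):
        """Return the subset of N with the given type (e.g. N_A for 'activity')."""
        return {n for n in self.nodes if self.tau[n] == node_type}


# A hypothetical four-node study used only to exercise the structure.
study = Studyflow(
    nodes=frozenset({"start", "survey", "split", "end"}),
    edges=frozenset({("start", "survey"), ("survey", "split"), ("split", "end")}),
    tau={"start": "event", "survey": "activity", "split": "gateway", "end": "event"},
    lam={"split": {"gateway_logic": "random"}},
)
```

Here `study.of_type("event")` plays the role of \(N_E\), and the constructor rejects any edge whose endpoints are not in \(N\).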
The main components of the \(S\) tuple are described in the BPMN 2.0 specification, and studyflow extends them with additional types and attributes to better suit the needs of experimental studies. More specifically:
- \(N_A\) (activities) is extended with specific activity types relevant to experimental studies, such as cognitive tests, questionnaires, instructions, rest periods, video games, and standardized Behaverse tasks. The abstract DataOperationActivity augments any BPMN activity with the data-operation marker and input/output variable lists.
- \(N_G\) (gateways) is extended with a random gateway type (RandomGateway) for random assignment, a stratified variant (StratifiedAllocationGateway) that balances allocation across covariate strata, and an eligibility decision gateway (EligibilityGateway) that encodes inclusion/exclusion criteria.
- \(N_D\) (data objects) can be used to represent data collected during the study, such as participant responses, physiological measurements, or other relevant data. It also supports standard data formats (e.g., BIDS, BDM, Psych-DS, Kedro) and the related infrastructure types: DataCatalog, DataStorage, Dataset, Schema, Array, and Snapshot.
- \(\lambda\) (attributes) is extended to include attributes specific to experimental studies or data analysis, such as metadata (e.g., study name, version), event triggers (e.g., temporal, errors), gateway logic (e.g., randomization probabilities, conditional logic), and implementation details (e.g., links to external scripts or software). The studyflow BaseElement augments every element with documentation (markdown) and a checklist.
- \(E\) (edges) can include group assignments, indicating which paths participants should follow based on their assigned group. SequenceFlow carries an optional conditionExpression for gated branches.
- \(S\) can also include design patterns commonly used in experimental studies, such as counterbalancing, recruitment, exception handling, and data quality checks.
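To make the random gateway concrete, the sketch below picks one outgoing sequence flow for a participant, either uniformly or according to per-branch weights. It is an illustration under stated assumptions, not the runner's allocation code: the function name `choose_outgoing` and its parameters are hypothetical.

```python
import random


def choose_outgoing(outgoing, weights=None, rng=None):
    """Pick one outgoing sequence flow id for a participant.

    outgoing: list of sequence flow ids leaving the gateway.
    weights:  optional per-flow randomization weights; None means uniform.
    rng:      optional random.Random instance, useful for reproducible runs.
    """
    if not outgoing:
        raise ValueError("gateway has no outgoing flows")
    rng = rng or random.Random()
    if weights is None:
        # Uniform case: every branch is equally likely.
        return rng.choice(outgoing)
    if len(weights) != len(outgoing):
        raise ValueError("need exactly one weight per outgoing flow")
    # Weighted case: branch i is drawn with probability weights[i] / sum(weights).
    return rng.choices(outgoing, weights=weights, k=1)[0]
```

Passing a seeded `random.Random` makes an allocation reproducible, which can help when auditing group assignments. A stratified variant would additionally need to track allocation counts per stratum, which this sketch does not attempt.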
Grammar
The grammar below defines the structure of a studyflow diagram using the EBNF notation (included for reference only).
Studyflow EBNF grammar
/* ========== Top level ========== */
Definitions ::= Study*
Study ::= 'Study' Identifier Attribute* (Element | SequenceFlow)*
SubProcess ::= 'SubProcess' Identifier Attribute* (Element | SequenceFlow | DataAssociation)*
Element ::= Event | Activity | Gateway | SubProcess
          | DataObject | DataCatalog | DataStorage | Dataset
          | Schema | Array | Snapshot
/* ========== Events ========== */
Event ::= StartEvent | EndEvent
StartEvent ::= 'StartEvent' Identifier Attribute* /* may carry consentFormUri */
EndEvent ::= 'EndEvent' Identifier Attribute* /* may carry redirectTo, completionCodeType, completionCode */
/* ========== Activities ========== */
Activity ::= 'Activity' Identifier ActivityAttributeList Choreography?
ActivityAttributeList ::= ActivityType ActivityAttribute*
ActivityType ::= '@type' ('CognitiveTask' | 'Questionnaire' | 'Instruction' |
                          'Rest' | 'VideoGame' | 'BehaverseTask' |
                          'Script' | 'Manual')
ActivityAttribute ::= Attribute | DataInput | DataOutput | DataOperation
/* ========== Data transformations ========== */
DataOperation ::= OpClause+
OpClause ::= '@op' (PrimitiveOp | CompositeOp) DataInput+ DataOutput+
PrimitiveOp ::= 'Transform' | 'Map' | 'Filter' | 'FlatMap' | 'Reduce' | 'Group'
CompositeOp ::= 'Compose' PrimitiveOp+
/* ========== Data elements ========== */
DataCatalog ::= 'DataCatalog' Identifier Attribute* /* url */
DataStorage ::= 'DataStorage' Identifier Attribute* /* persistent physical store */
Dataset ::= 'Dataset' Identifier Attribute* /* catalog, storage, schema, format */
Schema ::= 'Schema' Identifier Attribute* /* format, body */
Array ::= 'Array' Identifier Attribute* /* dataset, schema */
Snapshot ::= 'Snapshot' Identifier Attribute* /* source, version */
DataObject ::= 'DataObject' Identifier Attribute* /* may carry state */
DataInput ::= '@in' NodeRef
DataOutput ::= '@out' NodeRef
/* ========== Choreography ========== */
Choreography ::= 'Choreography' Attribute* ParticipantRef+ InitiatingParticipant? MessageFlowList
ParticipantRef ::= ProcessRef
InitiatingParticipant ::= ParticipantRef
MessageFlowList ::= MessageFlow*
MessageFlow ::= 'MessageFlow' Identifier Attribute* ParticipantRef '->' ParticipantRef
/* Gateway definitions */
Gateway ::= 'Gateway' Identifier GatewayAttribute*
GatewayAttribute ::= GatewayType | Attribute
GatewayType ::= '@type' ('Random' | 'StratifiedAllocation' | 'Eligibility' |
                         'Exclusive' | 'Parallel' | 'Inclusive' | 'Complex')
SequenceFlow ::= 'SequenceFlow' Identifier Attribute* NodeRef '->' NodeRef
                 /* optional conditionExpression */
/* Common definitions */
Attribute ::= Identifier Value
ProcessRef ::= Identifier
NodeRef ::= Identifier
/* Basic value types */
Boolean ::= 'true' | 'false'
Value ::= String | Number | Boolean | Identifier
Number ::= '-'? ( [0-9]+ ('.' [0-9]*)? | '.' [0-9]+ )
String ::= '"' [^"]* '"'
Identifier ::= [A-Za-z] [A-Za-z0-9_]*
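The basic value types at the end of the grammar translate directly into regular expressions. The following sketch classifies a single token according to those productions; the function name `classify_value` is hypothetical, and the code is a reading aid rather than the parser used by studyflow.

```python
import re

# Regular expressions transcribed from the "Basic value types" productions.
IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9_]*")
NUMBER = re.compile(r"-?(?:[0-9]+(?:\.[0-9]*)?|\.[0-9]+)")
STRING = re.compile(r'"[^"]*"')


def classify_value(token):
    """Return the Value alternative a token matches, or None if it matches none."""
    # Boolean is checked before Identifier because 'true' and 'false'
    # would otherwise also match the Identifier pattern.
    if token in ("true", "false"):
        return "Boolean"
    if STRING.fullmatch(token):
        return "String"
    if NUMBER.fullmatch(token):
        return "Number"
    if IDENTIFIER.fullmatch(token):
        return "Identifier"
    return None
```

Under these rules `uniform` is an Identifier and `"phq-9"` is a String, while an unquoted `phq-9` matches no alternative because Identifier does not admit hyphens.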
An example studyflow in this formalism is shown below:
Example studyflow
Study exampleStudy
  StartEvent s
    consentFormUri "https://example.org/consent.pdf"
  Activity qs
    @type Questionnaire
    instrument "phq-9"
  Gateway gw
    @type Random
    algorithm probabilistic
    probabilityFunction uniform
  Activity instr
    @type Instruction
    content "Follow carefully"
  Activity rest
    @type Rest
    configurations "duration: 5"
  EndEvent e
    redirectTo "https://app.prolific.com/submissions/complete?cc={COMPLETION_CODE}"
    completionCodeType static
    completionCode "ABCD1234"
  SequenceFlow f1 s -> qs
  SequenceFlow f2 qs -> gw
  SequenceFlow f3 gw -> instr
  SequenceFlow f4 gw -> e
  SequenceFlow f5 instr -> rest
  SequenceFlow f6 rest -> e

This studyflow can be visualized as an extended BPMN diagram.
This diagram can also be represented in machine-readable formats. Cognitive elements (Questionnaire, Instruction, Rest, CognitiveTask, VideoGame, RandomGateway, …) live in the cognitive namespace. Data infrastructure and the Study/StartEvent/EndEvent extensions live in the core studyflow schema, whose namespace URI is http://behaverse.org/schemas/studyflow/v1. The version segment identifies the studyflow format version; the unversioned URI written by older releases is accepted and rewritten on load. Concrete cognitive activities are serialized as a standard BPMN task carrying the schema-specific extension element inside extensionElements.
The native .studyflow file format is YAML: the diagram id sits at the root (id), the remaining diagram metadata under definitions, and every other top-level entry is a BPMN root element keyed by its id. Element collections (flowElements, participants, lanes, …) are likewise mappings keyed by element id, and diagram geometry (bounds, label, waypoint) sits inline on the element it describes – a top-level diagram key appears only for diagram-interchange data that cannot be attached to an element (this example carries no layout at all). Values equal to a schema default are omitted on save and re-applied on load: the RandomGateway below carries no algorithm/probabilityFunction keys because this example uses the defaults (probabilistic, uniform). incoming/outgoing may be omitted when authoring by hand; they are derived from the sequence flows on load.
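The two load-time conveniences described above (re-applying schema defaults and deriving incoming/outgoing from the sequence flows) can be sketched on plain dictionaries shaped like the parsed YAML. This is a hedged illustration, not the actual loader: the function names and the `SCHEMA_DEFAULTS` table are assumptions, and only the RandomGateway defaults are taken from the text.

```python
# Hypothetical defaults table; the RandomGateway values come from the paragraph above.
SCHEMA_DEFAULTS = {
    "cognitive:RandomGateway": {
        "algorithm": "probabilistic",
        "probabilityFunction": "uniform",
    },
}


def apply_defaults(extension):
    """On load: return a copy of an extension element with omitted defaults filled in."""
    defaults = SCHEMA_DEFAULTS.get(extension.get("type"), {})
    return {**defaults, **extension}


def strip_defaults(extension):
    """On save: return a copy without keys whose value equals the schema default."""
    defaults = SCHEMA_DEFAULTS.get(extension.get("type"), {})
    return {k: v for k, v in extension.items()
            if k not in defaults or defaults[k] != v}


def derive_incoming_outgoing(flow_elements):
    """Rebuild incoming/outgoing on every element from the bpmn:SequenceFlow entries."""
    # Copy each element without any hand-written incoming/outgoing lists.
    derived = {
        eid: {k: v for k, v in el.items() if k not in ("incoming", "outgoing")}
        for eid, el in flow_elements.items()
    }
    for eid, el in flow_elements.items():
        if el.get("type") == "bpmn:SequenceFlow":
            derived[el["sourceRef"]].setdefault("outgoing", []).append(eid)
            derived[el["targetRef"]].setdefault("incoming", []).append(eid)
    return derived


# Hand-authored fragment without incoming/outgoing, keyed by element id.
authored = {
    "qs": {"type": "bpmn:Task"},
    "gw": {"type": "bpmn:ExclusiveGateway"},
    "e": {"type": "bpmn:EndEvent"},
    "f2": {"type": "bpmn:SequenceFlow", "sourceRef": "qs", "targetRef": "gw"},
    "f4": {"type": "bpmn:SequenceFlow", "sourceRef": "gw", "targetRef": "e"},
}
loaded = derive_incoming_outgoing(authored)
```

After loading, `loaded["gw"]` carries `incoming: [f2]` and `outgoing: [f4]`, and `strip_defaults(apply_defaults(x))` returns `x` for an extension element that was saved without its default keys.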
YAML serialization – the native .studyflow format
id: example-diagram
definitions:
  xmlns:bpmn: http://www.omg.org/spec/BPMN/20100524/MODEL
  xmlns:studyflow: http://behaverse.org/schemas/studyflow/v1
  xmlns:cognitive: http://behaverse.org/schemas/studyflow/cognitive
exampleStudy:
  type: bpmn:Process
  extensionElements:
    - type: studyflow:Study
  flowElements:
    s:
      type: bpmn:StartEvent
      name: s
      outgoing:
        - f1
      consentFormUri: https://example.org/consent.pdf
    qs:
      type: bpmn:Task
      extensionElements:
        - type: cognitive:Questionnaire
          instrument: phq-9
      name: qs
      incoming:
        - f1
      outgoing:
        - f2
    f1:
      type: bpmn:SequenceFlow
      sourceRef: s
      targetRef: qs
    gw:
      type: bpmn:ExclusiveGateway
      extensionElements:
        - type: cognitive:RandomGateway
      name: gw
      incoming:
        - f2
      outgoing:
        - f3
        - f4
    f2:
      type: bpmn:SequenceFlow
      sourceRef: qs
      targetRef: gw
    instr:
      type: bpmn:Task
      extensionElements:
        - type: cognitive:Instruction
          content: Follow carefully
      name: instr
      incoming:
        - f3
      outgoing:
        - f5
    f3:
      type: bpmn:SequenceFlow
      sourceRef: gw
      targetRef: instr
    rest:
      type: bpmn:Task
      extensionElements:
        - type: cognitive:Rest
          configurations:
            duration: 5
      name: rest
      incoming:
        - f5
      outgoing:
        - f6
    f5:
      type: bpmn:SequenceFlow
      sourceRef: instr
      targetRef: rest
    e:
      type: bpmn:EndEvent
      name: e
      incoming:
        - f6
        - f4
      redirectTo: https://app.prolific.com/submissions/complete?cc={COMPLETION_CODE}
      completionCodeType: static
      completionCode: ABCD1234
    f6:
      type: bpmn:SequenceFlow
      sourceRef: rest
      targetRef: e
    f4:
      type: bpmn:SequenceFlow
      sourceRef: gw
      targetRef: e

The same model also serializes to standard BPMN 2.0 XML (File → Save As → BPMN 2.0 XML in the modeler) for interop with other BPMN tooling. Legacy .studyflow files use this XML form and still open in both the modeler and the runner. Note the two extension styles: new element types nest a wrapper element inside extensionElements (<cognitive:questionnaire>), while extensions of standard BPMN elements appear as namespaced attributes on the host element (studyflow:consentFormUri on <bpmn:startEvent>).
BPMN 2.0 XML serialization
<?xml version="1.0" encoding="UTF-8"?>
<bpmn:definitions xmlns:bpmn="http://www.omg.org/spec/BPMN/20100524/MODEL" xmlns:studyflow="http://behaverse.org/schemas/studyflow" xmlns:cognitive="http://behaverse.org/schemas/studyflow/cognitive" id="example-diagram">
  <bpmn:process id="exampleStudy">
    <bpmn:extensionElements>
      <studyflow:study />
    </bpmn:extensionElements>
    <bpmn:startEvent id="s" name="s" studyflow:consentFormUri="https://example.org/consent.pdf">
      <bpmn:outgoing>f1</bpmn:outgoing>
    </bpmn:startEvent>
    <bpmn:task id="qs" name="qs">
      <bpmn:extensionElements>
        <cognitive:questionnaire instrument="phq-9" />
      </bpmn:extensionElements>
      <bpmn:incoming>f1</bpmn:incoming>
      <bpmn:outgoing>f2</bpmn:outgoing>
    </bpmn:task>
    <bpmn:sequenceFlow id="f1" sourceRef="s" targetRef="qs" />
    <bpmn:exclusiveGateway id="gw" name="gw">
      <bpmn:extensionElements>
        <cognitive:randomGateway />
      </bpmn:extensionElements>
      <bpmn:incoming>f2</bpmn:incoming>
      <bpmn:outgoing>f3</bpmn:outgoing>
      <bpmn:outgoing>f4</bpmn:outgoing>
    </bpmn:exclusiveGateway>
    <bpmn:sequenceFlow id="f2" sourceRef="qs" targetRef="gw" />
    <bpmn:task id="instr" name="instr">
      <bpmn:extensionElements>
        <cognitive:instruction content="Follow carefully" />
      </bpmn:extensionElements>
      <bpmn:incoming>f3</bpmn:incoming>
      <bpmn:outgoing>f5</bpmn:outgoing>
    </bpmn:task>
    <bpmn:sequenceFlow id="f3" sourceRef="gw" targetRef="instr" />
    <bpmn:task id="rest" name="rest">
      <bpmn:extensionElements>
        <cognitive:rest>
          <cognitive:configurations>duration: 5</cognitive:configurations>
        </cognitive:rest>
      </bpmn:extensionElements>
      <bpmn:incoming>f5</bpmn:incoming>
      <bpmn:outgoing>f6</bpmn:outgoing>
    </bpmn:task>
    <bpmn:sequenceFlow id="f5" sourceRef="instr" targetRef="rest" />
    <bpmn:endEvent id="e" name="e" studyflow:redirectTo="https://app.prolific.com/submissions/complete?cc={COMPLETION_CODE}" studyflow:completionCodeType="static" studyflow:completionCode="ABCD1234">
      <bpmn:incoming>f6</bpmn:incoming>
      <bpmn:incoming>f4</bpmn:incoming>
    </bpmn:endEvent>
    <bpmn:sequenceFlow id="f6" sourceRef="rest" targetRef="e" />
    <bpmn:sequenceFlow id="f4" sourceRef="gw" targetRef="e" />
  </bpmn:process>
</bpmn:definitions>
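The two extension styles in this XML form (namespaced attributes on a standard BPMN element, and wrapper elements nested in extensionElements) can be read back with Python's standard xml.etree.ElementTree module. The sketch below works on a trimmed copy of the listing and uses the namespace URIs exactly as they appear in it; the helper names are hypothetical, and this is not how the modeler or runner parses files.

```python
import xml.etree.ElementTree as ET

# Namespace URIs copied from the XML listing above.
NS = {
    "bpmn": "http://www.omg.org/spec/BPMN/20100524/MODEL",
    "studyflow": "http://behaverse.org/schemas/studyflow",
    "cognitive": "http://behaverse.org/schemas/studyflow/cognitive",
}

# A trimmed copy of the example, keeping one element of each extension style.
SNIPPET = """
<bpmn:definitions xmlns:bpmn="http://www.omg.org/spec/BPMN/20100524/MODEL"
                  xmlns:studyflow="http://behaverse.org/schemas/studyflow"
                  xmlns:cognitive="http://behaverse.org/schemas/studyflow/cognitive">
  <bpmn:process id="exampleStudy">
    <bpmn:startEvent id="s" studyflow:consentFormUri="https://example.org/consent.pdf" />
    <bpmn:task id="qs">
      <bpmn:extensionElements>
        <cognitive:questionnaire instrument="phq-9" />
      </bpmn:extensionElements>
    </bpmn:task>
  </bpmn:process>
</bpmn:definitions>
"""


def namespaced_attributes(element, namespace_uri):
    """Style 1: attributes of `element` that belong to `namespace_uri`."""
    prefix = "{" + namespace_uri + "}"  # ElementTree stores names as {uri}local
    return {k[len(prefix):]: v for k, v in element.attrib.items()
            if k.startswith(prefix)}


def extension_elements(element):
    """Style 2: (namespace URI, local name, attributes) of each nested wrapper."""
    container = element.find("bpmn:extensionElements", NS)
    if container is None:
        return []
    result = []
    for child in container:
        uri, _, local = child.tag[1:].partition("}")
        result.append((uri, local, dict(child.attrib)))
    return result


root = ET.fromstring(SNIPPET.strip())
start = root.find(".//bpmn:startEvent", NS)
task = root.find(".//bpmn:task", NS)
```

With these helpers, `namespaced_attributes(start, NS["studyflow"])` yields the consent form URI, and `extension_elements(task)` yields the questionnaire wrapper with its instrument attribute.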