Analysis pipelines
End-to-end analysis from raw trial data to summary statistics
Scientific research requires systematic and reproducible data-analysis workflows. Studyflow expresses an analysis pipeline as a process with explicit data flow, transformation semantics, and decision points – giving you a specification a reviewer can read without touching the code.
This example walks through a typical behavioral analysis pipeline: raw trial data → cleaning → derived measures → group statistics → manuscript-ready figures.
TODO: The reference diagram for this example should live at docs/assets/img/examples/analysis-pipeline.svg and the source at docs/assets/img/examples/analysis-pipeline.studyflow. Author it in the modeler using the structure below.
Stages
- Read raw trial data. A task with a data input from a Dataset representing the raw collection output.
- Filter invalid trials. Drop responses faster than 100 ms, trials with missing data, and any flagged-as-bad rows. Uses the Filter data operation. See Preprocessing pipelines.
- Map to derived measures. Compute log-RT and code conditions. The Map data operation makes this transformation explicit.
- Conditional branching. An Exclusive Gateway splits paths based on data type – within-subject designs go down one branch, between-subject down another.
- Group by participant and condition.
- Reduce to group-level summary statistics: mean accuracy, mean log-RT, standard error.
- Manual review activity. A Manual task with a data input/output represents the analyst inspecting outliers and re-running with adjusted criteria if needed.
- Write outputs. Summary statistics flow into a Table with a CSVW schema; the schema declares the columns so downstream figure-generation tasks can validate.
- End event.
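To make the data operations concrete, here is a minimal Python sketch of the filter, map, group, and reduce stages, assuming trials arrive as plain dictionaries. The field names (participant, condition, rt_ms, correct, bad), the condition codes, and the choice to report the standard error of log-RT are assumptions made for illustration; the sketch is not Studyflow's API, and it leaves out the gateway and the manual review step.

```python
import math
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical raw trials; every field name here is an assumption.
RAW_TRIALS = [
    {"participant": "p1", "condition": "A", "rt_ms": 400.0, "correct": True, "bad": False},
    {"participant": "p1", "condition": "A", "rt_ms": 500.0, "correct": False, "bad": False},
    {"participant": "p1", "condition": "A", "rt_ms": 80.0, "correct": True, "bad": False},   # too fast
    {"participant": "p1", "condition": "B", "rt_ms": 600.0, "correct": True, "bad": False},
    {"participant": "p1", "condition": "B", "rt_ms": None, "correct": True, "bad": False},   # missing RT
    {"participant": "p2", "condition": "A", "rt_ms": 450.0, "correct": True, "bad": True},   # flagged
    {"participant": "p2", "condition": "A", "rt_ms": 700.0, "correct": True, "bad": False},
]

CONDITION_CODES = {"A": 0, "B": 1}  # assumed coding scheme
MIN_RT_MS = 100                     # threshold named in the Filter stage


def is_valid(trial):
    """Filter: drop flagged rows, rows with missing data, and responses under 100 ms."""
    if trial.get("bad"):
        return False
    if trial.get("rt_ms") is None or trial.get("correct") is None:
        return False
    return trial["rt_ms"] >= MIN_RT_MS


def derive(trial):
    """Map: add log-RT and a numeric condition code without mutating the input."""
    return {
        **trial,
        "log_rt": math.log(trial["rt_ms"]),
        "condition_code": CONDITION_CODES[trial["condition"]],
    }


def summarize(trials):
    """Group by (participant, condition), then reduce each group to summary statistics."""
    groups = defaultdict(list)
    for trial in trials:
        groups[(trial["participant"], trial["condition"])].append(trial)

    summary = []
    for (participant, condition), rows in sorted(groups.items()):
        log_rts = [row["log_rt"] for row in rows]
        n = len(rows)
        summary.append({
            "participant": participant,
            "condition": condition,
            "n_trials": n,
            "mean_accuracy": mean(1.0 if row["correct"] else 0.0 for row in rows),
            "mean_log_rt": mean(log_rts),
            # Standard error is undefined for a single trial, so report None.
            "se_log_rt": stdev(log_rts) / math.sqrt(n) if n > 1 else None,
        })
    return summary


summary = summarize([derive(t) for t in RAW_TRIALS if is_valid(t)])
```

Keeping each stage as its own function mirrors the diagram: one task per operation, with the data passed between them stated explicitly.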
Why diagram it
Three reasons a diagram beats text-only documentation:
- Pre-registration is literally the diagram. No drift between what you said you’d do and what the code does.
- The conditional branch is visible. A reader can see immediately that within-subject and between-subject data take different paths – a fact often buried in prose.
- The schema attached to the output table is part of the spec, so the figure-generation code can rely on it without ambiguity.
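As a rough illustration of the last point, the sketch below checks a CSV export against a CSVW-style column list before a figure script would read it. The keys tableSchema, columns, name, datatype, and required come from the W3C CSVW vocabulary; the column names, the file name, and the validate_csv helper are hypothetical, and this is far from a full CSVW validator.

```python
import csv
import io

# CSVW-style table description. The column names and file name are assumptions.
SCHEMA = {
    "url": "summary.csv",
    "tableSchema": {
        "columns": [
            {"name": "participant", "datatype": "string", "required": True},
            {"name": "condition", "datatype": "string", "required": True},
            {"name": "mean_accuracy", "datatype": "number", "required": True},
            {"name": "mean_log_rt", "datatype": "number", "required": True},
            {"name": "se_log_rt", "datatype": "number"},
        ]
    },
}


def _is_number(text):
    """Loose numeric check: anything Python's float() accepts."""
    try:
        float(text)
    except ValueError:
        return False
    return True


# Only the two datatypes used above are handled in this sketch.
CHECKS = {"string": lambda text: True, "number": _is_number}


def validate_csv(text, schema):
    """Return a list of problems; an empty list means the table matches the schema."""
    columns = schema["tableSchema"]["columns"]
    expected = [col["name"] for col in columns]
    reader = csv.DictReader(io.StringIO(text))

    if reader.fieldnames != expected:
        return [f"header {reader.fieldnames} does not match {expected}"]

    problems = []
    # Data rows start on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        for col in columns:
            value = row[col["name"]] or ""  # short rows yield None
            if value == "":
                if col.get("required"):
                    problems.append(f"line {line_no}: {col['name']} is required")
                continue
            if not CHECKS[col["datatype"]](value):
                problems.append(
                    f"line {line_no}: {col['name']}={value!r} is not a {col['datatype']}"
                )
    return problems


GOOD_CSV = (
    "participant,condition,mean_accuracy,mean_log_rt,se_log_rt\n"
    "p1,A,0.5,6.1,0.11\n"
    "p1,B,1.0,6.4,\n"
)
problems = validate_csv(GOOD_CSV, SCHEMA)
```

A figure-generation task could call a check like this first and refuse to plot when the returned list is not empty.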
See also
- Preprocessing pipelines: how to use Map, Filter, and Reduce.
- Attach a schema: schema-attached datasets.
- Data: data elements and operations reference.