Analysis pipelines

End-to-end analysis from raw trial data to summary statistics

Scientific research requires systematic and reproducible data-analysis workflows. Studyflow expresses an analysis pipeline as a process with explicit data flow, transformation semantics, and decision points – giving you a specification a reviewer can read without touching the code.

This example walks through a typical behavioral analysis pipeline: raw trial data → cleaning → derived measures → group statistics → manuscript-ready figures.

Note: Diagram (TODO)

TODO: The reference diagram for this example should live at docs/assets/img/examples/analysis-pipeline.svg and the source at docs/assets/img/examples/analysis-pipeline.studyflow. Author it in the modeler using the structure below.

Stages

  1. Read raw trial data. A task whose data input is a Dataset holding the raw output of data collection.
  2. Filter invalid trials. Drop responses faster than 100 ms, trials with missing data, and any rows flagged as bad. Uses the Filter data operation. See Preprocessing pipelines.
  3. Map to derived measures. Compute log-RT and code conditions. The Map data operation makes this transformation explicit.
  4. Conditional branching. An Exclusive Gateway routes the data by study design: within-subject designs take one branch, between-subject designs another.
  5. Group by participant and condition.
  6. Reduce to group-level summary statistics: mean accuracy, mean log-RT, standard error.
  7. Manual review activity. A Manual task with a data input/output represents the analyst inspecting outliers and re-running with adjusted criteria if needed.
  8. Write outputs. Summary statistics flow into a Table with a CSVW schema; the schema declares the columns so downstream figure-generation tasks can validate their input against it.
  9. End event.
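As a rough illustration of what Stages 2 to 6 compute, here is a minimal Python sketch using only the standard library. It is not Studyflow's API or generated code. The trial fields (`participant`, `condition`, `rt_ms`, `correct`, `flagged`), the condition labels and their numeric codes, and the `design` argument are all hypothetical. Because the text does not say what each gateway branch computes, the two branches here only check that the data actually match the declared design before both paths continue to grouping.

```python
import math
import statistics
from collections import defaultdict

# Hypothetical trial record: a dict with participant, condition, rt_ms (None if
# missing), correct (1/0, None if missing) and flagged (True for bad rows).
MIN_RT_MS = 100  # Stage 2: responses faster than this are dropped
CONDITION_CODES = {"congruent": 0, "incongruent": 1}  # hypothetical coding scheme


def filter_trials(trials):
    """Stage 2 (Filter): drop too-fast, incomplete, and flagged trials."""
    return [
        t for t in trials
        if t["rt_ms"] is not None
        and t["correct"] is not None
        and not t["flagged"]
        and t["rt_ms"] >= MIN_RT_MS
    ]


def map_derived(trials):
    """Stage 3 (Map): add log-RT and a numeric condition code to every trial."""
    return [
        {**t, "log_rt": math.log(t["rt_ms"]), "cond_code": CONDITION_CODES[t["condition"]]}
        for t in trials
    ]


def check_design(trials, design):
    """Stage 4 (Exclusive Gateway): exactly one branch runs, chosen by `design`."""
    seen = defaultdict(set)  # participant -> conditions they contributed
    for t in trials:
        seen[t["participant"]].add(t["condition"])
    all_conds = set().union(*seen.values())
    if design == "within":
        # Hypothetical branch check: every participant saw every condition.
        ok = all(c == all_conds for c in seen.values())
    elif design == "between":
        # Hypothetical branch check: every participant saw exactly one condition.
        ok = all(len(c) == 1 for c in seen.values())
    else:
        raise ValueError(f"unknown design: {design!r}")
    if not ok:
        raise ValueError(f"data do not match a {design}-subject design")
    return trials


def group_by_participant_condition(trials):
    """Stage 5 (Group): one cell of means per participant x condition."""
    cells = defaultdict(list)
    for t in trials:
        cells[(t["participant"], t["condition"])].append(t)
    return {
        key: {"accuracy": statistics.mean(t["correct"] for t in ts),
              "log_rt": statistics.mean(t["log_rt"] for t in ts)}
        for key, ts in cells.items()
    }


def sem(values):
    """Standard error of the mean; nan when fewer than two values."""
    if len(values) < 2:
        return float("nan")
    return statistics.stdev(values) / math.sqrt(len(values))


def reduce_to_summary(cells):
    """Stage 6 (Reduce): per-condition statistics across participant cell means."""
    by_condition = defaultdict(list)
    for (_participant, condition), cell in cells.items():
        by_condition[condition].append(cell)
    summary = []
    for condition, cs in sorted(by_condition.items()):
        rts = [c["log_rt"] for c in cs]
        summary.append({
            "condition": condition,
            "n_participants": len(cs),
            "mean_accuracy": statistics.mean(c["accuracy"] for c in cs),
            "mean_log_rt": statistics.mean(rts),
            "se_log_rt": sem(rts),
        })
    return summary


def run_pipeline(raw_trials, design):
    """Stages 2-6 chained; the manual review in Stage 7 stays outside the code."""
    trials = map_derived(filter_trials(raw_trials))
    trials = check_design(trials, design)
    return reduce_to_summary(group_by_participant_condition(trials))
```

Averaging within each participant first and then across participants makes the standard error reflect between-participant variability rather than the number of trials. That is one common choice, not the only one; adapt it to your own analysis plan.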

Why diagram it

Three reasons a diagram beats text-only documentation:

  • The pre-registration is literally the diagram, so there is no drift between what you said you’d do and what the code does.
  • The conditional branch is visible. A reader can see immediately that within-subject and between-subject data take different paths – a fact often buried in prose.
  • The schema attached to the output table is part of the spec, so the figure-generation code can rely on it without ambiguity.
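To make the schema point concrete, a figure-generation task can check its input table before plotting. Below is a minimal sketch, assuming hypothetical column names for the summary table. The metadata object uses standard CSVW vocabulary (`@context`, `url`, `tableSchema`, `columns`, `titles`, `datatype`, `required`), but `validate_csv` is a hand-rolled subset written for illustration: it is neither a conformant CSVW validator nor a Studyflow API.

```python
import csv
import io

# Minimal CSVW metadata for the summary table (Stage 8).
# Column names are hypothetical; adjust them to your own output.
SUMMARY_METADATA = {
    "@context": "http://www.w3.org/ns/csvw",
    "url": "summary.csv",
    "tableSchema": {
        "columns": [
            {"name": "condition", "titles": "condition", "datatype": "string", "required": True},
            {"name": "n_participants", "titles": "n_participants", "datatype": "integer", "required": True},
            {"name": "mean_accuracy", "titles": "mean_accuracy", "datatype": "double", "required": True},
            {"name": "mean_log_rt", "titles": "mean_log_rt", "datatype": "double", "required": True},
            {"name": "se_log_rt", "titles": "se_log_rt", "datatype": "double"},
        ]
    },
}

# Parsers for only the CSVW datatypes used above; real validators support many more.
PARSERS = {"string": str, "integer": int, "double": float}


def validate_csv(text, metadata):
    """Check header order, required cells, and datatypes against tableSchema.

    Returns a list of error strings; an empty list means the table passed.
    Empty cells are treated as null, which is the CSVW default.
    """
    columns = metadata["tableSchema"]["columns"]
    reader = csv.reader(io.StringIO(text))
    header = next(reader, [])
    expected = [c["titles"] for c in columns]
    if header != expected:
        return [f"header {header} does not match schema {expected}"]
    errors = []
    for row_num, row in enumerate(reader, start=2):
        if len(row) != len(columns):
            errors.append(f"row {row_num}: expected {len(columns)} cells, got {len(row)}")
            continue
        for col, cell in zip(columns, row):
            if cell == "":
                if col.get("required", False):
                    errors.append(f"row {row_num}: missing required {col['name']}")
                continue
            try:
                PARSERS[col["datatype"]](cell)
            except ValueError:
                errors.append(f"row {row_num}: {col['name']}={cell!r} is not {col['datatype']}")
    return errors
```

Failing fast on a non-empty error list keeps a renamed, reordered, or mistyped column from silently producing a wrong figure.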

See also