MLOps

Training, evaluation, deployment, and monitoring as a studyflow

Analysis

Machine-learning pipelines span data preparation, model training, validation, and deployment. In practice, documentation drifts from the implementation, model versions are created without clear tracking, and deployment decisions lack transparent criteria.

Studyflow provides a formal specification language for ML operations. Researchers can document decision points, environmental dependencies, and gates that govern lifecycle management – all in one diagram.

This example illustrates an MLOps workflow for prediction models: raw behavioral-data ingestion → feature engineering → model training and cross-validation → performance evaluation → conditional deployment → monitoring → retraining.
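The stage sequence above can also be read as a small state machine. The following is a minimal sketch in Python, not Studyflow syntax: the stage labels, the outcome names, and the choice to send a failed evaluation back to feature engineering are illustrative assumptions drawn from the stage list in this example.

```python
# Hypothetical stage labels and outcomes; none of these names come from Studyflow.
TRANSITIONS = {
    "ingest": {"done": "feature_engineering"},
    "feature_engineering": {"done": "train_and_cross_validate"},
    "train_and_cross_validate": {"done": "evaluate"},
    # Conditional deployment: only a passing evaluation moves forward.
    "evaluate": {"pass": "deploy", "fail": "feature_engineering"},
    "deploy": {"done": "monitor"},
    # Monitoring either keeps watching or starts the retraining loop.
    "monitor": {"ok": "monitor", "degraded": "feature_engineering"},
}


def walk(start, outcomes):
    """Follow the transition table from `start`, one outcome per step."""
    path = [start]
    current = start
    for outcome in outcomes:
        current = TRANSITIONS[current][outcome]
        path.append(current)
    return path


if __name__ == "__main__":
    # One full pass that ends with monitoring sending the model back for retraining.
    print(walk("ingest", ["done", "done", "done", "pass", "done", "degraded"]))
```

Writing the transitions as a table makes the loop explicit: retraining is not a separate stage but an edge from monitoring back to feature engineering.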

Diagram (TODO)

The reference diagram for this example should live at docs/assets/img/examples/mlops-pipeline.svg and the source at docs/assets/img/examples/mlops-pipeline.studyflow. Author it in the modeler using the structure below.

Stages

Ingest – a Script activity reading raw data from a Dataset representing the data lake. The data operation marker is Transform.
Feature engineering – a sequence of Map and Reduce operations on the raw data, producing a feature table with a schema. See Attach a schema.
Train/test split – a Filter operation produces two tables.
Model training – a Script activity that consumes the training table and produces a model artifact (a Snapshot of the model Dataset).
Cross-validation gateway – an Exclusive Gateway checking that CV scores meet a threshold. The failing branch loops back to feature engineering with logged failure reasons; the passing branch continues.
Fairness audit – a Manual activity (or a Script with reviewer sign-off) checking group-level performance. Output: a fairness report.
Fairness gateway – another Exclusive Gateway. Pass: continue to deployment. Fail: route to a retraining loop that includes the fairness criteria in the loss.
Deployment – a Script activity that writes the model snapshot to the production store.
Monitoring – a sub-process triggered by a timer event (daily/weekly) that checks live performance against the held-out baseline.
Retraining trigger – a boundary error event on the monitoring sub-process. When live performance drops below the threshold, it routes back to feature engineering.
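The three decision points in the stages above (cross-validation gateway, fairness gateway, retraining trigger) could be backed by checks like the ones below. This is a hedged sketch in Python rather than anything Studyflow prescribes: the threshold values, the function names, the "largest gap between group scores" fairness criterion, and the "baseline minus tolerance" definition of the monitoring threshold are all assumptions chosen for illustration.

```python
from __future__ import annotations

from statistics import mean

# Illustrative values only; real thresholds are a project decision.
CV_THRESHOLD = 0.80         # minimum mean cross-validation score
MAX_GROUP_GAP = 0.05        # largest tolerated gap between group-level scores
LIVE_DROP_TOLERANCE = 0.03  # tolerated drop below the held-out baseline


def cv_gateway(fold_scores: list[float],
               threshold: float = CV_THRESHOLD) -> tuple[str, str]:
    """Exclusive gateway: return (next stage, logged reason)."""
    avg = mean(fold_scores)
    if avg >= threshold:
        return "fairness_audit", f"mean CV score {avg:.3f} >= {threshold}"
    return "feature_engineering", f"mean CV score {avg:.3f} < {threshold}"


def fairness_gateway(group_scores: dict[str, float],
                     max_gap: float = MAX_GROUP_GAP) -> tuple[str, str]:
    """Exclusive gateway on the fairness report: compare best and worst group."""
    gap = max(group_scores.values()) - min(group_scores.values())
    if gap <= max_gap:
        return "deployment", f"group gap {gap:.3f} <= {max_gap}"
    return "retraining", f"group gap {gap:.3f} > {max_gap}"


def retraining_triggered(live_score: float, baseline_score: float,
                         tolerance: float = LIVE_DROP_TOLERANCE) -> bool:
    """Boundary-event condition: live performance fell below baseline - tolerance."""
    return live_score < baseline_score - tolerance


if __name__ == "__main__":
    print(cv_gateway([0.82, 0.85, 0.79]))
    print(fairness_gateway({"group_a": 0.84, "group_b": 0.81}))
    print(retraining_triggered(live_score=0.75, baseline_score=0.82))
```

Each gateway returns the reason alongside the branch, which matches the requirement that failing branches carry logged failure reasons back to feature engineering.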

Why diagram it

ML pipelines accumulate decision and rollback paths that are hard to describe in prose:

Gates are visible. The CV and fairness gateways show, in the diagram, what conditions a model must meet before it’s deployed. Reviewers can verify the policy without reading code.
The retraining loop is the topology. Monitoring → boundary event → back to feature engineering is one path on the diagram, not a description split across runbooks.
Model snapshots are first-class. A Snapshot element documents which version of the model was deployed, and when.
The fairness audit has a name and a place. Treating it as an explicit activity in the diagram makes it harder to skip and easier to demand.
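To make the snapshot point concrete, the record that a Snapshot element stands for could be as small as a content hash plus a deployment time. The sketch below is an assumption about what such a record might hold, written in Python; `ModelSnapshot` and `snapshot` are hypothetical names, not part of Studyflow.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class ModelSnapshot:
    """Hypothetical record answering 'which version was deployed, and when'."""
    version: str           # short content hash of the serialized model
    deployed_at: datetime  # deployment time in UTC


def snapshot(model_bytes: bytes,
             deployed_at: Optional[datetime] = None) -> ModelSnapshot:
    """Derive the version from the model bytes so identical models share an ID."""
    digest = hashlib.sha256(model_bytes).hexdigest()[:12]
    when = deployed_at or datetime.now(timezone.utc)
    return ModelSnapshot(version=digest, deployed_at=when)


if __name__ == "__main__":
    record = snapshot(b"serialized-model-weights")
    print(record.version, record.deployed_at.isoformat())
```

Hashing the serialized model means the version cannot silently diverge from the artifact it labels; a counter or tag would work too, at the cost of relying on whoever increments it.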

See also