Conventions

Introduction

Style guides, standards, and conventions are useful in that they support consistency and quality while facilitating the development of efficient workflows, automation, and specialized software. They can apply to:

  1. data
  2. code syntax (surface-level properties)
  3. code semantic (structural, architectural, paradigmatic properties)
  4. workflow and best practices (everything that is not directly code)

There are various great books that cover codes and best practices (e.g., Clean Code: A Handbook of Agile Software Craftsmanship) but comparatively less about data, in particular within the context of behavioral sciences.

Here, we focus on standards and principles for structuring, formatting, and storing data (which has repercussions on subsequent data analysis and workflows). It aims to list and sometimes explain the particular choices we have made.

We do not claim that the conventions and rules listed here are the absolute best way of doing things. As many have argued before us, consistency is the most important objective; in aiming to achieve such consistency we made opinionated choices people may disagree with. This being said, whenever relevant and possible we aim to follow established standards and guidelines (e.g., schema.org, BIDS, PEP8.)

BDM conventions

  • For dataset names in code and classes use kebab-case (e.g., my-dataset as it may be used in a URL).
  • For column names, variables, and functions use lower_snake_case.
  • For categorical values (i.e., values of an enum), also use lower_snake_case (because those values might end up becoming columns during the data analysis).
  • For constants use ALL_CAPS (e.g., PI, MAX_RADIUS).

In some cases data may be aggregated from multiple different projects, each following their own styles and it might not always be feasible/practical to harmonize them. When in doubt, consult PEP 8 — the Style Guide for Python Code.

Other standards

What other standards are we following?

  • BIDS
  • ISOXXX
  • schema.org