Conventions
Data and code conventions
Style guides and standards are useful in that they promote consistency and quality while facilitating the development of efficient workflows, automation, and specialized software. They can apply to data and code1, but also to other aspects of research, such as documentation, naming conventions, and even the design of experiments.
Some of the design choices for BDM are based on standards and style guides for structuring, formatting, and storing data (which has repercussions on subsequent data analysis and workflows). Here are some of the conventions that BDM follows:
- For dataset names, use
kebab-case
as it may be used in a URL (e.g.,my-dataset
). - For column names, variables, and functions use
lower_snake_case
(just like Python). - For the values of categorical variables (enum), use the same format as column names (
lower_snake_case
). Because those values might end up becoming columns during analysis, e.g., one-hot encoding. - For constants use
ALL_CAPS
(e.g.,PI
,MAX_RADIUS
).
In some cases data may be aggregated from multiple different projects, each following their own styles and it might not always be feasible/practical to harmonize them. When in doubt, consult “PEP 8 — the Style Guide for Python Code” or the “Google Python Style Guide”.