dataset

Version: v26.0610
Namespace: https://behaverse.org/schemas/dataset

A metadata schema for cognitive science and neuroscience datasets with mappings to Schema.org, DataCite, BIDS, and other standards

Properties

This schema defines 45 properties for describing dataset metadata.

Core Metadata

Essential fields for dataset identification

| Property | Type | Requirement | Description |
| --- | --- | --- | --- |
| @type | schema:Dataset | optional | JSON-LD node type (rdf:type) for schema.org / Google Dataset Search discoverabil... |
| name | string | required | Short URL-friendly identifier |
| pretty_name | string | recommended | Human-readable display title |
| description | string | required | Comprehensive dataset description |
| version | string | recommended | Dataset version (semantic versioning) |
| license | enum | required | License identifier in SPDX format |
| url | string | recommended | Dataset homepage or landing page |
| doi | string | recommended | Digital Object Identifier |
| keywords | array | recommended | Keywords for discovery |
| language | array | optional | ISO 639-1 language codes |
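
As a minimal sketch of how the `name` and `version` fields might be sanity-checked before a record is submitted, the Python below applies two simple patterns. Both patterns are assumptions: the schema text only says "URL-friendly" and "semantic versioning", and `check_core_formats` is a hypothetical helper, not part of any published tooling.

```python
import re

# Assumption: "URL-friendly" means lowercase letters, digits, and single hyphens.
SLUG_RE = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")

# Assumption: a simplified MAJOR.MINOR.PATCH check that ignores
# pre-release and build-metadata suffixes.
SEMVER_RE = re.compile(r"^\d+\.\d+\.\d+$")


def check_core_formats(record: dict) -> list[str]:
    """Return human-readable problems with the name and version fields."""
    problems = []
    name = record.get("name")
    if name is not None and not SLUG_RE.match(name):
        problems.append(f"name is not URL-friendly: {name!r}")
    version = record.get("version")
    if version is not None and not SEMVER_RE.match(version):
        problems.append(f"version is not MAJOR.MINOR.PATCH: {version!r}")
    return problems
```

An empty list means neither check found a problem; absent fields are skipped rather than reported.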

Dates

Temporal metadata

| Property | Type | Requirement | Description |
| --- | --- | --- | --- |
| date_created | string | recommended | Date dataset was originally created (ISO 8601) |
| date_published | string | recommended | Date first published (ISO 8601) |
| date_modified | string | optional | Date of last modification (ISO 8601) |
| date_added | string | required | Date added to catalog (ISO 8601, internal) |
| last_verified | string | optional | Date metadata was last verified (ISO 8601, internal) |
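
The date fields can be checked with the Python standard library. The sketch below assumes the values are calendar dates such as `2024-01-15`; the schema only says "ISO 8601", so datetimes or partial dates may also be intended. The ordering rule (creation no later than publication) is likewise an assumption, and `check_dates` is a hypothetical helper.

```python
from datetime import date

DATE_FIELDS = (
    "date_created",
    "date_published",
    "date_modified",
    "date_added",
    "last_verified",
)


def check_dates(record: dict) -> list[str]:
    """Return problems found in the date fields of a metadata record."""
    problems = []
    parsed = {}
    for field in DATE_FIELDS:
        value = record.get(field)
        if value is None:
            continue
        try:
            # Accepts YYYY-MM-DD; newer Python versions accept more ISO 8601 forms.
            parsed[field] = date.fromisoformat(value)
        except (TypeError, ValueError):
            problems.append(f"{field} is not an ISO 8601 date: {value!r}")
    # date_added is the only date marked as required in the table above.
    if "date_added" not in record:
        problems.append("missing required property: date_added")
    # Assumption: a dataset cannot be published before it was created.
    if (
        "date_created" in parsed
        and "date_published" in parsed
        and parsed["date_created"] > parsed["date_published"]
    ):
        problems.append("date_created is later than date_published")
    return problems
```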

People & Organizations

Creator and contributor information

| Property | Type | Requirement | Description |
| --- | --- | --- | --- |
| creator | array | recommended | Primary dataset creators/authors |
| curator | array | optional | Dataset curators responsible for maintenance and quality |

Creator Object

Each creator in the creator array is an object with the following properties:

| Property | Type | Requirement | Description |
| --- | --- | --- | --- |
| name | string | required | Creator full name |
| email | string | optional | Creator email address (format: email) |
| orcid | string | optional | ORCID identifier (format: 0000-0000-0000-0000) |
| affiliation | string | optional | Institutional affiliation |

Example:

{
  "creator": [
    {
      "name": "Jane Researcher",
      "email": "[email protected]",
      "orcid": "0000-0001-2345-6789",
      "affiliation": "University Psychology Department"
    }
  ]
}
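
The `orcid` format can be checked beyond its shape. The sketch below combines a pattern test with the ISO 7064 MOD 11-2 check character that ORCID iDs carry in their final position, which may be `X`. Whether the schema's own validation accepts a trailing `X` or verifies the checksum is not stated here, so treat both as assumptions; the function names are hypothetical.

```python
import re

# Four groups of four characters; the last character may be the check value X.
ORCID_RE = re.compile(r"^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$")


def orcid_check_character(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Return True if the string has the ORCID shape and a matching check character."""
    if not ORCID_RE.match(orcid):
        return False
    digits = orcid.replace("-", "")
    return orcid_check_character(digits[:15]) == digits[15]
```

The identifier in the creator example above, `0000-0001-2345-6789`, passes both checks.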

Curator Object

Each curator in the curator array is an object with the following properties:

| Property | Type | Requirement | Description |
| --- | --- | --- | --- |
| name | string | required | Curator full name |
| email | string | optional | Curator email address (format: email) |
| orcid | string | optional | ORCID identifier (format: 0000-0000-0000-0000) |
| affiliation | string | optional | Institutional affiliation |

Citations & References

Published papers and references

| Property | Type | Requirement | Description |
| --- | --- | --- | --- |
| citation | array | recommended | Published papers or references |

Citation Object

Each citation in the citation array is an object with the following properties:

| Property | Type | Requirement | Description |
| --- | --- | --- | --- |
| type | string | optional | Citation type |
| doi | string | optional | DOI of cited paper |
| url | string | optional | URL of cited resource |
| text | string | optional | Full citation text |
| arxiv_id | string | optional | ArXiv preprint ID |

Population & Sample

Aggregate-level population demographics and coverage

| Property | Type | Requirement | Description |
| --- | --- | --- | --- |
| sample_size | integer | required | Total number of participants |
| age_range | array | recommended | [min, max] age in years |
| age_mean | number | optional | Mean participant age |
| age_std | number | optional | Standard deviation of age |
| sex_distribution | object | recommended | Participant counts by sex |
| age_category | array | optional | Age group classification |
| population_category | enum | optional | Population type (clinical vs healthy) |
| inclusion_criteria | array | optional | Participant inclusion criteria |
| exclusion_criteria | array | optional | Participant exclusion criteria |
| spatial_coverage | string | optional | Geographic location/region |
| temporal_coverage | string | optional | Time period data covers |
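
The age fields lend themselves to a few internal consistency checks. The following sketch assumes that `age_range` holds exactly two numbers with min not greater than max, and that `age_mean` should fall inside that range. None of these rules is stated by the schema beyond the `[min, max]` shape, and `check_age_fields` is a hypothetical helper.

```python
def check_age_fields(record: dict) -> list[str]:
    """Return consistency problems among age_range, age_mean, and age_std."""
    problems = []
    age_range = record.get("age_range")
    if age_range is not None:
        if len(age_range) != 2:
            problems.append("age_range must have exactly two entries: [min, max]")
        else:
            low, high = age_range
            if low < 0 or low > high:
                problems.append("age_range must satisfy 0 <= min <= max")
            else:
                mean = record.get("age_mean")
                # Assumption: the mean cannot lie outside the reported range.
                if mean is not None and not (low <= mean <= high):
                    problems.append("age_mean lies outside age_range")
    std = record.get("age_std")
    if std is not None and std < 0:
        problems.append("age_std must not be negative")
    return problems
```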

Sex Distribution Object

The sex_distribution property is a single object with the following properties:

| Property | Type | Requirement | Description |
| --- | --- | --- | --- |
| female | integer | optional | Number of female participants |
| male | integer | optional | Number of male participants |
| other | integer | optional | Number of participants with other gender identity |
| not_reported | integer | optional | Number of participants who did not report gender |

Example:

{
  "sex_distribution": {
    "female": 52,
    "male": 45,
    "other": 2,
    "not_reported": 1
  }
}
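
One check that follows naturally from these counts is comparing their sum with `sample_size`. The schema does not say the two must agree, so the sketch below treats that as an assumption; the function names are hypothetical.

```python
SEX_KEYS = ("female", "male", "other", "not_reported")


def sex_distribution_total(sex_distribution: dict) -> int:
    """Sum the participant counts, treating absent keys as zero."""
    return sum(sex_distribution.get(key, 0) for key in SEX_KEYS)


def counts_match_sample_size(record: dict) -> bool:
    """Return True if the counts add up to sample_size.

    Assumption: every participant is counted in exactly one category.
    Records without a sex_distribution are not checked and return True.
    """
    distribution = record.get("sex_distribution")
    if distribution is None:
        return True
    return sex_distribution_total(distribution) == record.get("sample_size")
```

With the counts from the example above (52, 45, 2, and 1), the total is 100, so the check passes for a record whose `sample_size` is 100.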

Data Modalities & Measurement

Types of data collected and measurement techniques

| Property | Type | Requirement | Description |
| --- | --- | --- | --- |
| measurement_technique | array | recommended | Measurement techniques used with optional detailed specifications |
| constructs_measured | array | optional | Cognitive or psychological constructs measured in the dataset |

Measurement Technique Object

Each measurement_technique in the measurement_technique array is an object with the following properties:

| Property | Type | Requirement | Description |
| --- | --- | --- | --- |
| type | string | optional | High-level category of measurement |
| technique | string | required | Specific measurement technique |
| channels | integer | optional | Number of channels/electrodes (for EEG, MEG) |
| sampling_rate | number | optional | Sampling rate in Hz (for EEG, MEG, physiological) |
| reference | string | optional | Reference type (for EEG, MEG) |
| manufacturer | string | optional | Equipment manufacturer/system |
| field_strength | number | optional | Magnetic field strength in Tesla (for MRI) |
| tr | number | optional | Repetition time in ms (for fMRI) |
| te | number | optional | Echo time in ms (for MRI) |
| details | string | optional | Additional technique-specific details |
| response_type | array | optional | Types of behavior responses (applies when technique is behavior) |
| format | string | optional | File format for this specific measurement (e.g., edf, bdf, nii, csv) |
| granularity | string | optional | Granularity of the measurement (e.g., per trial, per subject, per session) |

Example:

{
  "measurement_technique": [
    {
      "type": "electrophysiology",
      "technique": "EEG",
      "channels": 64,
      "sampling_rate": 512,
      "reference": "average",
      "manufacturer": "BioSemi",
      "granularity": "event-data"
    }
  ]
}

Activities & Paradigms

Cognitive tasks and experimental paradigms

| Property | Type | Requirement | Description |
| --- | --- | --- | --- |
| activity | array | recommended | Activities or tasks performed by participants with associated measurements |

Activity Object

Each activity in the activity array is an object with the following properties:

| Property | Type | Requirement | Description |
| --- | --- | --- | --- |
| name | string | required | Activity/task name |
| type | string | optional | Activity type/category |
| measurements | array | optional | List of measurement techniques collected during this activity (reference by technique name) |
| trials | integer | optional | Number of trials |
| duration | number | optional | Typical duration in minutes |
| conditions | array | optional | Experimental conditions |
| measures | array | optional | Primary dependent variables |
| constructs | array | optional | Constructs measured |

Example:

{
  "activity": [
    {
      "name": "N-Back",
      "type": "task",
      "measurements": [
        "EEG",
        "eye-tracking",
        "response-device"
      ],
      "trials": 150,
      "duration": 20,
      "conditions": [
        "0-back",
        "2-back"
      ],
      "measures": [
        "d_prime",
        "reaction_time"
      ],
      "constructs": [
        "working memory"
      ]
    }
  ]
}
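
Because `measurements` refers to techniques by name, a record can be checked for references that do not resolve. The sketch below assumes each entry in `measurements` is meant to equal the `technique` value of some entry in `measurement_technique`, compared as exact strings. The schema says only "reference by technique name", so the matching rule is an assumption and `unresolved_measurements` is a hypothetical helper.

```python
def unresolved_measurements(record: dict) -> dict[str, list[str]]:
    """Map each activity name to the measurement names it cites
    that no measurement_technique entry declares."""
    known = {
        entry["technique"]
        for entry in record.get("measurement_technique", [])
        if "technique" in entry
    }
    unresolved = {}
    for activity in record.get("activity", []):
        unknown = [
            name for name in activity.get("measurements", [])
            if name not in known
        ]
        if unknown:
            unresolved[activity.get("name", "<unnamed>")] = unknown
    return unresolved
```

For a record that declares only an EEG technique but lists EEG, eye-tracking, and response-device under the N-Back activity, the function reports the latter two names as unresolved for that activity.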

Study Design

Study design and methodology

| Property | Type | Requirement | Description |
| --- | --- | --- | --- |
| study_design_type | enum | recommended | Study design category |
| intervention_type | array | optional | Type(s) of intervention (applies when study_design_type is intervention) |
| session_count | integer | optional | Number of experimental sessions per participant |
| session_description | string | optional | Brief description of session structure (if multi-session) |

Data Files & Distribution

File formats and access information

| Property | Type | Requirement | Description |
| --- | --- | --- | --- |
| data_formats | array | recommended | File formats (csv, json, edf, etc.) |
| data_size_gb | number | optional | Total dataset size (GB) |
| data_structure | string | optional | Organization of data files |
| download_url | string | optional | Direct URL to download dataset files (e.g., .zip archive or direct file download... |
| access_url | string | optional | URL to dataset landing page with documentation, access instructions, and metadat... |
| access_conditions | object | optional | Access requirements and restrictions for the dataset |

Access Conditions Object

The access_conditions property is a single object with the following properties:

| Property | Type | Requirement | Description |
| --- | --- | --- | --- |
| is_free | boolean | optional | Whether the dataset is freely accessible without cost |
| requirements | string | optional | Access restrictions or requirements (e.g., registration, data use agreement) |

Example:

{
  "access_conditions": {
    "is_free": true,
    "requirements": "Publicly available"
  }
}

Ethics

Ethical approval and data quality

PropertyTypeRequirementDescription
ethical_approvalobjectoptionalEthical approval and IRB information

Ethical Approval Object

The ethical_approval property is a single object with the following properties:

| Property | Type | Requirement | Description |
| --- | --- | --- | --- |
| obtained | boolean | optional | Whether ethics approval was obtained |
| institution | string | optional | IRB or ethics board institution |
| protocol | string | optional | Protocol or approval number |

Example:

{
  "ethical_approval": {
    "obtained": true,
    "institution": "University IRB",
    "protocol": "IRB-2024-001"
  }
}

HuggingFace-Specific

ML dataset metadata

| Property | Type | Requirement | Description |
| --- | --- | --- | --- |
| size_category | enum | optional | HF size bucket |
| task_categories | array | optional | ML task types |

Usage

See the examples for practical usage patterns.
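
As a rough illustration of the requirement levels listed on this page, the sketch below collects the five top-level properties marked required (`name`, `description`, `license`, `date_added`, `sample_size`) and the required key inside each nested array item, then reports what a record is missing. It checks presence only, not types or enum values. The helper name and the sample record are hypothetical, and a real workflow would more likely validate against the published schema file.

```python
# Top-level properties marked "required" in the tables on this page.
REQUIRED_TOP_LEVEL = ("name", "description", "license", "date_added", "sample_size")

# For each array property, the key that every item must carry.
REQUIRED_IN_ITEMS = {
    "creator": "name",
    "curator": "name",
    "measurement_technique": "technique",
    "activity": "name",
}


def missing_required(record: dict) -> list[str]:
    """List required properties that are absent from the record."""
    missing = [key for key in REQUIRED_TOP_LEVEL if key not in record]
    for array_name, item_key in REQUIRED_IN_ITEMS.items():
        for index, item in enumerate(record.get(array_name, [])):
            if item_key not in item:
                missing.append(f"{array_name}[{index}].{item_key}")
    return missing


# Hypothetical minimal record; all values are invented for illustration.
minimal_record = {
    "name": "example-nback-eeg",
    "description": "Illustrative record containing only the required properties.",
    "license": "CC-BY-4.0",
    "date_added": "2024-01-15",
    "sample_size": 100,
}
```

Calling `missing_required(minimal_record)` returns an empty list; removing `license` or adding a creator without a `name` makes the corresponding entry appear in the result.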

Version History

The current version of dataset is v26.0610.

Older versions are available in the dataset/versions/ directory.