What a Good Robotics Dataset Card Looks Like
A dataset card is the README for your data: scope, provenance, splits and limitations. Here is what a trustworthy robotics dataset card should contain.
TL;DR. A dataset card is the documentation that ships with a dataset: what it contains, how it was collected, how it is split, and where it falls short. For robotics it should cover sensors and calibration, annotation method, consent and provenance, and honest limitations. Good cards follow the "Datasheets for Datasets" idea and make data safe to reuse.
Why dataset cards exist
The practice was formalised by Datasheets for Datasets (Gebru et al.) and is now standard on the Hugging Face Hub, where a dataset's README serves as its card. A card lets a buyer or auditor understand a dataset without reverse-engineering it, and frameworks such as the EU AI Act increasingly expect this kind of documentation for training data.
What a robotics dataset card should contain
- Scope. Tasks, skills, number of episodes and usable hours.
- Collection method. Egocentric vs teleoperation, rig and sensors, control frequency.
- Calibration. Camera intrinsics/extrinsics; coordinate conventions.
- Annotations. What was labelled, by whom, and inter-annotator agreement.
- Splits. Train/val/test and how they were chosen (keep each environment in a single split to avoid leakage).
- Consent and provenance. Evidence that contributors consented and were paid, plus a provenance log.
- Limitations. Known biases, gaps, and failure modes - stated honestly.
- Licence and contact.
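The checklist above can also be made machine-checkable. The following is a minimal Python sketch, not a standard schema: the `RoboticsDatasetCard` class, its field names and the `missing_sections` helper are hypothetical, chosen only to mirror the bullets above, and the sample values are illustrative.

```python
from dataclasses import dataclass, field, fields


@dataclass
class RoboticsDatasetCard:
    """Hypothetical card schema; one field per checklist item above."""
    scope: str = ""
    collection_method: str = ""
    calibration: str = ""
    annotations: str = ""
    splits: str = ""
    consent_and_provenance: str = ""
    limitations: list[str] = field(default_factory=list)
    licence: str = ""
    contact: str = ""


def missing_sections(card: RoboticsDatasetCard) -> list[str]:
    """Return the names of sections that are empty or whitespace-only."""
    missing = []
    for f in fields(card):
        value = getattr(card, f.name)
        if isinstance(value, str):
            empty = not value.strip()
        else:
            empty = len(value) == 0
        if empty:
            missing.append(f.name)
    return missing


# Illustrative values only, not taken from a real dataset.
card = RoboticsDatasetCard(
    scope="Pick-and-place, 1,200 episodes",
    collection_method="Teleoperation, wrist camera, 30 Hz control",
    licence="CC BY 4.0",
)
print(missing_sections(card))
```

A check like this could run before a delivery is packaged, so that a card with an empty section is caught before it ships.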
The honesty test
A card that lists only strengths is a red flag. The most useful section is "limitations" - it tells you where the data will and will not help. For physical AI, note environment/object coverage, because that drives generalisation.
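One way to back a coverage statement with numbers is to tally episodes per environment in each split and flag any environment that appears in more than one split. This is a minimal sketch, assuming each episode carries hypothetical `env` and `split` metadata fields; the sample records are made up for illustration.

```python
from collections import Counter, defaultdict


def coverage_by_split(episodes):
    """Count episodes per environment within each split."""
    counts = defaultdict(Counter)
    for ep in episodes:
        counts[ep["split"]][ep["env"]] += 1
    return {split: dict(c) for split, c in counts.items()}


def leaked_environments(episodes):
    """Return environments that occur in more than one split, sorted."""
    splits_per_env = defaultdict(set)
    for ep in episodes:
        splits_per_env[ep["env"]].add(ep["split"])
    return sorted(env for env, s in splits_per_env.items() if len(s) > 1)


# Illustrative records only; real metadata would come from the dataset.
episodes = [
    {"env": "kitchen_a", "split": "train"},
    {"env": "kitchen_a", "split": "train"},
    {"env": "kitchen_b", "split": "test"},
    {"env": "office_a", "split": "train"},
    {"env": "office_a", "split": "test"},
]

print(coverage_by_split(episodes))
print(leaked_environments(episodes))
```

The per-split tally can go straight into the limitations section, and a non-empty leak list is the kind of caveat an honest card would report instead of hiding.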
How nxted documents data
Every nxted Capture delivery includes a dataset card, a data-provenance log and a QA report, bundled in the Data Trust Pack. See also our article on annotating egocentric data.
FAQ
What is a dataset card? The documentation shipped with a dataset describing its scope, collection method, splits, provenance and limitations - the "README" that makes data safe to reuse.
What should a robotics dataset card include? Scope, sensors and calibration, annotation method with inter-annotator agreement, splits, consent and provenance, honest limitations, and licence/contact.
Why do dataset cards matter for compliance? They evidence provenance and data governance, which buyers and frameworks like the EU AI Act increasingly expect for training data.
Physical-AI data specialists at OFORO LTD (UK). We write about egocentric data, robotics dataset formats, RLHF and data governance.