Glossary

Physical AI & robotics data, defined

Short, plain-English definitions of the terms that come up when buying or building physical-AI training data.

Egocentric data

Egocentric data is video recorded from the first-person point of view of the person performing a task - what a robot’s own camera would see - usually with depth, hand pose and a 6-DoF trajectory. It is the scarce ingredient for teaching robots manipulation, because there is no web-scale corpus of physical actions.
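
As a rough illustration of what one frame of such a recording might carry, here is a minimal sketch. The class and field names (`EgoFrame`, `Pose6DoF`, `hand_keypoints`) and the file paths are hypothetical, not a standard schema. The pose is stored as a position plus a unit quaternion, which is one common way to express six degrees of freedom.

```python
from dataclasses import dataclass, field
import math


@dataclass
class Pose6DoF:
    """Camera pose: 3 translational + 3 rotational degrees of freedom."""
    position_m: tuple[float, float, float]                # x, y, z in metres
    orientation_xyzw: tuple[float, float, float, float]   # unit quaternion

    def is_valid(self, tol: float = 1e-6) -> bool:
        # A rotation quaternion must have unit length.
        norm = math.sqrt(sum(c * c for c in self.orientation_xyzw))
        return abs(norm - 1.0) <= tol


@dataclass
class EgoFrame:
    """One time step of a first-person recording (hypothetical schema)."""
    timestamp_s: float
    rgb_path: str        # path to the colour image
    depth_path: str      # path to the aligned depth map
    camera_pose: Pose6DoF
    # Hand pose as named 3D keypoints, e.g. {"right_index_tip": (x, y, z)}
    hand_keypoints: dict[str, tuple[float, float, float]] = field(default_factory=dict)


# Illustrative values only; none of these come from a real dataset.
frame = EgoFrame(
    timestamp_s=0.033,
    rgb_path="rgb/000001.jpg",
    depth_path="depth/000001.png",
    camera_pose=Pose6DoF((0.10, 1.45, 0.02), (0.0, 0.0, 0.0, 1.0)),
    hand_keypoints={"right_index_tip": (0.21, 1.10, 0.35)},
)
```

A sequence of such frames, ordered by timestamp, gives the 6-DoF trajectory mentioned above.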

Physical AI

Physical AI is artificial intelligence that perceives and acts in the physical world - humanoid robots, manipulators and embodied agents - rather than only producing text or images. Its main bottleneck is data: first-person demonstrations of real physical tasks, which must be recorded rather than scraped.

Vision-language-action (VLA) model

A vision-language-action (VLA) model takes camera images and a natural-language instruction and outputs robot actions, extending vision-language models from text output to physical control. VLAs are trained on large collections of demonstration episodes, so their performance depends heavily on data volume, diversity and annotation.

RLHF (reinforcement learning from human feedback)

RLHF improves an AI model by training it against human judgements of its outputs: humans rank or rate responses, a reward model learns those preferences, and the model is optimised toward them. The quality of the result depends heavily on who provides the feedback and how it is measured.
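
To make the "reward model learns those preferences" step concrete, here is a minimal sketch of the pairwise Bradley-Terry loss that reward models are commonly trained with. This is one typical choice, not the only one. The function name `preference_loss` is illustrative, and the two rewards stand in for a reward model's scores on the response a human preferred and the one they rejected.

```python
import math


def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Bradley-Terry loss: -log(sigmoid(reward_chosen - reward_rejected))."""
    margin = reward_chosen - reward_rejected
    # Numerically stable evaluation of log(1 + exp(-margin)).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# The loss is small when the reward model agrees with the human ranking
# and large when it disagrees.
agrees = preference_loss(reward_chosen=2.0, reward_rejected=0.0)
disagrees = preference_loss(reward_chosen=0.0, reward_rejected=2.0)
```

Minimising this loss over many ranked pairs pushes the reward model to score preferred responses higher, which is why noisy or inconsistent human rankings translate directly into a noisy reward signal.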

Embodied AI

Embodied AI is AI that learns and acts through a physical or simulated body that senses and moves in an environment - robots and embodied agents - as opposed to disembodied models that only process text or images. It needs interaction data, not just web data.

Teleoperation (robot data collection)

Teleoperation is when a human drives a robot to perform a task while the robot’s own actions are recorded, producing action-aligned training data on the exact platform you will deploy. It is precise but slow and expensive to scale, so it is often blended with cheaper human egocentric video.

RLDS

RLDS (Reinforcement Learning Datasets) is a format for episodic robot and RL data built on TensorFlow Datasets (TFDS). It is the format behind the cross-embodiment Open X-Embodiment / RT-X collection, making it a natural choice when training on or alongside that data.
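
The episodic structure can be sketched in plain Python. The step field names below (`observation`, `action`, `reward`, `discount`, `is_first`, `is_last`, `is_terminal`) follow the RLDS step convention. The helpers `make_episode` and `check_episode` and the toy values are hypothetical, and real RLDS datasets are read through TensorFlow Datasets rather than built from dictionaries like this.

```python
from typing import Any


def make_episode(observations: list, actions: list, rewards: list,
                 terminated: bool = True) -> dict[str, Any]:
    """Build one episode as {"steps": [...]} using RLDS step field names."""
    n = len(observations)
    if n == 0 or not (n == len(actions) == len(rewards)):
        raise ValueError("need equal-length, non-empty sequences")
    steps = []
    for i in range(n):
        last = i == n - 1
        steps.append({
            "observation": observations[i],
            "action": actions[i],
            "reward": rewards[i],
            "discount": 1.0,
            "is_first": i == 0,
            "is_last": last,
            # Terminal only if the episode truly ended, not if it was cut off.
            "is_terminal": last and terminated,
        })
    return {"steps": steps}


def check_episode(episode: dict[str, Any]) -> None:
    """Raise ValueError if the episode boundary flags are inconsistent."""
    steps = episode["steps"]
    n = len(steps)
    if [s["is_first"] for s in steps] != [True] + [False] * (n - 1):
        raise ValueError("is_first must be set on the first step only")
    if [s["is_last"] for s in steps] != [False] * (n - 1) + [True]:
        raise ValueError("is_last must be set on the last step only")
    if any(s["is_terminal"] and not s["is_last"] for s in steps):
        raise ValueError("is_terminal implies is_last")


# Toy three-step episode; the observation and action values are placeholders.
episode = make_episode(
    observations=[{"image": "frame_0"}, {"image": "frame_1"}, {"image": "frame_2"}],
    actions=[[0.1, 0.0], [0.0, 0.2], [0.0, 0.0]],
    rewards=[0.0, 0.0, 1.0],
)
check_episode(episode)
```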

LeRobot

LeRobot is Hugging Face’s open robotics library and dataset format, storing episodes as Parquet (state and action) plus MP4 video and JSON metadata, with first-class tooling and easy sharing on the Hugging Face Hub.

HDF5 (robotics)

HDF5 is a general, self-describing scientific container from the HDF Group, widely used in robotics, notably in the ALOHA and robomimic dataset conventions. It stores trajectories as nested groups and arrays, with mature libraries in many languages.
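
The "nested groups and arrays" idea can be shown without any HDF5 library. The sketch below is a plain-Python stand-in, not real HDF5 I/O: dictionaries play the role of groups and shape tuples play the role of arrays. The key names follow an ALOHA-style layout, while the camera name `cam_high`, the episode length and the array shapes are illustrative assumptions.

```python
def dataset_paths(group: dict, prefix: str = "") -> list[tuple[str, tuple]]:
    """Flatten nested groups into (HDF5-style path, array shape) pairs."""
    out = []
    for name, node in group.items():
        path = f"{prefix}/{name}"
        if isinstance(node, dict):   # a group: recurse into it
            out.extend(dataset_paths(node, path))
        else:                        # a dataset: record its shape
            out.append((path, node))
    return out


T = 400  # number of time steps in this illustrative episode

# One episode file, with shape tuples standing in for the stored arrays.
episode = {
    "observations": {
        "qpos": (T, 14),                        # joint positions
        "qvel": (T, 14),                        # joint velocities
        "images": {"cam_high": (T, 480, 640, 3)},
    },
    "action": (T, 14),
}

for path, shape in dataset_paths(episode):
    print(path, shape)
```

With a library such as h5py, the same tree would be written with `create_group` and `create_dataset` calls, and each leaf would hold the actual array rather than just its shape.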

Data provenance (robotics datasets)

Data provenance is the documented record of where each piece of training data came from - who produced it, with what consent, and how it was processed. For robotics datasets it includes a provenance log tracing every clip to its source, plus a dataset card describing scope and limitations.
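
A provenance log entry might look like the sketch below. Every field name here (`clip_id`, `contributor_id`, `consent_ref`, `processing_steps`) is a hypothetical choice for illustration, not a standard. The one firm idea is hashing the clip's bytes so that a log line can be tied to exactly one file.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ProvenanceRecord:
    """One line of a provenance log (hypothetical schema)."""
    clip_id: str
    sha256: str              # content hash tying the record to the file
    contributor_id: str      # who produced the clip (pseudonymous ID)
    consent_ref: str         # pointer to the signed consent record
    captured_at: str         # ISO 8601 timestamp
    processing_steps: list[str] = field(default_factory=list)


def make_record(clip_id: str, clip_bytes: bytes, contributor_id: str,
                consent_ref: str, captured_at: str,
                processing_steps: list[str]) -> ProvenanceRecord:
    digest = hashlib.sha256(clip_bytes).hexdigest()
    return ProvenanceRecord(clip_id, digest, contributor_id,
                            consent_ref, captured_at, list(processing_steps))


def to_log_line(record: ProvenanceRecord) -> str:
    """Serialise a record as one JSON line, with keys in a stable order."""
    return json.dumps(asdict(record), sort_keys=True)


# Illustrative values only; a real log would hash the actual video file.
record = make_record(
    clip_id="clip-0001",
    clip_bytes=b"placeholder video bytes",
    contributor_id="contributor-042",
    consent_ref="consent/2024/0042",
    captured_at="2024-05-01T09:30:00Z",
    processing_steps=["face_blur", "trim", "depth_align"],
)
print(to_log_line(record))
```

Appending one such line per clip produces a log that can be audited later by recomputing each file's hash and comparing it with the recorded value.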

DPDP Act (India)

The Digital Personal Data Protection Act, 2023, is India’s data-protection law, setting obligations around consent, purpose limitation and data-principal rights. For India-sourced AI training data that contains personal information, capture must be based on consent and must comply with the DPDP Act.

EU AI Act

The EU AI Act (Regulation (EU) 2024/1689) is the European Union’s law for artificial intelligence. It places data-governance duties on providers of high-risk AI systems - notably Article 10 (data quality and provenance) and Annex IV (technical documentation). Because those duties cover how training data is sourced and prepared, your training-data vendor is part of meeting them.