The scaling laws that govern robot learning
A peer-reviewed result from 2025 changed how we think about robot data: diversity beats raw volume, and generalisation follows a power law.
Language models have scaling laws. So, it turns out, does robot learning.
The power law
The ICLR 2025 paper "Data Scaling Laws in Imitation Learning for Robotic Manipulation" collected more than 40,000 demonstrations and ran over 15,000 real-world rollouts. The headline finding: a policy's ability to generalise follows a roughly power-law relationship with the number of distinct environments and objects it saw during training.
The surprising part is what does NOT drive generalisation. Adding more demonstrations of the same task in the same kitchen helps far less than demonstrating that task across many kitchens; the paper reports that once each environment or object has enough demonstrations, extra ones add little. Diversity of scene and object beats sheer volume.
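To make "roughly power-law" concrete, here is a minimal sketch of how such a relationship is usually checked: fit a straight line in log-log space. It assumes the form gap = a * N^(-b), where N is the number of training environments and "gap" is some measure of how far the policy falls short on unseen scenes. The function name, the variable names and the numbers are illustrative assumptions, not the paper's code or data.

```python
import math

def fit_power_law(counts, values):
    """Fit values = a * counts**(-b) by least squares in log-log space.

    Returns (a, b). Hypothetical helper for illustration only.
    """
    xs = [math.log(n) for n in counts]
    ys = [math.log(v) for v in values]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                      # slope of the log-log line
    intercept = mean_y - slope * mean_x    # log(a)
    return math.exp(intercept), -slope

# Synthetic numbers generated from a known power law (NOT from the paper).
env_counts = [1, 2, 4, 8, 16, 32]
gap = [0.8 * n ** -0.5 for n in env_counts]

a, b = fit_power_law(env_counts, gap)
print(f"a = {a:.3f}, b = {b:.3f}")
```

With real evaluation results the points will scatter around the line; a large, stable exponent b on the environment axis is what "diversity pays" looks like in this picture.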
Why this matters for sourcing
If diversity is the dominant variable, then the cheapest path to a capable policy is broad coverage: many workers, many workshops, many tools, many lighting conditions. That is exactly the structure a country-scale skilled workforce provides, and what Western, lab-bound datasets tend to lack.
The practical takeaway
- Optimise your capture budget for breadth, not repetition.
- Track environment and object counts as first-class metrics, not just hours.
- For generalisation to new scenes and objects, a diverse 200-hour dataset can outperform a narrow 1,000-hour one.
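Tracking breadth alongside hours can be as simple as counting distinct values in the episode manifest. The sketch below assumes each episode carries an environment ID, an object ID and a duration; the `Episode` fields and the `coverage_metrics` helper are hypothetical names, not part of any particular dataset format.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Episode:
    # Hypothetical manifest fields, assumed for illustration.
    environment_id: str
    object_id: str
    duration_s: float

def coverage_metrics(episodes):
    """Report hours next to counts of distinct environments and objects."""
    environments = {e.environment_id for e in episodes}
    objects = {e.object_id for e in episodes}
    pairs = {(e.environment_id, e.object_id) for e in episodes}
    return {
        "hours": sum(e.duration_s for e in episodes) / 3600,
        "environments": len(environments),
        "objects": len(objects),
        "environment_object_pairs": len(pairs),
    }

episodes = [
    Episode("kitchen_a", "mug", 1800),
    Episode("kitchen_a", "mug", 1800),    # repetition: adds time, not breadth
    Episode("kitchen_b", "bottle", 900),
    Episode("workshop_c", "mug", 900),
]

metrics = coverage_metrics(episodes)
print(metrics)
```

Here the second `kitchen_a` episode raises the hour count but leaves every diversity count unchanged, which is the kind of spend the first bullet warns against.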
This is the research foundation under the Nxted Capture model.
Physical-AI data specialists at OFORO LTD (UK). We write about egocentric data, robotics dataset formats, RLHF and data governance.