Data Collection for Embodied Learning and Smart Behavior


1-minute summary:

  • This blog post discusses how 1X collects data for training our androids
  • We believe that data quality > data quantity > algorithms
  • Our data collection team can fine-tune models themselves to customize behaviors

1X’s mission is to create androids that work alongside people, and to use them to meet the world's labor demands for an abundant society.

Traditional robotics and industrial automation solutions already make society very productive. Marvels of automation convert 10 tons of potatoes into potato chips in a few minutes, assemble roughly a billion smartphones per year, and manufacture a car from scratch every 60 seconds. In such factories, each repetitive task is performed at such high volume that it warrants the effort of building a custom machine.

Learning to do chores in ever-changing environments

On the other hand, there is a vast “long tail” of chores for helping humans in human spaces: keeping the office safe and secure, carrying groceries from the car, sorting trash for recycling, tidying and cleaning indoor environments, and removing debris and litter from public areas. Existing robot products have yet to make a big dent in these tasks. To tackle them, we have to create a general-purpose robot with the same physical affordances as a human (i.e. an android), and it has to be smart enough to do everyday chores in the office or the home.

Home and office environments are challenging because they are unstructured and constantly changing from human use. For example, at one of our customer sites where we have deployed our patrolling solution, construction activity is so frequent that the locations of obstacles and barriers change from day to day. Because of this, our patrol solution cannot assume that the locations and appearances of obstacles remain constant. Generally, automation becomes challenging when the software developer can’t assume much about the state of the world outside the robot’s body. You cannot assume where objects are placed relative to the robot’s grippers, whether a desk on the map has been moved, or whether the coffee tin has enough coffee left in it to brew a cup.

Defining Behavior Through Data

Our own approach to autonomy is inspired by how digital assistants like ChatGPT and autonomous vehicles (Waymo) are developed. The strategy is to collect data covering the large variety of environmental situations the droid encounters, and to learn a general understanding of the task from that data, rather than hand-engineering code to perform a single repetitive motion. By collecting large amounts of diverse experience, our droids generalize to new situations they have not seen before. Initially they don’t know what concepts like “grasping” or “sorting” or “patrolling” mean, but when provided with enough examples of these tasks across a wide variety of scenarios, they develop a general understanding of what to do in new environments.
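
To make the contrast with hand-engineered motion concrete, here is a minimal sketch of learning behavior from demonstrations: a small policy network trained by supervised regression to imitate recorded operator actions. The architecture, dimensions, and names below are illustrative assumptions, not a description of 1X’s actual system.

    import torch
    import torch.nn as nn

    # Hypothetical imitation-learning policy; the architecture here is an
    # illustrative assumption, not 1X's actual model.
    class Policy(nn.Module):
        def __init__(self, obs_dim: int, act_dim: int):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(obs_dim, 256), nn.ReLU(),
                nn.Linear(256, 256), nn.ReLU(),
                nn.Linear(256, act_dim),
            )

        def forward(self, obs: torch.Tensor) -> torch.Tensor:
            return self.net(obs)

    def train_step(policy, optimizer, obs, expert_action):
        # One supervised step: regress the operator's action from the
        # observation the android saw at the same moment.
        loss = nn.functional.mse_loss(policy(obs), expert_action)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()

The point of the sketch is that the same training loop serves any task; only the demonstrations change.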

Because our androids’ understanding of the world is derived from data, how we collect and curate our training data becomes a critical part of our strategy. Here’s what matters the most to us when collecting data:

Embodied teleoperation in VR tells us how hard it is to learn tasks:

If a human can look through the android’s eyes and control it to perform a task using VR teleoperation, then in principle it should be possible to replicate the human’s decisions with a neural network given the same inputs. When exploring a new task we want to teach the robot, we first verify that it is feasible in VR. This is an existence proof: there exists at least one neural network (the human brain) that can perform the task with the information available to the droid’s sensors.
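
As a rough illustration of what such teleoperation episodes might look like on disk, here is a hedged sketch of an episode logger that pairs each sensor observation with the operator’s command. The class, field names, and JSON format are assumptions for illustration, not 1X’s actual pipeline.

    import json
    import time
    from pathlib import Path

    # Hypothetical logger for VR teleoperation sessions (illustrative only).
    # Observations and actions are assumed to be JSON-serializable dicts.
    class EpisodeLogger:
        def __init__(self, out_dir: str):
            self.out_dir = Path(out_dir)
            self.out_dir.mkdir(parents=True, exist_ok=True)
            self.frames = []

        def record(self, observation: dict, operator_action: dict) -> None:
            # Each frame pairs exactly what the android sensed with what the
            # human commanded, so a policy can later be trained on the same inputs.
            self.frames.append({
                "t": time.time(),
                "observation": observation,
                "action": operator_action,
            })

        def save(self, task_name: str) -> Path:
            path = self.out_dir / f"{task_name}_{int(time.time())}.json"
            path.write_text(json.dumps(self.frames))
            return path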

Our VR data collection system also gives us an intuitive estimate of how difficult a task will be to learn. All other things being equal, predicting the actions for a 2-second demonstration of opening a door is much easier than predicting the actions for a 20-second demonstration of opening the same door. Machine learning methods for robotics tend to have an easier time predicting a short sequence of clean actions than a long sequence of noisy actions. Any unnecessary time spent performing the demo effectively becomes “noisy data” that adds to the difficulty of training our models. To train as efficiently as possible, we care deeply about making the most intuitive, low-latency teleoperation interface possible.

Optimizing the data collection tools to be easy to use directly translates to cleaner, shorter data and more capable androids.
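
One way to operationalize “cleaner, shorter data” is a simple curation filter over recorded episodes. A toy sketch, assuming episodes are stored as lists of timestamped frames as in the logger sketch above; the threshold is an arbitrary placeholder:

    # Toy curation heuristic (an illustrative assumption, not 1X's pipeline):
    # prefer short demonstrations, since extra time spent fumbling through a
    # demo effectively becomes noisy training data.
    def filter_short_demos(episodes: list, max_seconds: float = 5.0) -> list:
        kept = []
        for frames in episodes:
            duration = frames[-1]["t"] - frames[0]["t"]
            if duration <= max_seconds:
                kept.append(frames)
        return kept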

Investing in high-quality data to train good models

It is an open secret in the applied ML community that when it comes to training performant ML systems, whether it be physical robots or digital assistants, the careful curation of training data is often far more impactful than developing new learning algorithms. By selectively gathering labels for scenarios where the model fails, and then re-training the model on that new data, we can fix the failure modes without changing the underlying algorithm. 
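
A minimal sketch of that curation loop, with the evaluation harness, demonstration collection, and retraining step injected as hypothetical callables (none of these names come from 1X’s stack):

    from typing import Callable, Iterable

    def data_engine_iteration(
        model,
        scenarios: Iterable,
        dataset: list,
        evaluate: Callable,       # returns True if the model succeeds on a scenario
        collect_demos: Callable,  # gathers operator demonstrations for a scenario
        retrain: Callable,        # re-trains the model on the full dataset
    ):
        # Find the scenarios where the current model fails...
        failures = [s for s in scenarios if not evaluate(model, s)]
        # ...gather new labeled demonstrations only for those failures...
        for scenario in failures:
            dataset.extend(collect_demos(scenario))
        # ...and retrain on the augmented dataset, algorithm unchanged.
        return retrain(model, dataset)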

Our engineers who train our ML models spend significant amounts of time practicing tasks in VR and reviewing the data to ensure that the way we gather and process data is as time-efficient as possible. 

We also employ a team of Android Operators to scale up data collection to more diverse environments. If having a detailed understanding of the data being collected allows an ML researcher to train a good model more effectively, then the converse is also true: the person responsible for collecting training data becomes more effective if they train some models themselves. They can build a detailed intuition for how much behavior change to expect from the model as they vary the quantity and quality of the data they collect.

Open-source GUI wrappers around Stable Diffusion have allowed non-ML experts to fine-tune the base Stable Diffusion models to add new styles and improvements. Inspired by this trend, we’ve built similar tools that allow our android operations team to fine-tune behaviors.
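
In the same spirit, such a tool might freeze the base model’s general visual understanding and adapt only a small task-specific head to the data an operator collects. A hedged sketch, assuming a PyTorch policy with hypothetical backbone and head submodules:

    import torch
    import torch.nn as nn

    # Illustrative fine-tuning routine; 'backbone' and 'head' are assumed
    # submodule names, not 1X's actual architecture.
    def finetune_head(policy: nn.Module, loader, epochs: int = 3, lr: float = 1e-4):
        # Freeze the pretrained visual backbone...
        for p in policy.backbone.parameters():
            p.requires_grad = False
        # ...and optimize only the task head on the new demonstrations.
        optimizer = torch.optim.Adam(policy.head.parameters(), lr=lr)
        for _ in range(epochs):
            for obs, expert_action in loader:
                loss = nn.functional.mse_loss(policy(obs), expert_action)
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
        return policy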

The AI team trains the base model, which has a general visual understanding of the world. The Android Operations team designs new tasks, collects data, trains and deploys the models, and collects more data in situations where the model struggles to generalize. Our Android Operations team has already taught the robot a number of behaviors entirely on its own.

As robotic capabilities become increasingly data-driven and less dependent on specialist knowledge, robotics will become more accessible to non-technical users.

On November 4th, 2023, 1X is hosting our first AI event, 1X Discover Day: Embodied Learning, with a limited number of invitations available.
