Redwood AI

JUN 10 '25 · Eric Jang

We are excited to introduce Redwood, 1X’s breakthrough AI model that we will be deploying to homes. Redwood is a vision-language transformer tailored for the humanoid form factor and capable of performing end-to-end mobile manipulation tasks like retrieving objects for users, opening doors, and navigating around the home. Redwood empowers NEO Gamma to learn from real-world experience, on top of hardware designed for compliance, safety, and resilience.

Feature Overview

  • Generalization: Handles variation in tasks, like picking up never-before-seen objects in unfamiliar locations. Trained on a large dataset of teleoperated and autonomous episodes from EVE and NEO, Redwood exhibits emergent behaviors such as choosing which hand, or both, to use when picking up an object.
  • Whole-body & multi-contact manipulation: Redwood is among the first VLAs to control locomotion jointly with manipulation, enabling bracing and leaning behaviors while manipulating.
  • Mobile bi-manual manipulation: Allows NEO to position itself precisely for tasks, perform actions that require movement across space, and manipulate objects while on the move.
  • Runs onboard: Redwood is compute-efficient and runs fully on NEO’s onboard embedded GPU.

Cross-Embodiment Architecture

To power autonomy on both the EVE and NEO platforms, Redwood fuses pre-trained language embeddings, vision tokens from a pre-trained vision transformer, and proprioception embeddings computed from a sequence of joint positions and applied joint forces. These are passed through several more transformer blocks, which extract a latent representation vector. We decode this representation into EVE or NEO actions using a diffusion policy.
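For intuition, here is a minimal PyTorch-style sketch of that fusion, assuming a learned query token whose output serves as the latent. Every module name, dimension, and layer count below is our assumption, not 1X's implementation:

    import torch
    import torch.nn as nn

    class RedwoodBackboneSketch(nn.Module):
        """Illustrative fusion backbone: project each modality into a
        shared token space, run shared transformer blocks, and read the
        latent off a learned query token."""

        def __init__(self, d_model=512, n_layers=4, n_heads=8,
                     lang_dim=768, vision_dim=768, proprio_dim=64):
            super().__init__()
            self.lang_proj = nn.Linear(lang_dim, d_model)
            self.vision_proj = nn.Linear(vision_dim, d_model)
            self.proprio_proj = nn.Linear(proprio_dim, d_model)
            block = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads,
                                               batch_first=True)
            self.blocks = nn.TransformerEncoder(block, num_layers=n_layers)
            self.latent_query = nn.Parameter(torch.zeros(1, 1, d_model))

        def forward(self, lang_emb, vision_tokens, proprio_seq):
            # lang_emb: (B, 1, lang_dim) pre-trained language embedding
            # vision_tokens: (B, Nv, vision_dim) from a pre-trained ViT
            # proprio_seq: (B, T, proprio_dim) joint positions and applied forces
            b = lang_emb.size(0)
            tokens = torch.cat([
                self.latent_query.expand(b, -1, -1),
                self.lang_proj(lang_emb),
                self.vision_proj(vision_tokens),
                self.proprio_proj(proprio_seq),
            ], dim=1)
            out = self.blocks(tokens)
            # (B, d_model) latent; a diffusion policy decodes this into actions
            return out[:, 0]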

[Figure: Redwood model architecture]

Generalizing to new objects and to locations not seen in the training data is crucial for the model to work in home environments, where no home is ever in exactly the same configuration twice. This is achieved by training on a diverse dataset gathered on NEOs in 1X offices as well as in employee homes.

To further improve generalization to new scenarios despite its small size (160M parameters), Redwood is trained not only to predict actions but also a variety of “cognitive” prediction targets, such as estimating the current image-space locations of NEO’s hands and of relevant objects. These cognitive tasks help ground NEO’s visual representations and allow it to generalize better to unseen environments. Below, we show a continuous take of Redwood grasping unseen bottles from a variety of locations.

[Video: continuous take of Redwood grasping unseen bottles from a variety of locations]
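As a rough illustration of what such “cognitive” prediction heads could look like on top of the fused latent, here is a minimal sketch. The head names, the 2D-regression parameterization, and the equal loss weighting are all assumptions:

    import torch.nn as nn
    import torch.nn.functional as F

    class CognitiveHeadsSketch(nn.Module):
        """Auxiliary heads that regress 2D image-space locations of the
        hands and of task-relevant objects from the shared latent."""

        def __init__(self, d_model=512, n_hands=2, n_objects=1):
            super().__init__()
            self.hand_xy = nn.Linear(d_model, n_hands * 2)    # (u, v) per hand
            self.object_xy = nn.Linear(d_model, n_objects * 2)

        def forward(self, latent):
            return self.hand_xy(latent), self.object_xy(latent)

    def cognitive_loss(heads, latent, hand_xy_gt, obj_xy_gt):
        # Simple L2 regression on image-space coordinates; the actual
        # targets and loss weighting are not specified in the post.
        pred_hands, pred_objs = heads(latent)
        return F.mse_loss(pred_hands, hand_xy_gt) + F.mse_loss(pred_objs, obj_xy_gt)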

Whole-Body Control and Multi-Contact Manipulation

Manipulation and locomotion are decoupled in most robotic systems. However, manipulation in the home goes beyond picking small objects off countertops and tables: humans use their legs, hips, and spine to bend down and pick toys and clothes off the floor, and lean into heavy doors when pushing them open. These “whole-body control” tasks make it impossible to cleanly separate locomotion from manipulation.

To enable similar capabilities on NEO, Redwood predicts not only arm and hand commands but also walking and pelvis pose commands, all simultaneously. This greatly expands NEO’s kinematic reach and the payload capacity it can work with.
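A hypothetical sketch of what such a joint action interface might look like; every field name and shape below is illustrative rather than taken from Redwood:

    from dataclasses import dataclass
    import numpy as np

    @dataclass
    class WholeBodyAction:
        """Hypothetical whole-body action chunk; the post says Redwood
        predicts all of these together rather than via separate stacks."""
        arm_joints: np.ndarray     # target positions for both arms
        hand_commands: np.ndarray  # finger / grasp targets for both hands
        walk_cmd: np.ndarray       # e.g. (vx, vy, yaw_rate) base velocity
        pelvis_pose: np.ndarray    # e.g. pelvis height, pitch, roll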

[Video: whole-body manipulation]

Coordinating all parts of the body to engage with the environment also enables multi-contact manipulation, such as bracing a hand against a wall when pulling open heavy doors.

[Video: bracing a hand against a wall while pulling open a heavy door]

Mobile Manipulation

Solving chores requires combining manipulation with navigation across the home. In real home tasks, the objects of interest are rarely all in front of the robot at the start. Furthermore, navigating to an object and getting close enough to grasp it must take into account how the model will choose to pick the object up. If navigation skills are not trained jointly with manipulation skills, a separate navigation stack may fail to position the robot well for grasping the target object. Conversely, if manipulation behavior does not account for the navigation that follows it, the result can be unwanted collisions or carrying the object across the room in an unsafe way.

To that end, Redwood is trained on a large, diverse set of object-navigation and pick-and-place demonstrations within the home, and is trained to plan navigation and manipulation behaviors jointly. An emergent property of training on these demonstrations is that Redwood can automatically decide whether to use the left hand, the right hand, or both to pick up an object.

Runs Fully On-board

Running Redwood onboard allows NEO to be deployed in more diverse environments: in basements, in gardens, in homes with spotty internet infrastructure, and at wilderness campsites.

To that end, Redwood is a 160M-parameter transformer model that runs on NEO’s onboard GPU at around 5 Hz. To pack as much intelligence as possible into a relatively small number of parameters, we’ve found that the additional cognitive losses described above help ground the representations, especially in unseen environments.
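Some back-of-envelope arithmetic on those two numbers; the fp16 weight format is our assumption, not stated in the post:

    # 160M parameters and the ~5 Hz control rate come from the post;
    # the fp16 weight format is assumed.
    params = 160e6
    weight_mb = params * 2 / 1e6            # 2 bytes per fp16 weight
    print(f"weights at fp16: {weight_mb:.0f} MB")       # ~320 MB
    step_budget_ms = 1000 / 5               # one action every 1/5 s
    print(f"per-step budget: {step_budget_ms:.0f} ms")  # 200 ms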

Voice Control

Voice control is an intuitive interface for interacting with general-purpose robots in the home. Using an offboard speech-to-speech LLM, we extract from the conversational context the goal the user intends NEO to carry out, and then convert that command into a vector offboard using a sentence encoder. This vector is passed as an input to the Redwood model, which is trained on thousands of such text embeddings.
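A minimal stand-in for the text-conditioning step, using the open-source sentence-transformers library as a placeholder encoder; the post does not say which sentence encoder 1X actually uses:

    from sentence_transformers import SentenceTransformer

    # Placeholder encoder; 1X's actual model is unspecified.
    encoder = SentenceTransformer("all-MiniLM-L6-v2")

    def command_to_policy_input(goal_text: str):
        # goal_text is what the offboard speech-to-speech LLM extracted,
        # e.g. "bring me the water bottle from the kitchen"
        return encoder.encode(goal_text)  # fixed-size vector fed to the policy

    task_vec = command_to_policy_input("bring me the water bottle")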

Learns from Success and Failure Data

Large-scale behavior cloning methods typically imitate only successful demonstrations. Redwood is trained to learn from both successful and failed rollouts, allowing NEO to improve from any interaction it has with the world, regardless of success. Failed rollouts provide supervision signals on the cognitive prediction heads, which helps prevent overfitting to the relatively narrow distribution of states seen during successful demonstrations. Successful demonstrations supervise both the action diffusion heads and the cognitive predictions.
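A minimal sketch of that supervision scheme, assuming per-sample losses and a binary success mask; the equal weighting of the two terms is an assumption:

    import torch

    def total_loss(action_diffusion_loss, cognitive_loss, success):
        """action_diffusion_loss, cognitive_loss: per-sample (B,) losses;
        success: (B,) float mask, 1.0 for successful rollouts, 0.0 for failed."""
        masked_action = (success * action_diffusion_loss).mean()  # successes only
        return masked_action + cognitive_loss.mean()  # all rollouts supervise cognition

    # e.g. a batch with one success and one failure:
    loss = total_loss(torch.tensor([0.8, 0.5]), torch.tensor([0.2, 0.3]),
                      torch.tensor([1.0, 0.0]))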

Want to Scale Redwood? Join us!

We think that general-purpose autonomous humanoids, with their intelligence incubated in the home, will be a generation-defining technology that reshapes quality of life for the elderly, for busy parents, and for use cases that can scarcely be imagined today. We’re looking for driven, high-agency engineers to help us scale Redwood to the next level and deploy production-grade AI in as many homes as possible this year. If working on Redwood excites you, we have a large number of open roles in Palo Alto:

Research Engineer, Autonomy
Software Engineer, AI Tooling
Software Engineer, Teleoperation
Research Engineer, Robot Character
Research Engineer, Data Infrastructure
Research Engineer, World Models
Research Engineer, Reinforcement Learning
AI Residency
