Palo Alto, California
Research Engineer, Scaling
Target start date: Immediately. Relocation provided.
Since its founding in 2015, 1X has been at the forefront of developing advanced humanoid robots designed for household use. Our mission is to create an abundant supply of labor via safe, intelligent humanoids. At 1X, you'll own critical projects, tackle unsolved research problems, deliver great products to customers, and be rewarded based on merit and achievement.
As a Research Engineer, Scaling, you'll build the systems that let every team and every robot go faster: training more often, evaluating more reliably, and deploying better models to our growing fleet. You'll transform prototypes into production-scale infrastructure for learning and inference, enabling larger training runs and maximizing edge compute utilization to make our models more capable.
Tech Stack
Linux
Python / C++
PyTorch / TorchTitan / TensorRT
Triton / CUDA
Location
The role is based in Palo Alto, CA. Candidates are expected to work in person at the office.
Responsibilities
Take high-agency ownership of scaling capabilities in distributed training and/or inference
Ensure that compute is never the bottleneck, i.e., that we always have more compute available than data
Enable large-scale (1000+ GPU) training on 1B+ frames of robot data, from fault tolerance to distributed ops to experiment management
Optimize high-throughput, datacenter-scale distributed inference for world models: work on the world's fastest diffusion inference engine
Improve low-latency on-device inference for a variety of robot policies using quantization, scheduling, distillation, and more