How to Record a LeRobot-Compatible Dataset
LeRobot is Hugging Face's open-source robot-learning stack. Its dataset format has become the default for arm-scale imitation learning: episode-based, parquet + video shards, self-describing metadata, HuggingFace Hub native. This tutorial walks through recording a clean 50-episode dataset from scratch using the `lerobot-record` CLI and publishing it.
What you will accomplish
At the end of this tutorial you will have: a 50-episode LeRobot dataset on disk, a visual inspection pass over it, normalization statistics, and a public or private dataset published to the HuggingFace Hub. That dataset is immediately compatible with LeRobot policy training (ACT, Diffusion Policy, VLA heads), reference implementations, and community evaluation.
The LeRobot dataset format stores each episode as a row range in a parquet file with joint states, actions, and references to video frames stored in separate MP4 shards. The format is designed to stream efficiently from the Hub, load cleanly into PyTorch DataLoaders, and be self-describing via a JSON metadata file.
Prerequisites
- A LeRobot-supported robot arm. SO-100 / SO-101, OpenArm, Koch v1.1, Moss, Aloha, WidowX, UR series — the supported list grows regularly. Browse the robot store for options.
- At least one camera. USB webcam or Intel RealSense — wrist + top-down is ideal.
- Ubuntu 22.04 workstation with Python 3.10+ and ffmpeg.
- HuggingFace account for publishing (optional).
- If you need a teleop rig too, see the ALOHA teleop tutorial for bimanual or a leader-follower guide on the academy for single-arm.
The steps

Step 1: Install LeRobot

Install LeRobot from pip. Optional extras exist per hardware type:

```bash
conda create -n lerobot python=3.10 -y
conda activate lerobot
pip install --upgrade pip
pip install lerobot

# hardware-specific extras (pick yours)
pip install 'lerobot[feetech]'         # SO-100, Koch
pip install 'lerobot[dynamixel]'       # Aloha, WidowX
pip install 'lerobot[intelrealsense]'  # for RealSense cameras
```

Verify the install:

```bash
python -c "import lerobot; print(lerobot.__version__)"
```

Check upstream at github.com/huggingface/lerobot for the current supported hardware matrix — the project moves quickly.
Step 2: Calibrate your arm

Each supported arm has a calibration sub-command. General pattern:

```bash
lerobot-calibrate \
  --robot.type=so100 \
  --robot.port=/dev/ttyACM0
```

This discovers motor IDs, sets joint zero positions, and stores the calibration to `~/.cache/huggingface/lerobot/calibration/`. Re-run if you ever remount the arm or swap cables.
Step 3: Verify cameras and teleop

Run a teleop session with a live preview to sanity-check everything:

```bash
lerobot-teleoperate \
  --robot.type=so100 \
  --robot.port=/dev/ttyACM0 \
  --robot.cameras='{"top": {"type":"opencv","index":0}, "wrist": {"type":"opencv","index":2}}' \
  --display_data=true
```

Confirm both cameras stream at 30 FPS without tearing, the arm tracks cleanly, and the preview window updates. If the arm jitters, check USB cable quality first — LeRobot arms are serial-bus sensitive to flaky cables.
Step 4: Define the task and instruction

Pick one well-defined task for your first dataset. "Pick the red cube and place it in the blue bowl" beats "do stuff with the cubes" every time. Your natural-language instruction will be saved verbatim on every episode and will be the only text conditioning a downstream policy sees.

Tip: keep task success criteria binary and observable from the data. "Cube is in the bowl" is crisp. "Cube is placed gently" is not.
Step 5: Record 50 episodes

Now the main event. Record 50 episodes with the `lerobot-record` CLI. The flag names move slightly between releases; the general invocation pattern is:

```bash
lerobot-record \
  --robot.type=so100 \
  --robot.port=/dev/ttyACM0 \
  --robot.cameras='{"top": {"type":"opencv","index":0}, "wrist": {"type":"opencv","index":2}}' \
  --dataset.repo_id=<your-username>/red_cube_blue_bowl \
  --dataset.num_episodes=50 \
  --dataset.single_task="Pick the red cube and place it in the blue bowl." \
  --dataset.fps=30 \
  --dataset.episode_time_s=20
```

Between episodes, reset the scene to a slightly different initial configuration — vary cube position, bowl position, lighting angle. This is the single biggest factor in how well a trained policy generalizes. Take breaks: operator fatigue after ~30 episodes is real, and the quality of your last 20 demos matters more than you think.
Step 6: Inspect the dataset

Scrub through episodes with the built-in visualizer:

```bash
lerobot-dataset-visualize --repo-id=<your-username>/red_cube_blue_bowl
```

Watch for: camera dropouts, joint discontinuities, episodes where you missed the task, frame-rate variability. A common sanity check: plot the action norm over time — spikes usually mean teleop artefacts.
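The action-norm check mentioned above can be scripted with nothing but the standard library. A minimal sketch, assuming each episode's actions have already been loaded as per-timestep lists of joint values (the parquet reading itself is omitted):

```python
import math

def action_norms(actions):
    """L2 norm of each action vector, one per timestep."""
    return [math.sqrt(sum(a * a for a in step)) for step in actions]

def spike_indices(norms, z_thresh=3.0):
    """Flag timesteps whose norm deviates more than z_thresh
    standard deviations from the episode mean."""
    mean = sum(norms) / len(norms)
    std = math.sqrt(sum((n - mean) ** 2 for n in norms) / len(norms))
    if std == 0:
        return []
    return [i for i, n in enumerate(norms) if abs(n - mean) / std > z_thresh]

# Smooth trajectory with one injected teleop glitch at t=5
episode = [[0.1, 0.1]] * 5 + [[5.0, 5.0]] + [[0.1, 0.1]] * 5
print(spike_indices(action_norms(episode)))  # [5]
```

Episodes that trigger many spikes are good candidates for the cleanup pass in the next step.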
Step 7: Drop bad episodes and compute stats

Remove failed or messy episodes from the dataset metadata. LeRobot supports filtering by episode index. After cleanup, recompute normalization stats (mean/std per action and state dimension) so downstream training uses correct values. The `lerobot` CLI includes a `compute-stats` command for this.
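The statistics themselves are simple to reason about. A stdlib-only sketch of per-dimension mean/std and normalization (the real stats computation also covers images and streams data, which this omits):

```python
import math

def per_dim_stats(vectors):
    """Mean and std for each dimension across all timesteps."""
    n = len(vectors)
    dims = len(vectors[0])
    means = [sum(v[d] for v in vectors) / n for d in range(dims)]
    stds = [
        math.sqrt(sum((v[d] - means[d]) ** 2 for v in vectors) / n)
        for d in range(dims)
    ]
    return means, stds

def normalize(vector, means, stds, eps=1e-8):
    """Map a raw state/action vector to zero mean, unit variance."""
    return [(x - m) / (s + eps) for x, m, s in zip(vector, means, stds)]

actions = [[0.0, 10.0], [2.0, 12.0], [4.0, 14.0]]
means, stds = per_dim_stats(actions)
print(means)  # [2.0, 12.0]
```

This is why dropping bad episodes must happen before recomputing stats: a single wild episode shifts the means and inflates the stds for every training batch.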
Step 8: Push to HuggingFace Hub

Authenticate once with `huggingface-cli login`, then push. The `lerobot-record` CLI supports automatic upload; you can also push after the fact:

```bash
huggingface-cli upload <your-username>/red_cube_blue_bowl \
  ~/.cache/huggingface/lerobot/<your-username>/red_cube_blue_bowl \
  --repo-type=dataset
```

The dataset is now public (or private). You can load it anywhere with `LeRobotDataset.from_pretrained(...)`. Train a baseline ACT policy with one command to validate — if ACT can fit 50 demos in under an hour, your dataset is healthy.
What to do next
Once you have a clean LeRobot dataset, two high-value follow-ups: (1) train a policy on it — ACT is the easiest first baseline, and (2) fine-tune OpenVLA on the dataset to compare against the ACT baseline. If you are scaling up to serious data collection, the ALOHA bimanual teleop is the next step up in hardware; for humanoids, start with Unitree G1 camera calibration.
Common failure modes
Episodes have different lengths: expected — LeRobot pads during training. Keep recording at a fixed target duration, then let the framework handle length variance.
Huge dataset size: LeRobot video shards are H.264-encoded. If size is still a problem, drop to 15 FPS or 480p for cameras that do not need more resolution.
Policy overfits instantly: not enough scene variation during collection. Randomize initial conditions more aggressively.
Hub upload fails: usually repo-exists or permission errors. Create the repo first with huggingface-cli repo create.
Deep dive: the LeRobot dataset format
LeRobot dataset layout on disk: a top-level directory containing meta/info.json, meta/stats.json, meta/episodes.jsonl, a data/ folder with parquet shards (usually one per episode), and a videos/ folder with MP4 shards. The info.json declares dataset features — joint state shape, action shape, camera names, FPS — and serves as the authoritative schema for loaders. The parquet rows are the per-timestep scalars and references; the MP4s hold the image observations, referenced by frame index.
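To make the "authoritative schema" idea concrete, here is a sketch of metadata in the spirit of meta/info.json. The field names are an approximation for explanation only, not the authoritative LeRobot schema; inspect a real dataset on the Hub for the exact keys.

```python
import json

# Illustrative, not the exact LeRobot schema: shapes, dtypes, and
# camera keys declared once, so loaders never have to guess.
info = {
    "robot_type": "so100",
    "fps": 30,
    "total_episodes": 50,
    "features": {
        "observation.state": {"dtype": "float32", "shape": [6]},
        "action": {"dtype": "float32", "shape": [6]},
        "observation.images.top": {"dtype": "video", "shape": [480, 640, 3]},
        "observation.images.wrist": {"dtype": "video", "shape": [480, 640, 3]},
    },
}
print(json.dumps(info, indent=2).splitlines()[1])
```

The point is that everything a DataLoader needs to allocate tensors is declared up front, in one file.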
This design is deliberate: scalars in parquet gives fast columnar access for batched dataloading, and MP4s give order-of-magnitude better on-disk compression than per-frame image files without sacrificing random access. H.264 encoded at CRF 18 is nearly lossless for manipulation; CRF 23 gives you roughly 3x smaller files with no visible quality impact on policy training for most tasks.
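The compression win is easy to sanity-check with back-of-envelope arithmetic. The numbers below (640x480, two cameras, roughly 2 Mbit/s per H.264 stream) are rough assumptions, not measurements:

```python
# Back-of-envelope dataset sizing under assumed parameters.
episodes = 50
seconds_per_episode = 20
fps = 30
cameras = 2
width, height = 640, 480

frames = episodes * seconds_per_episode * fps * cameras
raw_bytes = frames * width * height * 3           # uncompressed RGB frames
h264_bitrate = 2_000_000                          # ~2 Mbit/s per stream (assumed)
video_bytes = episodes * seconds_per_episode * cameras * h264_bitrate / 8

print(f"raw:  {raw_bytes / 1e9:.1f} GB")   # ~55 GB as loose images
print(f"h264: {video_bytes / 1e9:.2f} GB")  # ~0.5 GB as MP4 shards
```

Two orders of magnitude under these assumptions, which is why a 50-episode dataset fits comfortably on the Hub.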
Deep dive: scene variation is the single biggest quality lever
If you have 2 hours for recording, spend the first 30 minutes planning scene variation, not the first 90 minutes recording baseline episodes. What to vary:
- Object pose. At least 10 starting positions in a grid across the workspace.
- Lighting. Overhead fluorescent vs warm lamp vs natural window — three conditions minimum.
- Distractors. Add 2-3 irrelevant objects on the table half the time. Policies trained without distractors break the moment your engineering desk has a coffee mug on it.
- Background. Rotate through 2 or 3 table surfaces or tabletop covers.
- Instruction phrasing. "Pick the red cube and place it in the bowl" / "put the red cube into the blue bowl" / "move the red cube to the bowl." Three phrasings per task is a good minimum.
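The object-pose grid from the first bullet is worth generating up front rather than eyeballing. A sketch with hypothetical workspace bounds in meters (adjust to your arm's reach):

```python
import random

def starting_positions(x_range, y_range, nx=5, ny=2, jitter=0.01, seed=0):
    """Grid of (x, y) object start poses across the workspace, with
    small random jitter so episodes are not pixel-identical.
    Bounds here are hypothetical, in meters."""
    rng = random.Random(seed)
    positions = []
    for i in range(nx):
        for j in range(ny):
            x = x_range[0] + i * (x_range[1] - x_range[0]) / (nx - 1)
            y = y_range[0] + j * (y_range[1] - y_range[0]) / (ny - 1)
            positions.append((x + rng.uniform(-jitter, jitter),
                              y + rng.uniform(-jitter, jitter)))
    return positions

poses = starting_positions((0.15, 0.35), (-0.10, 0.10))
print(len(poses))  # 10 start poses, matching the minimum above
```

Print the list, tape a reference grid to the table, and work through it episode by episode.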
Deep dive: from recording to training in one afternoon
The whole point of recording a LeRobot dataset is to train a policy on it. The baseline recipe once recording is done: lerobot-train policy=act env=real dataset_repo_id=<your-username>/red_cube_blue_bowl. On a single GPU, an ACT policy fits 50 episodes in about 45 minutes to 2 hours. You should see a steadily decreasing loss and action-space error. Evaluate on the real robot by running the trained checkpoint with lerobot-eval. If your dataset was clean, you should see 70%+ success on the trained task with 50 episodes, and 85%+ with 100.
Deep dive: when to stop recording and start training
Teams frequently over-record. The right heuristic: train a baseline ACT policy after every 25 episodes, measure success, and stop when the success curve flattens. You will often hit diminishing returns around 80 to 150 episodes for a single task. Spending that extra recording time on a different task instead produces a better multi-task policy.
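That heuristic reduces to a one-line stopping rule. A sketch, where `min_gain` is an arbitrary threshold you would tune to your task:

```python
def should_stop(success_rates, min_gain=0.05):
    """Stop recording once the success curve flattens: the latest
    measurement improves on the previous one by less than min_gain."""
    if len(success_rates) < 2:
        return False
    return success_rates[-1] - success_rates[-2] < min_gain

# Success rate measured after every 25 recorded episodes
curve = [0.40, 0.62, 0.71, 0.73]
print(should_stop(curve))  # True: +0.02 gain, time to switch tasks
```

A single noisy evaluation can fool a two-point rule like this, so in practice you would average over several evaluation rollouts per checkpoint.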
Frequently asked questions
Can I use my own custom robot with LeRobot? Yes. Implement a Python class that matches the LeRobot Robot interface — connect, disconnect, read joint positions, send actions, read camera frames. A few hundred lines typically.
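The shape of such a driver class looks roughly like the sketch below. The method names are illustrative only; match them to the actual LeRobot Robot interface when implementing, and replace the stand-in joint state with real bus reads:

```python
class MyCustomRobot:
    """Illustrative robot driver skeleton, not the exact LeRobot
    interface. Camera handling is omitted for brevity."""

    def __init__(self, port: str):
        self.port = port
        self.connected = False
        self._joints = [0.0] * 6  # stand-in for real encoder reads

    def connect(self):
        # Open the serial bus / vendor SDK session here.
        self.connected = True

    def disconnect(self):
        self.connected = False

    def read_joint_positions(self):
        return list(self._joints)

    def send_action(self, action):
        # Forward target joint positions to the motor controllers.
        assert self.connected, "call connect() first"
        self._joints = list(action)

robot = MyCustomRobot("/dev/ttyACM0")
robot.connect()
robot.send_action([0.1] * 6)
print(robot.read_joint_positions()[0])  # 0.1
```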
How big is a typical 50-episode dataset? Roughly 1 to 5 GB depending on camera count and resolution. Easy to host on the Hub free tier.
Can I record simulation data? Yes — LeRobot supports sim envs (ALOHA, PushT, Xarm). Sim data is great for pretraining baselines.
What about proprioception beyond joint angles? LeRobot supports arbitrary state dimensions. Declare them in the dataset info and the loader handles the rest.