How to Set Up an ALOHA-Style Bimanual Teleop Rig

ALOHA (A Low-cost Open-source Hardware System for Bimanual Teleoperation) from Stanford has become the reference bimanual teleop platform for imitation learning. This tutorial walks through building your own: four WidowX 250 arms (two leader, two follower), three cameras, ROS 2 Humble, and leader-follower sync. Budget one full day for a first build.

Teleoperation | Total time: about 8 hours | Difficulty: Advanced | Updated April 2026

What you will accomplish

At the end of this tutorial you will have a working ALOHA-style bimanual teleoperation rig: the operator holds two leader arms, two follower arms mirror the motion in real time, and three cameras record the scene from overhead, the left wrist, and the right wrist. You can record an episode to disk and feed it into LeRobot or the ACT / Diffusion Policy training pipeline.

Why ALOHA? Bimanual manipulation unlocks tasks that are genuinely hard to do with a single arm — untying a twist tie, pouring from one vessel to another, routing cable. The ALOHA form factor of cheap arms on a shared frame with good camera coverage was the breakthrough that made bimanual imitation learning practical on a lab budget.

Prerequisites

The steps

  1. Order parts and plan the workspace

    Core parts list for a full rig:

    • 4x Trossen WidowX 250 6-DOF arms (or the full Trossen ALOHA 2 kit if you want everything pre-integrated).
    • 3x cameras — Intel RealSense D405 is the ALOHA reference; Logitech C922 also works for RGB-only.
    • 80/20 aluminum extrusion and brackets for the frame.
    • Workstation with RTX 3080+ GPU (data collection is CPU/IO bound; training is GPU bound).
    • Powered USB 3.0 hub with at least 7 ports (4 arms + 3 cameras).

    Plan the workspace so leader arms sit in front of the operator and follower arms face the task area, with a shared table surface. If you are new to this, start with a Trossen kit — you can browse comparable bimanual kits on our store.

  2. Build the frame and mount the arms

    Assemble the aluminum extrusion frame. The reference ALOHA frame separates the operator zone from the task zone by about 50 cm so the operator's hands do not collide with the followers' workspace. Mount each arm firmly to the frame — use the official Trossen base plates and torque all M5 bolts to spec (around 5 N·m).

    Label every arm clearly: leader_left, leader_right, follower_left, follower_right. You will thank yourself an hour from now.

  3. Wire power and USB

    Each WidowX 250 needs its own 12 V / 5 A power supply. Plug each arm's U2D2 USB-serial adapter into the powered hub. Use a powered hub with per-port current isolation: the Dynamixel motors run off the 12 V supplies, but the three cameras and four adapters together draw more bus power than a cheap unpowered hub can deliver, and it will brown out.

    Safety: emergency stop. Wire an external kill switch into the 12 V line so you can cut power to all four arms with one button. You will need it.

  4. Install ROS 2 Humble on the workstation

    ALOHA is a ROS 2 Humble stack on Ubuntu 22.04. Install via the official apt path:

    sudo apt update && sudo apt install locales
    sudo locale-gen en_US en_US.UTF-8
    sudo update-locale LC_ALL=en_US.UTF-8 LANG=en_US.UTF-8
    
    sudo apt install software-properties-common
    sudo add-apt-repository universe
    sudo apt update && sudo apt install curl -y
    sudo curl -sSL https://raw.githubusercontent.com/ros/rosdistro/master/ros.key -o /usr/share/keyrings/ros-archive-keyring.gpg
    
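    # Add the ROS 2 apt repository (the same step as in the ROS 2 Humble install docs), then install:
    echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/ros-archive-keyring.gpg] http://packages.ros.org/ros2/ubuntu $(. /etc/os-release && echo $UBUNTU_CODENAME) main" | sudo tee /etc/apt/sources.list.d/ros2.list > /dev/null
    sudo apt update
    sudo apt install ros-humble-desktop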

    Source the environment: source /opt/ros/humble/setup.bash. Add it to ~/.bashrc so it always loads.

  5. Install the Interbotix ROS 2 stack

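    Trossen maintains the Interbotix ROS 2 stack for WidowX. Follow the official Trossen / Interbotix install instructions, which also pull in the interbotix_ros_core and interbotix_ros_toolboxes repositories that the manipulators packages depend on. The general pattern is: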

    mkdir -p ~/interbotix_ws/src
    cd ~/interbotix_ws/src
    git clone https://github.com/Interbotix/interbotix_ros_manipulators.git -b humble
    cd ..
    rosdep install --from-paths src --ignore-src -r -y
    colcon build
    source install/setup.bash

    Verify you can talk to one arm:

    ros2 launch interbotix_xsarm_control xsarm_control.launch.py robot_model:=wx250s

    The arm should torque on. Do not place your hand in the workspace during this test.

  6. Mount and enumerate cameras

    Mount the top-down camera on the frame looking down at the workspace, and wrist cameras on each follower arm just above the gripper. Connect all three to the powered USB hub.

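    Critical: create udev rules so each camera always gets the same stable symlink under /dev, whatever /dev/video* number the kernel assigns at boot. Otherwise you will discover halfway through a recording session that the left and right wrist cameras have swapped. Look up each camera's serial with udevadm info -a -n /dev/video0 | grep serial (repeat for each video device). Example rules, with each placeholder replaced by the real serial:

    # /etc/udev/rules.d/99-aloha-cameras.rules
    SUBSYSTEM=="video4linux", ATTRS{serial}=="<LEFT_WRIST_SERIAL>", SYMLINK+="aloha_wrist_left"
    SUBSYSTEM=="video4linux", ATTRS{serial}=="<RIGHT_WRIST_SERIAL>", SYMLINK+="aloha_wrist_right"
    SUBSYSTEM=="video4linux", ATTRS{serial}=="<TOP_SERIAL>", SYMLINK+="aloha_top"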

    Reload with sudo udevadm control --reload && sudo udevadm trigger.
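
    If you would rather generate the rules file than hand-edit it, the idea can be sketched in a few lines of Python. This is a minimal sketch, not part of any ALOHA tooling: the serial strings are placeholders, and render_udev_rules and missing_symlinks are hypothetical helper names. The second helper checks that the symlinks actually exist after a reload or reboot.

```python
from pathlib import Path

# Placeholder serials for illustration only; use the values udevadm reports.
CAMERA_SERIALS = {
    "aloha_wrist_left": "SERIAL_LEFT",
    "aloha_wrist_right": "SERIAL_RIGHT",
    "aloha_top": "SERIAL_TOP",
}


def render_udev_rules(serials: dict[str, str]) -> str:
    """Return rules-file text: one video4linux rule per camera symlink."""
    if len(set(serials.values())) != len(serials):
        raise ValueError("two cameras share a serial; their symlinks would collide")
    lines = [
        f'SUBSYSTEM=="video4linux", ATTRS{{serial}}=="{serial}", SYMLINK+="{name}"'
        for name, serial in serials.items()
    ]
    return "\n".join(lines) + "\n"


def missing_symlinks(names, dev_dir: str = "/dev") -> list[str]:
    """List the expected camera symlinks that are absent from dev_dir."""
    return [name for name in names if not (Path(dev_dir) / name).exists()]


if __name__ == "__main__":
    # Paste the printed rules into /etc/udev/rules.d/99-aloha-cameras.rules.
    print(render_udev_rules(CAMERA_SERIALS), end="")
    print("missing symlinks:", missing_symlinks(CAMERA_SERIALS))
```

    Running the symlink check at the start of every recording session is cheap insurance against swapped wrist cameras.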

  7. Run leader-follower teleop

    Launch the bimanual teleop node. The Interbotix ALOHA repo provides one; the exact launch file moves across releases, but the typical pattern is:

    ros2 launch interbotix_xsarm_dual aloha_bringup.launch.py \
      leader_left:=wx250s leader_right:=wx250s \
      follower_left:=wx250s follower_right:=wx250s

    Gravity-compensate the leaders, torque-enable the followers, and set the control loop to 100 Hz minimum. Grab a leader by the wrist, move it — the follower should mirror in real time with less than 30 ms latency.
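
    To make the leader-follower idea concrete, here is a minimal fixed-rate mirroring loop in plain Python. It is a sketch under stated assumptions, not the Interbotix implementation: read_leader and write_follower are hypothetical callables standing in for whatever your driver exposes, and the demo uses fake hardware so it runs without arms attached.

```python
import time
from typing import Callable, Sequence


def mirror_loop(
    read_leader: Callable[[], Sequence[float]],
    write_follower: Callable[[Sequence[float]], None],
    rate_hz: float = 100.0,
    steps: int = 100,
) -> int:
    """Copy leader joint positions to the follower at a fixed rate.

    Returns the number of iterations that overran their period.
    """
    period = 1.0 / rate_hz
    overruns = 0
    next_tick = time.monotonic()
    for _ in range(steps):
        write_follower(read_leader())
        next_tick += period
        remaining = next_tick - time.monotonic()
        if remaining > 0:
            time.sleep(remaining)
        else:
            # Missed the deadline: count it and resynchronise rather than
            # firing a burst of commands to catch up.
            overruns += 1
            next_tick = time.monotonic()
    return overruns


if __name__ == "__main__":
    # Fake hardware: the leader ramps joint 0, the follower logs what it receives.
    ramp = iter([[0.01 * i, 0.0, 0.0, 0.0, 0.0, 0.0] for i in range(20)])
    received = []
    overruns = mirror_loop(lambda: next(ramp), received.append, rate_hz=100.0, steps=20)
    print(f"sent {len(received)} commands, {overruns} overruns")
```

    A nonzero overrun count at 100 Hz is an early sign that something else is sharing the control thread.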

  8. Record your first episode

    With teleop running, launch a recorder node that subscribes to joint states, follower commands, and all three camera topics, then writes a timestamped episode to disk:

    ros2 run aloha_data_collection record_episode \
      --task pick_and_place \
      --duration 20 \
      --output ~/aloha_episodes/

    Review the recording with the visualizer. If the images, joint states, and actions look aligned, you are ready for a full recording session — typically 50 to 200 episodes per task. Next stop is our LeRobot recording tutorial or VLA fine-tuning.
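
    One way to make "look aligned" measurable is to check how far each camera frame's timestamp sits from the nearest joint-state sample. A minimal sketch, assuming you have already loaded both timestamp lists (in seconds) from the episode; the function names and the 10 ms threshold are illustrative choices, not part of any recorder's API.

```python
from bisect import bisect_left


def nearest_gap(t: float, sorted_ts: list[float]) -> float:
    """Distance in seconds from t to the closest value in sorted_ts."""
    i = bisect_left(sorted_ts, t)
    # Only the neighbours on either side of the insertion point can be closest.
    candidates = sorted_ts[max(i - 1, 0): i + 1]
    return min(abs(t - c) for c in candidates)


def max_skew(frame_ts: list[float], joint_ts: list[float]) -> float:
    """Worst-case gap between any frame and its nearest joint-state sample."""
    if not frame_ts or not joint_ts:
        raise ValueError("both timestamp lists must be non-empty")
    joint_sorted = sorted(joint_ts)
    return max(nearest_gap(t, joint_sorted) for t in frame_ts)


if __name__ == "__main__":
    joint_ts = [0.00, 0.01, 0.02, 0.03]   # made-up 100 Hz joint states
    frame_ts = [0.001, 0.033]             # made-up camera frames
    skew = max_skew(frame_ts, joint_ts)
    threshold_s = 0.010                   # illustrative: one 100 Hz control period
    print(f"max skew {skew * 1000:.1f} ms, ok={skew <= threshold_s}")
```

    Run it once per camera; a skew that grows over the length of an episode usually points at dropped frames rather than a fixed offset.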

What to do next

Once your rig is recording cleanly, the next investments pay off quickly: (1) add a fourth camera at a side-front angle for better depth cues, (2) add force-torque sensors at each wrist for contact-rich tasks, and (3) iterate your task taxonomy. Great bimanual datasets have breadth — 20 tasks, 50 episodes each beats 1 task with 1000 episodes for generalization.

Common failure modes

Follower lags leader: the loop rate is too low, or the visualizer is running in the same ROS node as the control loop. Move the teleop control loop onto its own executor.
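
Before restructuring nodes, it helps to confirm that the loop rate really is the problem. The sketch below computes the achieved rate and the worst single period from a list of follower-command timestamps. It assumes you can export those timestamps (in seconds) from a recorded episode; loop_stats is a hypothetical helper name.

```python
def loop_stats(stamps: list[float]) -> tuple[float, float]:
    """Return (mean_rate_hz, worst_period_s) for increasing timestamps in seconds."""
    if len(stamps) < 2:
        raise ValueError("need at least two timestamps")
    periods = [b - a for a, b in zip(stamps, stamps[1:])]
    mean_rate = len(periods) / (stamps[-1] - stamps[0])
    return mean_rate, max(periods)


if __name__ == "__main__":
    # Made-up trace with one 30 ms stall in an otherwise 100 Hz loop.
    stamps = [0.00, 0.01, 0.02, 0.05, 0.06]
    rate, worst = loop_stats(stamps)
    target_hz = 100.0  # the minimum loop rate from step 7
    print(f"mean rate {rate:.1f} Hz, worst period {worst * 1000:.0f} ms")
    print("below target" if rate < target_hz else "rate ok")
```

A healthy mean rate with a large worst period suggests occasional stalls (for example a blocking callback) rather than a loop that is uniformly too slow.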

One arm drops out mid-recording: USB brownout. Move to a better-powered hub.

Wrist cameras swap identities after reboot: udev rules missing or not reloaded. See step 6.

Operator fatigue: real, and it degrades data quality. Keep sessions to 30 minutes with mandatory breaks.

Deep dive: kit vs DIY

The question every lab asks: buy the Trossen ALOHA 2 kit, or source the parts and build from scratch? Purely on hardware cost, DIY saves 10 to 15 percent. On time-to-first-episode, the pre-integrated kit is almost always cheaper by the time you include engineer hours — budget 40 to 60 engineer-hours for a clean DIY build, versus roughly 8 hours to unpack and commission a kit. If this is your first rig, buy the kit and learn the integration as you go. If this is your third rig and you have specific modifications in mind (different gripper, different camera layout), DIY gives you that freedom.
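
To apply the trade-off to your own lab, you can compute the engineer hourly cost at which the two options break even. The sketch below uses the savings and hour ranges from the paragraph above. The kit price is a made-up placeholder, and it assumes the 10 to 15 percent saving is measured against the kit's hardware price, so substitute a real quote before drawing conclusions.

```python
def break_even_rate(
    kit_price: float,
    saving_frac: float,
    diy_hours: float,
    kit_hours: float = 8.0,
) -> float:
    """Hourly engineer cost above which the kit is cheaper overall."""
    extra_hours = diy_hours - kit_hours
    if extra_hours <= 0:
        raise ValueError("DIY must take longer than the kit for a break-even to exist")
    return kit_price * saving_frac / extra_hours


if __name__ == "__main__":
    KIT_PRICE = 20_000.0  # hypothetical placeholder, not a real quote

    # Best case for DIY: largest saving, fewest hours.
    best = break_even_rate(KIT_PRICE, saving_frac=0.15, diy_hours=40)
    # Worst case for DIY: smallest saving, most hours.
    worst = break_even_rate(KIT_PRICE, saving_frac=0.10, diy_hours=60)
    print(f"kit wins above {worst:.2f} to {best:.2f} per engineer-hour")
```

If your loaded engineer cost sits above the upper figure, the kit is cheaper under every combination in the stated ranges.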

A middle path that often makes sense: buy the pre-built arms and leader-follower cabling but fabricate your own frame. The frame is the easiest part to customize and the part you most often want to modify for your specific task (bench height, camera angles, dual-station layout).

Deep dive: the subtle stuff that wrecks bimanual data

Things that look fine individually but ruin datasets in aggregate:

Deep dive: dataset formats for bimanual

Bimanual episodes typically pack dual 7-DOF actions (6 joints + gripper per arm) into a 14-dimensional action vector. Most downstream stacks — LeRobot, OpenVLA, HuggingFace datasets — accept arbitrary action dimensionality, but you need to be explicit about action ordering in your metadata. The convention we recommend: [left_j0..j5, left_gripper, right_j0..j5, right_gripper]. Document it in a dataset card so anyone training on your data knows what the channels mean.
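
The recommended ordering can be pinned down in code so that packing and unpacking never drift apart. This is a minimal sketch in plain Python. The helper names and the metadata keys action_dim and action_names are illustrative, not fields required by LeRobot or any other library.

```python
import json

JOINTS_PER_ARM = 6  # six arm joints per side, plus one gripper channel each


def channel_names() -> list[str]:
    """Channel labels in the order [left_j0..j5, left_gripper, right_j0..j5, right_gripper]."""
    names = []
    for side in ("left", "right"):
        names += [f"{side}_j{i}" for i in range(JOINTS_PER_ARM)]
        names.append(f"{side}_gripper")
    return names


def pack_action(left_joints, left_gripper, right_joints, right_gripper) -> list[float]:
    """Flatten both arms into one 14-dimensional action vector."""
    if len(left_joints) != JOINTS_PER_ARM or len(right_joints) != JOINTS_PER_ARM:
        raise ValueError(f"expected {JOINTS_PER_ARM} joints per arm")
    return [*left_joints, left_gripper, *right_joints, right_gripper]


def unpack_action(action) -> dict[str, float]:
    """Map a 14-dimensional action vector back to named channels."""
    names = channel_names()
    if len(action) != len(names):
        raise ValueError(f"expected {len(names)} values, got {len(action)}")
    return dict(zip(names, action))


if __name__ == "__main__":
    action = pack_action([0.0] * 6, 1.0, [0.1] * 6, 0.0)
    # A snippet like this can go straight into the dataset card.
    card = {"action_dim": len(action), "action_names": channel_names()}
    print(json.dumps(card, indent=2))
```

Generating the channel list from one function, rather than typing it into the dataset card by hand, keeps the documentation and the data in step.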

Frequently asked questions

Do I need ROS 2 for ALOHA? The official stack uses ROS 2 Humble. You can run bimanual teleop without ROS by writing your own serial bus coordinator, but it is a lot of work for marginal benefit.

Can I substitute different arms? Yes — as long as the leader-follower geometry matches. Teams have built ALOHA-style rigs with Koch v1.1, Moss, and SO-ARM bimanual variants for lower budgets.

How many episodes to train a policy? For ACT on a bimanual task, 50 episodes is the minimum; 100 to 200 is the sweet spot.

Force feedback on the leader? The standard ALOHA rig is non-haptic. Haptic variants exist but add cost and complexity.
