How to Calibrate Cameras on Unitree G1

Every serious manipulation or locomotion pipeline on the Unitree G1 humanoid starts with good camera calibration. This tutorial walks through intrinsic and extrinsic calibration of the G1 head camera array using a ChArUco board and OpenCV, saving results to YAML, and validating with a reprojection-error sanity check.

Humanoid · Total time: about 90 minutes · Difficulty: Intermediate · Updated April 2026

What you will accomplish

By the end of this tutorial you will have per-camera intrinsics (camera matrix and distortion coefficients) for each of the G1 head cameras, extrinsics between cameras in the head rig, and optionally the extrinsic from the head camera frame to the G1 base link, all saved to YAML in a format your downstream ROS 2 / LeRobot / OpenVLA stack can consume directly.

Bad calibration is one of the most common root causes of "the policy worked in sim but fails on the real robot" — it corrupts everything downstream: hand-eye coordination, depth estimation, object detection, and camera-to-action policies. Spend the 90 minutes once, then forget about it.

Prerequisites

You will need: a Unitree G1 with working head cameras; a workstation with Python 3 and OpenCV 4.8 or newer (for the modern CharucoDetector API); the Unitree Python SDK installed and able to stream frames; PyYAML for saving results; and access to an A2-capable printer plus a rigid, flat backing board.

The steps

  1. Print and mount the ChArUco board

    Generate the board PDF:

    python -c "
    import cv2
    aruco = cv2.aruco
    d = aruco.getPredefinedDictionary(aruco.DICT_5X5_100)
    board = aruco.CharucoBoard((8, 11), 0.040, 0.030, d)
    img = board.generateImage((1500, 2100), marginSize=40)
    cv2.imwrite('charuco_a2.png', img)
    "

    Print at actual size (no scaling) on A2 matte paper, verify the printed squares measure exactly 40.0 mm, and mount on a rigid, flat backing. Warp is the enemy.

  2. Set up the G1 and stream cameras

    Boot the G1 in tethered mode with the battery safely seated. Start the head camera stream via the Unitree SDK. The exact Python binding is on the unitreerobotics GitHub — the general pattern is a subscriber that yields synchronized numpy frames per camera. Verify you can read frames from all head cameras into Python simultaneously. If one camera drops, fix that before you start capturing.
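    Whatever the exact SDK binding looks like on your firmware, the piece you must get right is synchronization. Below is a minimal, hypothetical sketch of the pairing logic only — `pair_frames` is not a Unitree SDK call, just illustrative glue you would wrap around the per-camera callbacks, assuming each camera yields (timestamp, frame) tuples:

```python
def pair_frames(left, right, tol=0.010):
    """Pair two (timestamp, frame) streams by nearest timestamp.

    left, right: lists of (t_seconds, frame), sorted by time.
    tol: maximum time offset (s) to accept a pair; at 30 fps,
         roughly a third of a frame period is a reasonable default.
    """
    pairs, j = [], 0
    for t_l, f_l in left:
        # advance the right-stream cursor while the next frame is closer in time
        while j + 1 < len(right) and abs(right[j + 1][0] - t_l) < abs(right[j][0] - t_l):
            j += 1
        if right and abs(right[j][0] - t_l) <= tol:
            pairs.append((f_l, right[j][1]))
    return pairs
```

    Frames that cannot be paired within tolerance are dropped, which is exactly what you want for calibration: a slightly smaller but honestly synchronized capture set.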

  3. Capture 40-60 board frames

    Move the G1 head (or the board) through a wide range of poses: tilted left, tilted right, close, far, off-axis. OpenCV's calibration is only as good as the diversity of viewpoints you feed it. For stereo extrinsics, capture synchronized frames across both cameras for each pose.

    import cv2, os
    
    os.makedirs('calib', exist_ok=True)
    for i in range(60):
        input(f'pose {i}: move the board, then press Enter')
        frame_left, frame_right = camera.grab_sync()  # camera: the SDK stream object from step 2
        cv2.imwrite(f'calib/left_{i:03d}.png', frame_left)
        cv2.imwrite(f'calib/right_{i:03d}.png', frame_right)

    Tip: the board should fill at least 30% of the frame, and the detected markers should span the full image, edges and corners included. Shots with the board only in the center produce overconfident intrinsics.

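    One way to make the "span the full image" advice measurable: bucket every detected corner into a coarse grid and check how much of the image your captures actually cover. This is an illustrative helper, not part of OpenCV, assuming you keep the per-frame corner arrays from detection:

```python
import numpy as np

def corner_coverage(all_corners, img_size, grid=4):
    """Fraction of a grid x grid partition of the image that received
    at least one detected corner across all calibration frames.

    all_corners: list of arrays of (x, y) pixel coordinates per frame.
    img_size: (width, height) in pixels.
    """
    w, h = img_size
    hit = np.zeros((grid, grid), dtype=bool)
    for corners in all_corners:
        pts = np.asarray(corners).reshape(-1, 2)
        cols = np.clip((pts[:, 0] / w * grid).astype(int), 0, grid - 1)
        rows = np.clip((pts[:, 1] / h * grid).astype(int), 0, grid - 1)
        hit[rows, cols] = True
    return float(hit.mean())
```

    Coverage near 1.0 means the captures exercise the whole lens, distortion corners included; below roughly 0.75, reshoot with the board pushed toward the image edges.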
  4. Compute intrinsics per camera

    Run ChArUco intrinsic calibration per camera:

    import cv2, glob
    
    aruco = cv2.aruco
    d = aruco.getPredefinedDictionary(aruco.DICT_5X5_100)
    board = aruco.CharucoBoard((8, 11), 0.040, 0.030, d)
    detector = aruco.CharucoDetector(board)
    
    obj_points, img_points, img_size = [], [], None
    for fn in sorted(glob.glob('calib/left_*.png')):
        img = cv2.imread(fn, cv2.IMREAD_GRAYSCALE)
        img_size = img.shape[::-1]
        corners, ids, _, _ = detector.detectBoard(img)
        if ids is not None and len(ids) > 6:
            # matchImagePoints pairs the detected ChArUco corners with their
            # known 3D positions on the board, handling partial views
            obj, img_pts = board.matchImagePoints(corners, ids)
            obj_points.append(obj)
            img_points.append(img_pts)
    
    ret, K, dist, rvecs, tvecs = cv2.calibrateCamera(
        obj_points, img_points, img_size, None, None)
    print('RMS reprojection error:', ret)

    Target RMS below 0.5 pixels. Above 1 px indicates bad board flatness, insufficient pose diversity, or out-of-focus frames.

  5. Compute stereo extrinsics

    For pairs of cameras (e.g. left + right head cameras on a binocular G1 rig), run cv2.stereoCalibrate using the already-computed intrinsics and synchronized board observations. The output R, T is the rigid transform from left to right camera frames.

    ret, K1, D1, K2, D2, R, T, E, F = cv2.stereoCalibrate(
        object_points, left_img_points, right_img_points,
        K_left, D_left, K_right, D_right, img_size,
    flags=cv2.CALIB_FIX_INTRINSIC)

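    A cheap physical sanity check on the stereo result: the norm of T is the camera baseline, which you can compare against a tape measure. The 60 mm figure below is purely illustrative, not a G1 spec; measure your own rig:

```python
import numpy as np

def check_baseline(T, expected_m, tol_m=0.002):
    """Compare the stereoCalibrate translation magnitude (meters)
    against a tape-measured camera spacing."""
    baseline = float(np.linalg.norm(T))
    return baseline, abs(baseline - expected_m) <= tol_m

# e.g. check_baseline(T, 0.060) for cameras measured ~60 mm apart
```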
  6. Compute head-to-base extrinsics (optional)

    If your downstream policy needs camera poses in the G1 base link or world frame, chain through the known head kinematics: read the head joint state from the SDK, compose with the camera-to-head-link extrinsic (from the URDF or a one-time hand-eye calibration), and publish a ROS 2 tf. The G1 URDF ships with nominal values that are a good starting point.
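    The chaining itself is just composition of homogeneous transforms. A minimal sketch with made-up numbers — the real rotations and offsets come from the URDF plus joint state, and from your calibration:

```python
import numpy as np

def make_T(R, t):
    """Build a 4x4 homogeneous transform from rotation R (3x3) and translation t (3,)."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

# Hypothetical frames and values, purely illustrative:
T_base_head = make_T(np.eye(3), [0.0, 0.0, 1.2])   # head link 1.2 m above base
T_head_cam  = make_T(np.eye(3), [0.05, 0.0, 0.1])  # camera offset on the head
T_base_cam  = T_base_head @ T_head_cam             # camera pose in the base frame
```

    Recompute T_base_head from the head joint state each time the head moves; only T_head_cam is fixed by calibration.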

  7. Save to YAML

    Save in a ROS-compatible YAML format so the standard camera_info loaders work:

    import numpy as np
    import yaml
    
    out = {
        'image_width': 1280,
        'image_height': 720,
        'camera_name': 'g1_head_left',
        'camera_matrix': {'rows': 3, 'cols': 3, 'data': K.flatten().tolist()},
        'distortion_model': 'plumb_bob',
        'distortion_coefficients': {'rows': 1, 'cols': 5, 'data': dist.flatten().tolist()},
        # ROS camera_info loaders also expect these two; identity rectification
        # and a projection of [K | 0] are correct for a single unrectified camera
        'rectification_matrix': {'rows': 3, 'cols': 3, 'data': np.eye(3).flatten().tolist()},
        'projection_matrix': {'rows': 3, 'cols': 4, 'data': np.hstack([K, np.zeros((3, 1))]).flatten().tolist()},
    }
    with open('g1_head_left.yaml', 'w') as f:
        yaml.safe_dump(out, f)

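    A matching loader keeps the round trip honest. This sketch reads the camera_matrix and distortion_coefficients fields back into numpy arrays for use at inference time:

```python
import numpy as np
import yaml

def load_camera_info(path):
    """Read a ROS-style camera_info YAML back into (K, dist) arrays."""
    with open(path) as f:
        info = yaml.safe_load(f)
    K = np.array(info['camera_matrix']['data'], dtype=float).reshape(3, 3)
    dist = np.array(info['distortion_coefficients']['data'], dtype=float).reshape(-1)
    return K, dist
```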
  8. Validate on a test scene

    Capture a new ChArUco frame that was not in the calibration set. Detect corners, reproject using your saved intrinsics, and overlay on the image. Mean reprojection error should stay under 0.5 pixels. Also walk the robot through a few different head poses and watch the stereo disparity on a known-distance object — if it agrees with the tape-measured distance to within 1 to 2 cm at 1 m range, your extrinsics are good.
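    The disparity check in the last sentence follows from the rectified stereo relation Z = f·B / d. With illustrative numbers (the fx and baseline below are placeholders, not G1 specs), you can precompute what disparity to expect at your tape-measured distance:

```python
def depth_from_disparity(disparity_px, fx_px, baseline_m):
    """Depth (m) from disparity (px) on a rectified stereo pair: Z = f*B/d."""
    return fx_px * baseline_m / disparity_px

def expected_disparity(depth_m, fx_px, baseline_m):
    """Inverse of the above: disparity you should observe at depth_m."""
    return fx_px * baseline_m / depth_m

# with fx = 500 px and a 60 mm baseline, a target at 1 m should
# show 30 px of disparity
```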

What to do next

Feed your calibrated intrinsics into any downstream policy stack: OpenVLA fine-tuning uses rectified images, LeRobot recording stores the camera_info alongside episodes, and Isaac Lab needs the same intrinsics for sim-to-real transfer. If you are running SLAM or dense depth, good calibration is table stakes.

Common failure modes

High reprojection error: warped board, insufficient pose diversity, or out-of-focus frames. Reshoot with better diversity.

Extrinsics wrong by a few cm: usually a unit bug — square size in meters vs millimeters.

Inconsistent runs: check that head joint state is latched while capturing each frame; motion during capture breaks calibration.

Policy still fails after calibration: verify image ordering (BGR vs RGB) and resize order (crop then resize vs resize then crop) match the policy's training preprocessing.
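The BGR/RGB trap is cheap to guard against with an explicit conversion at the recording and inference boundaries rather than an implicit assumption. A minimal illustration:

```python
import numpy as np

def bgr_to_rgb(img):
    """Reverse the channel axis of an HxWx3 image (BGR <-> RGB)."""
    return img[..., ::-1]

# OpenCV loads images as BGR: a pure-blue pixel is (255, 0, 0) there,
# while most training pipelines expect RGB, where it is (0, 0, 255).
bgr = np.zeros((1, 1, 3), dtype=np.uint8)
bgr[0, 0] = (255, 0, 0)
rgb = bgr_to_rgb(bgr)
```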

Deep dive: why ChArUco over plain checkerboard

A plain checkerboard is the textbook calibration target, but it has a well-known failure mode: if any part of the board is occluded, the whole detection fails. ChArUco (a checkerboard with ArUco fiducials inlaid in the white squares) solves this by giving each corner a unique identity — the calibrator can use partial observations, so you can move the board aggressively through diverse poses without wasting capture time on unusable frames. OpenCV's modern CharucoDetector API (4.8+) is the right tool for 2026. Avoid the deprecated functions in older tutorials; the subpixel corner refinement is much better in the new API.

For humanoid head cameras specifically, ChArUco also helps with a subtlety: the head tilts during capture, so your frames alternate between "board takes the whole image" and "board is off to one side." A full checkerboard often falls off the edge and fails to detect; ChArUco just returns fewer corners and keeps going.

Deep dive: how good is good enough?

Reprojection error is the headline number but not the whole story. Rough targets by use case: for teleop recording only, anything under about 1 px RMS is fine; for RGB-only policies (OpenVLA, LeRobot), aim under 0.5 px; for stereo depth and contact-rich manipulation, aim under 0.3 px and additionally verify that extrinsics agree with tape-measured distances to within 1-2 cm at 1 m range.

Deep dive: re-calibration triggers

A G1 does not need weekly re-calibration, but some events warrant it: (1) any physical impact to the head or camera mount, (2) swapping a camera, (3) thermal drift on a new deployment site (cold lab to warm production floor), and (4) if you see reprojection errors creeping up across weeks of use. Automate a quick ChArUco sanity check as part of your weekly maintenance — 5 frames and a reprojection check catches 90% of calibration drift.

Deep dive: integrating with policy training

Once the calibration YAML is saved, wire it into your downstream stack in three places: (1) the recording pipeline saves the calibration alongside each episode so you can re-rectify later if needed, (2) the policy training preprocessing optionally undistorts or crops according to the intrinsics, and (3) the inference-time preprocessing uses the same parameters as training. Mismatches between training and inference preprocessing are the single most common root cause of "works in notebook, fails on robot" failures.

Frequently asked questions

Can I skip calibration if I am just doing teleop? Yes, teleop does not require calibration. But you lose the ability to later train a pixel-to-action policy on the recorded data.

How long is a calibration valid for? Months, assuming no physical impact to the head.

Is the G1 URDF calibration accurate enough out of the box? Usually within a few millimeters — sufficient for RGB policies, borderline for manipulation. Re-calibrate if you are doing contact-rich tasks.

What about fisheye cameras? Use the omnidir or fisheye modules in OpenCV — the plumb_bob model assumed in this tutorial does not handle fisheye distortion.
