Testing and Validation in Simulation

Introduction

Simulation's greatest value lies in systematic testing and validation of robot behaviors before hardware deployment. This section teaches you how to design test scenarios, log simulation data, measure performance, and understand the sim-to-real gap—the differences between simulated and real-world behavior.

Effective testing in simulation allows you to:

Validate algorithms across hundreds of scenarios impossible to test on real hardware
Catch bugs early before they damage expensive robots
Optimize parameters through automated experimentation
Build confidence that code will work on real robots
Document performance with reproducible metrics

Why Testing in Simulation Matters

Consider the cost of untested deployment:

Development Stage	Cost of Bug Discovery
Simulation testing	$0 (just time)
Lab testing with robot	$100-1,000 (damage, repairs)
Field testing	$10,000+ (robot replacement, lost research time)
Production deployment	$100,000+ (recalls, reputation damage)

Industry practice: Leading robotics companies (Boston Dynamics, iRobot, Waymo) run millions of simulated hours before each hardware deployment.

Types of Simulation Tests

1. Unit Tests: Component Validation

Test individual components in isolation.

Example: Test LIDAR sensor plugin

Test: Verify LIDAR detects a wall at known distance

import rclpy
from rclpy.node import Node
from sensor_msgs.msg import LaserScan
import pytest

class LidarTest(Node):
    def __init__(self):
        super().__init__('lidar_test')
        self.scan_data = None
        self.subscription = self.create_subscription(
            LaserScan, '/scan', self.scan_callback, 10)

    def scan_callback(self, msg):
        self.scan_data = msg

def test_lidar_detects_wall():
    """Test LIDAR detects wall at 2.0m in front of robot"""
    rclpy.init()
    node = LidarTest()

    # Wait for data
    timeout = 5.0
    start = node.get_clock().now()
    while node.scan_data is None:
        rclpy.spin_once(node, timeout_sec=0.1)
        if (node.get_clock().now() - start).nanoseconds / 1e9 > timeout:
            pytest.fail("No LIDAR data received")

    # Check front reading (index 0 = straight ahead)
    front_range = node.scan_data.ranges[0]
    expected_distance = 2.0
    tolerance = 0.05  # 5cm

    assert abs(front_range - expected_distance) < tolerance, \
        f"LIDAR reading {front_range}m, expected {expected_distance}m"

    node.destroy_node()
    rclpy.shutdown()

Run with pytest:

# Launch Gazebo with test world (robot 2m from wall)
ros2 launch chapter3_simulation lidar_test.launch.py &

# Run test
pytest test_lidar.py

# Cleanup
killall gzserver gzclient

2. Integration Tests: Multi-Component Behavior

Test how components work together.

Example: Navigation stack (LIDAR + odometry + planner)

Test: Robot navigates around obstacle to goal

import rclpy
from rclpy.node import Node
from geometry_msgs.msg import PoseStamped, Pose
from nav_msgs.msg import Odometry
import math

class NavigationTest(Node):
    def __init__(self):
        super().__init__('navigation_test')
        self.goal_pub = self.create_publisher(PoseStamped, '/goal_pose', 10)
        self.odom_sub = self.create_subscription(Odometry, '/odom', self.odom_callback, 10)
        self.current_pose = None

    def odom_callback(self, msg):
        self.current_pose = msg.pose.pose

    def send_goal(self, x, y):
        goal = PoseStamped()
        goal.header.frame_id = 'map'
        goal.header.stamp = self.get_clock().now().to_msg()
        goal.pose.position.x = x
        goal.pose.position.y = y
        goal.pose.orientation.w = 1.0
        self.goal_pub.publish(goal)

    def distance_to_goal(self, goal_x, goal_y):
        if self.current_pose is None:
            return float('inf')
        dx = self.current_pose.position.x - goal_x
        dy = self.current_pose.position.y - goal_y
        return math.sqrt(dx**2 + dy**2)

def test_navigation_to_goal():
    """Test robot navigates to goal (5.0, 5.0) within 60 seconds"""
    rclpy.init()
    node = NavigationTest()

    goal_x, goal_y = 5.0, 5.0
    node.send_goal(goal_x, goal_y)

    # Wait for robot to reach goal
    timeout = 60.0  # seconds
    start_time = node.get_clock().now()

    while True:
        rclpy.spin_once(node, timeout_sec=0.1)

        distance = node.distance_to_goal(goal_x, goal_y)
        if distance < 0.3:  # Within 30cm of goal
            print(f"Success! Reached goal in {(node.get_clock().now() - start_time).nanoseconds / 1e9:.2f}s")
            break

        elapsed = (node.get_clock().now() - start_time).nanoseconds / 1e9
        if elapsed > timeout:
            pytest.fail(f"Failed to reach goal in {timeout}s. Final distance: {distance:.2f}m")

    node.destroy_node()
    rclpy.shutdown()

3. Scenario Tests: Real-World Situations

Test complete robot behaviors in realistic scenarios.

Example scenarios:

Door crossing: Robot approaches door, opens it, passes through
Human avoidance: Robot detects approaching human and yields
Object pickup: Robot localizes object, approaches, grasps, delivers
Failure recovery: Robot detects low battery, returns to charging station

Scenario specification (YAML):

# scenarios/door_crossing.yaml
scenario:
  name: "Door Crossing Test"
  description: "Robot must detect closed door, open it, and pass through"

  world: "worlds/office_with_door.world"

  robot:
    spawn_pose: [0, 0, 0, 0, 0, 0]  # x, y, z, roll, pitch, yaw

  door:
    initial_state: "closed"
    position: [3.0, 0, 0]

  success_criteria:
    - robot_reaches_waypoint: [6.0, 0.0]  # Beyond door
    - max_time: 120  # seconds
    - door_opened: true

  failure_conditions:
    - collision_with_door
    - timeout

4. Stress Tests: Edge Cases and Limits

Test robot behavior at operational limits.

Examples:

Maximum payload (can manipulator lift 5kg? 10kg?)
Extreme speeds (what happens at max velocity?)
Sensor occlusion (navigate with 50% LIDAR blocked)
Multi-robot coordination (10 robots in same area)

Stress test template:

def test_max_payload():
    """Test gripper can lift 5kg object without dropping"""
    # Spawn 5kg box
    # Command grasp
    # Lift to 1m height
    # Hold for 10 seconds
    # Verify box doesn't slip (distance from gripper < threshold)

Designing Test Worlds

Create specialized worlds for targeted testing.

Minimal Test World

Purpose: Fast, isolated tests of specific features

<?xml version="1.0"?>
<sdf version="1.6">
  <world name="minimal_test">

    <physics type="ode">
      <max_step_size>0.001</max_step_size>
      <real_time_factor>1.0</real_time_factor>
    </physics>

    <include><uri>model://sun</uri></include>
    <include><uri>model://ground_plane</uri></include>

    <!-- Single obstacle for testing -->
    <model name="test_wall">
      <static>true</static>
      <pose>2.0 0 0.5 0 0 0</pose>
      <link name="link">
        <collision name="collision">
          <geometry>
            <box><size>0.1 2.0 1.0</size></box>
          </geometry>
        </collision>
        <visual name="visual">
          <geometry>
            <box><size>0.1 2.0 1.0</size></box>
          </geometry>
        </visual>
      </link>
    </model>

  </world>
</sdf>

Usage: Test LIDAR detection, obstacle avoidance, single feature validation

Cluttered Environment

Purpose: Test navigation in complex spaces

<!-- Add multiple obstacles -->
<include>
  <uri>model://cafe_table</uri>
  <pose>2 1 0 0 0 0</pose>
</include>

<include>
  <uri>model://cafe_table</uri>
  <pose>2 -1 0 0 0 0.5</pose>
</include>

<include>
  <uri>model://bookshelf</uri>
  <pose>4 0 0 0 0 1.57</pose>
</include>

<!-- Random boxes -->
<model name="box1">
  <pose>1.5 0.5 0.25 0 0 0.3</pose>
  <static>true</static>
  <link name="link">
    <collision name="collision">
      <geometry><box><size>0.5 0.5 0.5</size></box></geometry>
    </collision>
    <visual name="visual">
      <geometry><box><size>0.5 0.5 0.5</size></box></geometry>
    </visual>
  </link>
</model>

Procedurally Generated Worlds

For large-scale testing, generate worlds programmatically:

import random

def generate_random_obstacles(num_obstacles, area_size):
    """Generate SDF with random obstacles"""
    sdf_header = '<?xml version="1.0"?>\n<sdf version="1.6">\n  <world name="random_world">\n'
    sdf_footer = '  </world>\n</sdf>'

    obstacles = ""
    for i in range(num_obstacles):
        x = random.uniform(-area_size/2, area_size/2)
        y = random.uniform(-area_size/2, area_size/2)
        z = 0.25
        width = random.uniform(0.3, 1.0)
        height = random.uniform(0.3, 1.0)

        obstacles += f"""
    <model name="obstacle_{i}">
      <static>true</static>
      <pose>{x} {y} {z} 0 0 0</pose>
      <link name="link">
        <collision name="collision">
          <geometry><box><size>{width} {width} {height}</size></box></geometry>
        </collision>
        <visual name="visual">
          <geometry><box><size>{width} {width} {height}</size></box></geometry>
        </visual>
      </link>
    </model>
"""

    return sdf_header + obstacles + sdf_footer

# Generate 50 obstacles in 20x20m area
world_sdf = generate_random_obstacles(50, 20)
with open('worlds/random_test.world', 'w') as f:
    f.write(world_sdf)

Data Logging and Analysis

Gazebo Built-in Logging

Record complete simulation state:

# Start recording
gazebo worlds/test_world.world --record

# Simulation runs, data saved to ~/.gazebo/log/<date-time>/

# Replay (opens Gazebo GUI)
gazebo --playback ~/.gazebo/log/2024-01-15T10-30-45.123456/gzserver/state.log

Logged data: All model poses, velocities, sensor outputs, physics state

Use case: Reproduce bugs, share test runs, analyze failure modes

ROS Bag Recording

Record ROS 2 topics for offline analysis:

# Record all topics
ros2 bag record -a

# Record specific topics
ros2 bag record /scan /odom /camera/image_raw /imu/data

# Record to specific file
ros2 bag record -o navigation_test_01 /scan /odom /cmd_vel

Playback:

ros2 bag play navigation_test_01

# In another terminal, echo topic to verify
ros2 topic echo /scan

Analysis with Python:

from rosbag2_py import SequentialReader, StorageOptions, ConverterOptions
from rclpy.serialization import deserialize_message
from sensor_msgs.msg import LaserScan

# Open bag
storage_options = StorageOptions(uri='navigation_test_01', storage_id='sqlite3')
converter_options = ConverterOptions(input_serialization_format='cdr', output_serialization_format='cdr')

reader = SequentialReader()
reader.open(storage_options, converter_options)

# Read messages
while reader.has_next():
    topic, data, timestamp = reader.read_next()
    if topic == '/scan':
        msg = deserialize_message(data, LaserScan)
        print(f"Time: {timestamp}, Min range: {min(msg.ranges):.2f}m")

Custom Metrics Logging

Log application-specific metrics:

import csv
from datetime import datetime

class PerformanceLogger:
    def __init__(self, filename):
        self.filename = filename
        self.file = open(filename, 'w', newline='')
        self.writer = csv.writer(self.file)
        self.writer.writerow(['timestamp', 'distance_to_goal', 'speed', 'cpu_usage'])

    def log(self, distance, speed, cpu):
        timestamp = datetime.now().isoformat()
        self.writer.writerow([timestamp, distance, speed, cpu])
        self.file.flush()

    def close(self):
        self.file.close()

# Usage in ROS node
logger = PerformanceLogger('test_results.csv')

def control_loop():
    # ... compute metrics ...
    logger.log(distance_to_goal, current_speed, psutil.cpu_percent())

Measuring Performance

Define quantitative success metrics:

Metric	Formula	Target
Success rate	`successful_runs / total_runs`	>95%
Time to goal	Average seconds to reach goal	<60s
Path efficiency	`path_length / optimal_length`	<1.2 (within 20% of optimal)
Obstacle clearance	Min distance to obstacles	>0.3m
Energy consumption	Integral of motor power	<500 Wh

Example calculation:

import numpy as np

def calculate_path_efficiency(recorded_poses, goal_pose):
    """Calculate path efficiency ratio"""
    # Path length from recorded poses
    path_length = 0.0
    for i in range(len(recorded_poses) - 1):
        dx = recorded_poses[i+1][0] - recorded_poses[i][0]
        dy = recorded_poses[i+1][1] - recorded_poses[i][1]
        path_length += np.sqrt(dx**2 + dy**2)

    # Optimal (straight-line) distance
    start = recorded_poses[0]
    dx = goal_pose[0] - start[0]
    dy = goal_pose[1] - start[1]
    optimal_length = np.sqrt(dx**2 + dy**2)

    efficiency = path_length / optimal_length
    return efficiency

# Example: Load from CSV
poses = np.loadtxt('robot_path.csv', delimiter=',')  # [[x1, y1], [x2, y2], ...]
goal = [5.0, 5.0]
eff = calculate_path_efficiency(poses, goal)
print(f"Path efficiency: {eff:.2f} (1.0 = optimal, lower is better)")

Manipulation Performance

Metric	Definition	Target
Grasp success rate	`successful_grasps / total_attempts`	>90%
Grasp time	Time from detection to stable grasp	<10s
Place accuracy	Distance from target placement pose	<2cm
Drop rate	Dropped objects during transport	<5%

Sensor Performance

Metric	Definition	Target
Detection rate	Objects detected / objects present	>95%
False positive rate	False detections / total detections	<5%
Localization error	Distance between estimated and true pose	<5cm
Latency	Time from sensor capture to processed output	<100ms

Understanding the Sim-to-Real Gap

Sim-to-Real Gap: Differences between simulated and real-world robot behavior.

Sources of Sim-to-Real Gap

Category	Simulation	Reality	Impact
Sensors	Perfect geometry, tunable noise	Unique noise patterns, calibration errors	High
Actuators	Instant torque response	Backlash, compliance, delays	Medium-High
Physics	Idealized friction, perfect solvers	Variable surfaces, complex contacts	High
Perception	Clean textures, controlled lighting	Shadows, glare, texture variation	High
Timing	Deterministic, repeatable	Jitter, latency, non-deterministic	Medium

Minimizing the Gap

1. Add Realistic Sensor Noise

Don't use perfect sensors. Add noise matching real hardware:

<sensor name="camera" type="camera">
  <camera>
    <noise>
      <type>gaussian</type>
      <mean>0.0</mean>
      <stddev>0.007</stddev>  <!-- Match real camera specification -->
    </noise>
  </camera>
</sensor>

Calibrate from real data:

Capture 100 images of static scene with real camera
Compute pixel intensity standard deviation
Use this value for <stddev> in simulation

2. Model Actuator Dynamics

Add realistic delays and saturation:

class RealisticMotorController:
    def __init__(self, max_torque, response_time):
        self.max_torque = max_torque
        self.response_time = response_time
        self.current_torque = 0.0

    def command(self, desired_torque):
        # Saturate to max torque
        desired_torque = np.clip(desired_torque, -self.max_torque, self.max_torque)

        # First-order lag (motor response time)
        alpha = 1.0 / self.response_time
        self.current_torque += alpha * (desired_torque - self.current_torque) * dt

        return self.current_torque

3. Vary Physics Parameters

Test with parameter ranges, not single values:

# Test with multiple friction coefficients
for friction in [0.6, 0.8, 1.0, 1.2]:
    # Update world with new friction
    # Run test
    # Log results

This reveals sensitivity to parameter uncertainty.

4. Validate Against Real Data

Gold standard: Compare simulation to real robot:

Run identical task on real robot (e.g., drive 5m straight)
Record sensor data (odometry, IMU)
Run same task in simulation
Compare trajectories, velocities, accelerations
Tune simulation parameters to minimize differences

Example comparison:

import matplotlib.pyplot as plt

# Load real robot data
real_time, real_position = load_real_data('real_robot_run.csv')

# Load simulation data
sim_time, sim_position = load_sim_data('sim_robot_run.csv')

# Plot comparison
plt.figure()
plt.plot(real_time, real_position, label='Real Robot', linewidth=2)
plt.plot(sim_time, sim_position, label='Simulation', linestyle='--', linewidth=2)
plt.xlabel('Time (s)')
plt.ylabel('Position (m)')
plt.legend()
plt.title('Sim-to-Real Position Comparison')
plt.grid(True)
plt.savefig('sim_vs_real.png')
plt.show()

# Compute error metric
error = np.mean(np.abs(real_position - sim_position))
print(f"Mean absolute error: {error:.3f}m")

Automated Testing Workflows

Continuous Integration (CI) with Simulation

Integrate simulation tests into CI/CD pipeline:

GitHub Actions example (.github/workflows/gazebo_tests.yml):

name: Gazebo Simulation Tests

on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-22.04

    steps:
      - name: Checkout code
        uses: actions/checkout@v2

      - name: Install ROS 2 Humble
        run: |
          sudo apt update
          sudo apt install -y ros-humble-desktop gazebo

      - name: Build workspace
        run: |
          source /opt/ros/humble/setup.bash
          colcon build

      - name: Run simulation tests
        run: |
          source install/setup.bash
          pytest tests/test_*.py --junit-xml=test_results.xml

      - name: Publish test results
        uses: EnricoMi/publish-unit-test-result-action@v1
        if: always()
        with:
          files: test_results.xml

Every code push triggers automated simulation tests.

Batch Testing Script

Run multiple test scenarios overnight:

#!/bin/bash
# batch_test.sh - Run all test scenarios

SCENARIOS=("door_crossing" "obstacle_avoidance" "object_pickup" "failure_recovery")

for scenario in "${SCENARIOS[@]}"; do
    echo "Running scenario: $scenario"

    # Launch Gazebo with scenario world
    ros2 launch chapter3_simulation $scenario.launch.py &
    GAZEBO_PID=$!

    # Wait for Gazebo to initialize
    sleep 10

    # Run test
    pytest tests/test_$scenario.py --junit-xml=results/test_$scenario.xml

    # Kill Gazebo
    kill $GAZEBO_PID
    killall gzserver gzclient

    # Wait before next test
    sleep 5
done

echo "All tests complete. Results in results/"

Run overnight:

nohup ./batch_test.sh > batch_test.log 2>&1 &

Example: Complete Validation Workflow

Goal: Validate mobile robot navigation before hardware deployment

Step 1: Define Acceptance Criteria

acceptance_criteria:
  navigation:
    success_rate: 0.95  # 95% of runs reach goal
    max_time: 60  # seconds
    obstacle_clearance: 0.30  # meters

  safety:
    max_velocity: 1.0  # m/s
    min_obstacle_distance: 0.25  # m
    collision_tolerance: 0  # Zero collisions allowed

  performance:
    path_efficiency: 1.3  # Within 30% of optimal path
    avg_time_to_goal: 45  # seconds

Step 2: Create Test Scenarios

Scenario 1: Empty 10x10m room, goal at (5,5)
Scenario 2: Room with 5 randomly placed obstacles
Scenario 3: Narrow corridor (0.8m wide)
Scenario 4: Dynamic obstacle (moving person)

Step 3: Run Tests (100 runs each)

for i in {1..100}; do
    ros2 launch chapter3_simulation scenario1_empty_room.launch.py seed:=$i
    pytest tests/test_navigation.py --scenario=1 --run=$i
done

Step 4: Analyze Results

import pandas as pd

# Load all test results
results = pd.read_csv('all_results.csv')

# Compute metrics
success_rate = results['success'].mean()
avg_time = results[results['success']]['time_to_goal'].mean()
min_clearance = results['min_obstacle_distance'].min()

print(f"Success Rate: {success_rate*100:.1f}%")
print(f"Avg Time: {avg_time:.1f}s")
print(f"Min Clearance: {min_clearance:.2f}m")

# Check against criteria
assert success_rate >= 0.95, "Success rate below 95%"
assert avg_time <= 45, "Average time exceeds 45s"
assert min_clearance >= 0.25, "Obstacle clearance violated"

Step 5: Sim-to-Real Validation

Run 10 trials on real robot, compare to simulation:

# Statistical comparison
from scipy import stats

sim_times = [42, 38, 45, 40, 43, ...]  # From simulation
real_times = [45, 50, 48, 52, 47, ...]  # From real robot

# T-test: Are means significantly different?
t_stat, p_value = stats.ttest_ind(sim_times, real_times)

if p_value < 0.05:
    print(f"WARNING: Sim and real times differ significantly (p={p_value:.3f})")
else:
    print(f"Sim and real times are consistent (p={p_value:.3f})")

Summary

Effective simulation testing requires:

Comprehensive test coverage: Unit, integration, scenario, and stress tests
Quantitative metrics: Success rate, time, accuracy, safety margins
Realistic worlds: From minimal test environments to complex scenarios
Data logging: ROS bags, Gazebo logs, custom metrics
Sim-to-real awareness: Add noise, vary parameters, validate against real data
Automation: CI/CD integration, batch testing, overnight runs

Simulation testing is not a replacement for real-world validation, but a powerful tool to reduce risk and accelerate development.

Review Questions

1. What are the four main types of simulation tests, and when would you use each?

Answer:

Unit Tests: Test individual components in isolation (e.g., LIDAR sensor, motor controller)
- When: During component development, to verify single features work correctly
- Example: Verify LIDAR detects wall at known distance
Integration Tests: Test how multiple components work together (e.g., LIDAR + planner + controller)
- When: After unit tests pass, to verify components interact correctly
- Example: Test navigation stack reaches goal using LIDAR and odometry
Scenario Tests: Test complete robot behaviors in realistic situations
- When: Before deployment, to validate end-to-end functionality
- Example: Robot opens door and passes through, or avoids dynamic obstacles
Stress Tests: Test robot at operational limits and edge cases
- When: To understand failure modes and safety margins
- Example: Maximum payload, extreme speeds, degraded sensors

2. Explain three strategies to minimize the sim-to-real gap.

Answer: Three effective strategies:

Add realistic sensor noise: Don't use perfect sensors. Calibrate noise models from real hardware (measure pixel stddev, LIDAR variance, IMU drift) and configure simulation to match. Example:
```
<noise><type>gaussian</type><stddev>0.007</stddev></noise>
```
Model actuator dynamics: Add realistic delays, saturation, and response times. Real motors don't instantly reach commanded torque—model first-order lag or more complex dynamics based on motor datasheets.
Validate against real data: Run identical tasks on real robot and in simulation, then compare trajectories/sensor data. Tune simulation parameters (friction, inertia, time step) to minimize differences. Use statistical tests to verify consistency.

Other valid strategies: vary physics parameters to test robustness, use high-fidelity contact models, incorporate systematic errors (sensor bias).

3. What data should you log during a navigation test, and why?

Answer: Essential data to log:

Robot pose (/odom or /tf): Track actual path taken, compute path efficiency, detect deviations
Goal pose: Record intended destination for comparison
Sensor data (/scan, /camera): Analyze what robot perceived, debug perception failures
Commands (/cmd_vel): Understand controller decisions, identify erratic behavior
Timestamps: Compute time to goal, detect delays, ensure proper sequencing
Performance metrics: Distance to goal over time, obstacle clearance, velocity profiles
Events: Goal reached, collision detected, planner failures (log warnings/errors)

Why: Enables offline analysis, reproducing bugs, computing quantitative metrics, and comparing sim vs. real performance. Without logging, you can't systematically improve or validate algorithms.

4. A test shows 85% success rate (below the 95% target). What are three ways to diagnose the failures?

Answer: Three diagnostic approaches:

Analyze failure cases specifically: Filter logged data to failed runs only. Look for patterns:
- Do failures occur in specific world locations?
- At specific times (e.g., during tight turns)?
- With specific sensor readings (e.g., LIDAR detecting false obstacle)?
- Visualize failed paths in RViz overlaid on map
Replay failed runs: Use Gazebo playback (gazebo --playback log.file) or ROS bag replay to watch failures in slow motion. Inspect:
- When did robot deviate from expected behavior?
- What was the sensor data at failure moment?
- Did controller issue erratic commands?
Add instrumentation: Insert additional logging/visualization in suspected problem areas:
- Log internal planner state (candidate paths, cost maps)
- Visualize intermediate computation (detected obstacles, planned trajectory)
- Add breakpoints or step-through debugging in critical code sections

Compare successful vs. failed runs to identify differentiating factors (e.g., "failures always occur when obstacle clearance < 0.4m").

5. Why is automated testing in CI/CD important for robotics, and what challenges does it face?

Answer:

Importance:

Prevent regressions: Every code change is automatically tested; bugs caught before merging
Faster development: Developers get immediate feedback (minutes, not hours)
Confidence: Large test suite running on every commit ensures code quality
Documentation: Tests serve as executable specifications of expected behavior
Collaboration: Team members can safely modify code knowing tests will catch breakage

Challenges:

Computational cost: Running Gazebo is expensive; CI servers may lack GPUs or sufficient CPU
- Solution: Headless mode (gzserver only), limit test duration, use cloud compute
Non-determinism: Physics simulation can have subtle variations between runs
- Solution: Use fixed random seeds, set real_time_factor=0 (run as fast as possible), increase tolerance in assertions
Long test times: Complex scenarios can take minutes-hours
- Solution: Parallelize tests, run subset on every commit and full suite nightly
Setup complexity: Requires ROS 2, Gazebo, custom packages installed on CI server
- Solution: Use Docker containers with pre-installed dependencies

Despite challenges, automated simulation testing is essential for production robotics development.

Introduction​

Why Testing in Simulation Matters​

Types of Simulation Tests​

1. Unit Tests: Component Validation​

2. Integration Tests: Multi-Component Behavior​

3. Scenario Tests: Real-World Situations​

4. Stress Tests: Edge Cases and Limits​

Designing Test Worlds​

Minimal Test World​

Cluttered Environment​

Procedurally Generated Worlds​

Data Logging and Analysis​

Gazebo Built-in Logging​

ROS Bag Recording​

Custom Metrics Logging​

Measuring Performance​

Navigation Performance​

Manipulation Performance​

Sensor Performance​

Understanding the Sim-to-Real Gap​

Sources of Sim-to-Real Gap​

Minimizing the Gap​

1. Add Realistic Sensor Noise​

2. Model Actuator Dynamics​

3. Vary Physics Parameters​

4. Validate Against Real Data​

Automated Testing Workflows​

Continuous Integration (CI) with Simulation​

Batch Testing Script​

Example: Complete Validation Workflow​

Step 1: Define Acceptance Criteria​

Step 2: Create Test Scenarios​

Step 3: Run Tests (100 runs each)​

Step 4: Analyze Results​

Step 5: Sim-to-Real Validation​

Summary​

Review Questions​