Skip to main content

Testing and Validation in Simulation

Introduction

Simulation's greatest value lies in systematic testing and validation of robot behaviors before hardware deployment. This section teaches you how to design test scenarios, log simulation data, measure performance, and understand the sim-to-real gap—the differences between simulated and real-world behavior.

Effective testing in simulation allows you to:

  • Validate algorithms across hundreds of scenarios impossible to test on real hardware
  • Catch bugs early before they damage expensive robots
  • Optimize parameters through automated experimentation
  • Build confidence that code will work on real robots
  • Document performance with reproducible metrics

Why Testing in Simulation Matters

Consider the cost of untested deployment:

Development StageCost of Bug Discovery
Simulation testing$0 (just time)
Lab testing with robot$100-1,000 (damage, repairs)
Field testing$10,000+ (robot replacement, lost research time)
Production deployment$100,000+ (recalls, reputation damage)

Industry practice: Leading robotics companies (Boston Dynamics, iRobot, Waymo) run millions of simulated hours before each hardware deployment.

Types of Simulation Tests

1. Unit Tests: Component Validation

Test individual components in isolation.

Example: Test LIDAR sensor plugin

Test: Verify LIDAR detects a wall at known distance

import rclpy
from rclpy.node import Node
from sensor_msgs.msg import LaserScan
import pytest

class LidarTest(Node):
def __init__(self):
super().__init__('lidar_test')
self.scan_data = None
self.subscription = self.create_subscription(
LaserScan, '/scan', self.scan_callback, 10)

def scan_callback(self, msg):
self.scan_data = msg

def test_lidar_detects_wall():
"""Test LIDAR detects wall at 2.0m in front of robot"""
rclpy.init()
node = LidarTest()

# Wait for data
timeout = 5.0
start = node.get_clock().now()
while node.scan_data is None:
rclpy.spin_once(node, timeout_sec=0.1)
if (node.get_clock().now() - start).nanoseconds / 1e9 > timeout:
pytest.fail("No LIDAR data received")

# Check front reading (index 0 = straight ahead)
front_range = node.scan_data.ranges[0]
expected_distance = 2.0
tolerance = 0.05 # 5cm

assert abs(front_range - expected_distance) < tolerance, \
f"LIDAR reading {front_range}m, expected {expected_distance}m"

node.destroy_node()
rclpy.shutdown()

Run with pytest:

# Launch Gazebo with test world (robot 2m from wall)
ros2 launch chapter3_simulation lidar_test.launch.py &

# Run test
pytest test_lidar.py

# Cleanup
killall gzserver gzclient

2. Integration Tests: Multi-Component Behavior

Test how components work together.

Example: Navigation stack (LIDAR + odometry + planner)

Test: Robot navigates around obstacle to goal

import rclpy
from rclpy.node import Node
from geometry_msgs.msg import PoseStamped, Pose
from nav_msgs.msg import Odometry
import math

class NavigationTest(Node):
def __init__(self):
super().__init__('navigation_test')
self.goal_pub = self.create_publisher(PoseStamped, '/goal_pose', 10)
self.odom_sub = self.create_subscription(Odometry, '/odom', self.odom_callback, 10)
self.current_pose = None

def odom_callback(self, msg):
self.current_pose = msg.pose.pose

def send_goal(self, x, y):
goal = PoseStamped()
goal.header.frame_id = 'map'
goal.header.stamp = self.get_clock().now().to_msg()
goal.pose.position.x = x
goal.pose.position.y = y
goal.pose.orientation.w = 1.0
self.goal_pub.publish(goal)

def distance_to_goal(self, goal_x, goal_y):
if self.current_pose is None:
return float('inf')
dx = self.current_pose.position.x - goal_x
dy = self.current_pose.position.y - goal_y
return math.sqrt(dx**2 + dy**2)

def test_navigation_to_goal():
"""Test robot navigates to goal (5.0, 5.0) within 60 seconds"""
rclpy.init()
node = NavigationTest()

goal_x, goal_y = 5.0, 5.0
node.send_goal(goal_x, goal_y)

# Wait for robot to reach goal
timeout = 60.0 # seconds
start_time = node.get_clock().now()

while True:
rclpy.spin_once(node, timeout_sec=0.1)

distance = node.distance_to_goal(goal_x, goal_y)
if distance < 0.3: # Within 30cm of goal
print(f"Success! Reached goal in {(node.get_clock().now() - start_time).nanoseconds / 1e9:.2f}s")
break

elapsed = (node.get_clock().now() - start_time).nanoseconds / 1e9
if elapsed > timeout:
pytest.fail(f"Failed to reach goal in {timeout}s. Final distance: {distance:.2f}m")

node.destroy_node()
rclpy.shutdown()

3. Scenario Tests: Real-World Situations

Test complete robot behaviors in realistic scenarios.

Example scenarios:

  • Door crossing: Robot approaches door, opens it, passes through
  • Human avoidance: Robot detects approaching human and yields
  • Object pickup: Robot localizes object, approaches, grasps, delivers
  • Failure recovery: Robot detects low battery, returns to charging station

Scenario specification (YAML):

# scenarios/door_crossing.yaml
scenario:
name: "Door Crossing Test"
description: "Robot must detect closed door, open it, and pass through"

world: "worlds/office_with_door.world"

robot:
spawn_pose: [0, 0, 0, 0, 0, 0] # x, y, z, roll, pitch, yaw

door:
initial_state: "closed"
position: [3.0, 0, 0]

success_criteria:
- robot_reaches_waypoint: [6.0, 0.0] # Beyond door
- max_time: 120 # seconds
- door_opened: true

failure_conditions:
- collision_with_door
- timeout

4. Stress Tests: Edge Cases and Limits

Test robot behavior at operational limits.

Examples:

  • Maximum payload (can manipulator lift 5kg? 10kg?)
  • Extreme speeds (what happens at max velocity?)
  • Sensor occlusion (navigate with 50% LIDAR blocked)
  • Multi-robot coordination (10 robots in same area)

Stress test template:

def test_max_payload():
"""Test gripper can lift 5kg object without dropping"""
# Spawn 5kg box
# Command grasp
# Lift to 1m height
# Hold for 10 seconds
# Verify box doesn't slip (distance from gripper < threshold)

Designing Test Worlds

Create specialized worlds for targeted testing.

Minimal Test World

Purpose: Fast, isolated tests of specific features

<?xml version="1.0"?>
<sdf version="1.6">
<world name="minimal_test">

<physics type="ode">
<max_step_size>0.001</max_step_size>
<real_time_factor>1.0</real_time_factor>
</physics>

<include><uri>model://sun</uri></include>
<include><uri>model://ground_plane</uri></include>

<!-- Single obstacle for testing -->
<model name="test_wall">
<static>true</static>
<pose>2.0 0 0.5 0 0 0</pose>
<link name="link">
<collision name="collision">
<geometry>
<box><size>0.1 2.0 1.0</size></box>
</geometry>
</collision>
<visual name="visual">
<geometry>
<box><size>0.1 2.0 1.0</size></box>
</geometry>
</visual>
</link>
</model>

</world>
</sdf>

Usage: Test LIDAR detection, obstacle avoidance, single feature validation

Cluttered Environment

Purpose: Test navigation in complex spaces

<!-- Add multiple obstacles -->
<include>
<uri>model://cafe_table</uri>
<pose>2 1 0 0 0 0</pose>
</include>

<include>
<uri>model://cafe_table</uri>
<pose>2 -1 0 0 0 0.5</pose>
</include>

<include>
<uri>model://bookshelf</uri>
<pose>4 0 0 0 0 1.57</pose>
</include>

<!-- Random boxes -->
<model name="box1">
<pose>1.5 0.5 0.25 0 0 0.3</pose>
<static>true</static>
<link name="link">
<collision name="collision">
<geometry><box><size>0.5 0.5 0.5</size></box></geometry>
</collision>
<visual name="visual">
<geometry><box><size>0.5 0.5 0.5</size></box></geometry>
</visual>
</link>
</model>

Procedurally Generated Worlds

For large-scale testing, generate worlds programmatically:

import random

def generate_random_obstacles(num_obstacles, area_size):
"""Generate SDF with random obstacles"""
sdf_header = '<?xml version="1.0"?>\n<sdf version="1.6">\n <world name="random_world">\n'
sdf_footer = ' </world>\n</sdf>'

obstacles = ""
for i in range(num_obstacles):
x = random.uniform(-area_size/2, area_size/2)
y = random.uniform(-area_size/2, area_size/2)
z = 0.25
width = random.uniform(0.3, 1.0)
height = random.uniform(0.3, 1.0)

obstacles += f"""
<model name="obstacle_{i}">
<static>true</static>
<pose>{x} {y} {z} 0 0 0</pose>
<link name="link">
<collision name="collision">
<geometry><box><size>{width} {width} {height}</size></box></geometry>
</collision>
<visual name="visual">
<geometry><box><size>{width} {width} {height}</size></box></geometry>
</visual>
</link>
</model>
"""

return sdf_header + obstacles + sdf_footer

# Generate 50 obstacles in 20x20m area
world_sdf = generate_random_obstacles(50, 20)
with open('worlds/random_test.world', 'w') as f:
f.write(world_sdf)

Data Logging and Analysis

Gazebo Built-in Logging

Record complete simulation state:

# Start recording
gazebo worlds/test_world.world --record

# Simulation runs, data saved to ~/.gazebo/log/<date-time>/

# Replay (opens Gazebo GUI)
gazebo --playback ~/.gazebo/log/2024-01-15T10-30-45.123456/gzserver/state.log

Logged data: All model poses, velocities, sensor outputs, physics state

Use case: Reproduce bugs, share test runs, analyze failure modes

ROS Bag Recording

Record ROS 2 topics for offline analysis:

# Record all topics
ros2 bag record -a

# Record specific topics
ros2 bag record /scan /odom /camera/image_raw /imu/data

# Record to specific file
ros2 bag record -o navigation_test_01 /scan /odom /cmd_vel

Playback:

ros2 bag play navigation_test_01

# In another terminal, echo topic to verify
ros2 topic echo /scan

Analysis with Python:

from rosbag2_py import SequentialReader, StorageOptions, ConverterOptions
from rclpy.serialization import deserialize_message
from sensor_msgs.msg import LaserScan

# Open bag
storage_options = StorageOptions(uri='navigation_test_01', storage_id='sqlite3')
converter_options = ConverterOptions(input_serialization_format='cdr', output_serialization_format='cdr')

reader = SequentialReader()
reader.open(storage_options, converter_options)

# Read messages
while reader.has_next():
topic, data, timestamp = reader.read_next()
if topic == '/scan':
msg = deserialize_message(data, LaserScan)
print(f"Time: {timestamp}, Min range: {min(msg.ranges):.2f}m")

Custom Metrics Logging

Log application-specific metrics:

import csv
from datetime import datetime

class PerformanceLogger:
def __init__(self, filename):
self.filename = filename
self.file = open(filename, 'w', newline='')
self.writer = csv.writer(self.file)
self.writer.writerow(['timestamp', 'distance_to_goal', 'speed', 'cpu_usage'])

def log(self, distance, speed, cpu):
timestamp = datetime.now().isoformat()
self.writer.writerow([timestamp, distance, speed, cpu])
self.file.flush()

def close(self):
self.file.close()

# Usage in ROS node
logger = PerformanceLogger('test_results.csv')

def control_loop():
# ... compute metrics ...
logger.log(distance_to_goal, current_speed, psutil.cpu_percent())

Measuring Performance

Define quantitative success metrics:

MetricFormulaTarget
Success ratesuccessful_runs / total_runs>95%
Time to goalAverage seconds to reach goal<60s
Path efficiencypath_length / optimal_length<1.2 (within 20% of optimal)
Obstacle clearanceMin distance to obstacles>0.3m
Energy consumptionIntegral of motor power<500 Wh

Example calculation:

import numpy as np

def calculate_path_efficiency(recorded_poses, goal_pose):
"""Calculate path efficiency ratio"""
# Path length from recorded poses
path_length = 0.0
for i in range(len(recorded_poses) - 1):
dx = recorded_poses[i+1][0] - recorded_poses[i][0]
dy = recorded_poses[i+1][1] - recorded_poses[i][1]
path_length += np.sqrt(dx**2 + dy**2)

# Optimal (straight-line) distance
start = recorded_poses[0]
dx = goal_pose[0] - start[0]
dy = goal_pose[1] - start[1]
optimal_length = np.sqrt(dx**2 + dy**2)

efficiency = path_length / optimal_length
return efficiency

# Example: Load from CSV
poses = np.loadtxt('robot_path.csv', delimiter=',') # [[x1, y1], [x2, y2], ...]
goal = [5.0, 5.0]
eff = calculate_path_efficiency(poses, goal)
print(f"Path efficiency: {eff:.2f} (1.0 = optimal, lower is better)")

Manipulation Performance

MetricDefinitionTarget
Grasp success ratesuccessful_grasps / total_attempts>90%
Grasp timeTime from detection to stable grasp<10s
Place accuracyDistance from target placement pose<2cm
Drop rateDropped objects during transport<5%

Sensor Performance

MetricDefinitionTarget
Detection rateObjects detected / objects present>95%
False positive rateFalse detections / total detections<5%
Localization errorDistance between estimated and true pose<5cm
LatencyTime from sensor capture to processed output<100ms

Understanding the Sim-to-Real Gap

Sim-to-Real Gap: Differences between simulated and real-world robot behavior.

Sources of Sim-to-Real Gap

CategorySimulationRealityImpact
SensorsPerfect geometry, tunable noiseUnique noise patterns, calibration errorsHigh
ActuatorsInstant torque responseBacklash, compliance, delaysMedium-High
PhysicsIdealized friction, perfect solversVariable surfaces, complex contactsHigh
PerceptionClean textures, controlled lightingShadows, glare, texture variationHigh
TimingDeterministic, repeatableJitter, latency, non-deterministicMedium

Minimizing the Gap

1. Add Realistic Sensor Noise

Don't use perfect sensors. Add noise matching real hardware:

<sensor name="camera" type="camera">
<camera>
<noise>
<type>gaussian</type>
<mean>0.0</mean>
<stddev>0.007</stddev> <!-- Match real camera specification -->
</noise>
</camera>
</sensor>

Calibrate from real data:

  1. Capture 100 images of static scene with real camera
  2. Compute pixel intensity standard deviation
  3. Use this value for <stddev> in simulation

2. Model Actuator Dynamics

Add realistic delays and saturation:

class RealisticMotorController:
def __init__(self, max_torque, response_time):
self.max_torque = max_torque
self.response_time = response_time
self.current_torque = 0.0

def command(self, desired_torque):
# Saturate to max torque
desired_torque = np.clip(desired_torque, -self.max_torque, self.max_torque)

# First-order lag (motor response time)
alpha = 1.0 / self.response_time
self.current_torque += alpha * (desired_torque - self.current_torque) * dt

return self.current_torque

3. Vary Physics Parameters

Test with parameter ranges, not single values:

# Test with multiple friction coefficients
for friction in [0.6, 0.8, 1.0, 1.2]:
# Update world with new friction
# Run test
# Log results

This reveals sensitivity to parameter uncertainty.

4. Validate Against Real Data

Gold standard: Compare simulation to real robot:

  1. Run identical task on real robot (e.g., drive 5m straight)
  2. Record sensor data (odometry, IMU)
  3. Run same task in simulation
  4. Compare trajectories, velocities, accelerations
  5. Tune simulation parameters to minimize differences

Example comparison:

import matplotlib.pyplot as plt

# Load real robot data
real_time, real_position = load_real_data('real_robot_run.csv')

# Load simulation data
sim_time, sim_position = load_sim_data('sim_robot_run.csv')

# Plot comparison
plt.figure()
plt.plot(real_time, real_position, label='Real Robot', linewidth=2)
plt.plot(sim_time, sim_position, label='Simulation', linestyle='--', linewidth=2)
plt.xlabel('Time (s)')
plt.ylabel('Position (m)')
plt.legend()
plt.title('Sim-to-Real Position Comparison')
plt.grid(True)
plt.savefig('sim_vs_real.png')
plt.show()

# Compute error metric
error = np.mean(np.abs(real_position - sim_position))
print(f"Mean absolute error: {error:.3f}m")

Automated Testing Workflows

Continuous Integration (CI) with Simulation

Integrate simulation tests into CI/CD pipeline:

GitHub Actions example (.github/workflows/gazebo_tests.yml):

name: Gazebo Simulation Tests

on: [push, pull_request]

jobs:
test:
runs-on: ubuntu-22.04

steps:
- name: Checkout code
uses: actions/checkout@v2

- name: Install ROS 2 Humble
run: |
sudo apt update
sudo apt install -y ros-humble-desktop gazebo

- name: Build workspace
run: |
source /opt/ros/humble/setup.bash
colcon build

- name: Run simulation tests
run: |
source install/setup.bash
pytest tests/test_*.py --junit-xml=test_results.xml

- name: Publish test results
uses: EnricoMi/publish-unit-test-result-action@v1
if: always()
with:
files: test_results.xml

Every code push triggers automated simulation tests.

Batch Testing Script

Run multiple test scenarios overnight:

#!/bin/bash
# batch_test.sh - Run all test scenarios

SCENARIOS=("door_crossing" "obstacle_avoidance" "object_pickup" "failure_recovery")

for scenario in "${SCENARIOS[@]}"; do
echo "Running scenario: $scenario"

# Launch Gazebo with scenario world
ros2 launch chapter3_simulation $scenario.launch.py &
GAZEBO_PID=$!

# Wait for Gazebo to initialize
sleep 10

# Run test
pytest tests/test_$scenario.py --junit-xml=results/test_$scenario.xml

# Kill Gazebo
kill $GAZEBO_PID
killall gzserver gzclient

# Wait before next test
sleep 5
done

echo "All tests complete. Results in results/"

Run overnight:

nohup ./batch_test.sh > batch_test.log 2>&1 &

Example: Complete Validation Workflow

Goal: Validate mobile robot navigation before hardware deployment

Step 1: Define Acceptance Criteria

acceptance_criteria:
navigation:
success_rate: 0.95 # 95% of runs reach goal
max_time: 60 # seconds
obstacle_clearance: 0.30 # meters

safety:
max_velocity: 1.0 # m/s
min_obstacle_distance: 0.25 # m
collision_tolerance: 0 # Zero collisions allowed

performance:
path_efficiency: 1.3 # Within 30% of optimal path
avg_time_to_goal: 45 # seconds

Step 2: Create Test Scenarios

  • Scenario 1: Empty 10x10m room, goal at (5,5)
  • Scenario 2: Room with 5 randomly placed obstacles
  • Scenario 3: Narrow corridor (0.8m wide)
  • Scenario 4: Dynamic obstacle (moving person)

Step 3: Run Tests (100 runs each)

for i in {1..100}; do
ros2 launch chapter3_simulation scenario1_empty_room.launch.py seed:=$i
pytest tests/test_navigation.py --scenario=1 --run=$i
done

Step 4: Analyze Results

import pandas as pd

# Load all test results
results = pd.read_csv('all_results.csv')

# Compute metrics
success_rate = results['success'].mean()
avg_time = results[results['success']]['time_to_goal'].mean()
min_clearance = results['min_obstacle_distance'].min()

print(f"Success Rate: {success_rate*100:.1f}%")
print(f"Avg Time: {avg_time:.1f}s")
print(f"Min Clearance: {min_clearance:.2f}m")

# Check against criteria
assert success_rate >= 0.95, "Success rate below 95%"
assert avg_time <= 45, "Average time exceeds 45s"
assert min_clearance >= 0.25, "Obstacle clearance violated"

Step 5: Sim-to-Real Validation

Run 10 trials on real robot, compare to simulation:

# Statistical comparison
from scipy import stats

sim_times = [42, 38, 45, 40, 43, ...] # From simulation
real_times = [45, 50, 48, 52, 47, ...] # From real robot

# T-test: Are means significantly different?
t_stat, p_value = stats.ttest_ind(sim_times, real_times)

if p_value < 0.05:
print(f"WARNING: Sim and real times differ significantly (p={p_value:.3f})")
else:
print(f"Sim and real times are consistent (p={p_value:.3f})")

Summary

Effective simulation testing requires:

  • Comprehensive test coverage: Unit, integration, scenario, and stress tests
  • Quantitative metrics: Success rate, time, accuracy, safety margins
  • Realistic worlds: From minimal test environments to complex scenarios
  • Data logging: ROS bags, Gazebo logs, custom metrics
  • Sim-to-real awareness: Add noise, vary parameters, validate against real data
  • Automation: CI/CD integration, batch testing, overnight runs

Simulation testing is not a replacement for real-world validation, but a powerful tool to reduce risk and accelerate development.

Review Questions

1. What are the four main types of simulation tests, and when would you use each?

Answer:

  1. Unit Tests: Test individual components in isolation (e.g., LIDAR sensor, motor controller)

    • When: During component development, to verify single features work correctly
    • Example: Verify LIDAR detects wall at known distance
  2. Integration Tests: Test how multiple components work together (e.g., LIDAR + planner + controller)

    • When: After unit tests pass, to verify components interact correctly
    • Example: Test navigation stack reaches goal using LIDAR and odometry
  3. Scenario Tests: Test complete robot behaviors in realistic situations

    • When: Before deployment, to validate end-to-end functionality
    • Example: Robot opens door and passes through, or avoids dynamic obstacles
  4. Stress Tests: Test robot at operational limits and edge cases

    • When: To understand failure modes and safety margins
    • Example: Maximum payload, extreme speeds, degraded sensors
2. Explain three strategies to minimize the sim-to-real gap.

Answer: Three effective strategies:

  1. Add realistic sensor noise: Don't use perfect sensors. Calibrate noise models from real hardware (measure pixel stddev, LIDAR variance, IMU drift) and configure simulation to match. Example:

    <noise><type>gaussian</type><stddev>0.007</stddev></noise>
  2. Model actuator dynamics: Add realistic delays, saturation, and response times. Real motors don't instantly reach commanded torque—model first-order lag or more complex dynamics based on motor datasheets.

  3. Validate against real data: Run identical tasks on real robot and in simulation, then compare trajectories/sensor data. Tune simulation parameters (friction, inertia, time step) to minimize differences. Use statistical tests to verify consistency.

Other valid strategies: vary physics parameters to test robustness, use high-fidelity contact models, incorporate systematic errors (sensor bias).

3. What data should you log during a navigation test, and why?

Answer: Essential data to log:

  1. Robot pose (/odom or /tf): Track actual path taken, compute path efficiency, detect deviations
  2. Goal pose: Record intended destination for comparison
  3. Sensor data (/scan, /camera): Analyze what robot perceived, debug perception failures
  4. Commands (/cmd_vel): Understand controller decisions, identify erratic behavior
  5. Timestamps: Compute time to goal, detect delays, ensure proper sequencing
  6. Performance metrics: Distance to goal over time, obstacle clearance, velocity profiles
  7. Events: Goal reached, collision detected, planner failures (log warnings/errors)

Why: Enables offline analysis, reproducing bugs, computing quantitative metrics, and comparing sim vs. real performance. Without logging, you can't systematically improve or validate algorithms.

4. A test shows 85% success rate (below the 95% target). What are three ways to diagnose the failures?

Answer: Three diagnostic approaches:

  1. Analyze failure cases specifically: Filter logged data to failed runs only. Look for patterns:

    • Do failures occur in specific world locations?
    • At specific times (e.g., during tight turns)?
    • With specific sensor readings (e.g., LIDAR detecting false obstacle)?
    • Visualize failed paths in RViz overlaid on map
  2. Replay failed runs: Use Gazebo playback (gazebo --playback log.file) or ROS bag replay to watch failures in slow motion. Inspect:

    • When did robot deviate from expected behavior?
    • What was the sensor data at failure moment?
    • Did controller issue erratic commands?
  3. Add instrumentation: Insert additional logging/visualization in suspected problem areas:

    • Log internal planner state (candidate paths, cost maps)
    • Visualize intermediate computation (detected obstacles, planned trajectory)
    • Add breakpoints or step-through debugging in critical code sections

Compare successful vs. failed runs to identify differentiating factors (e.g., "failures always occur when obstacle clearance < 0.4m").

5. Why is automated testing in CI/CD important for robotics, and what challenges does it face?

Answer:

Importance:

  1. Prevent regressions: Every code change is automatically tested; bugs caught before merging
  2. Faster development: Developers get immediate feedback (minutes, not hours)
  3. Confidence: Large test suite running on every commit ensures code quality
  4. Documentation: Tests serve as executable specifications of expected behavior
  5. Collaboration: Team members can safely modify code knowing tests will catch breakage

Challenges:

  1. Computational cost: Running Gazebo is expensive; CI servers may lack GPUs or sufficient CPU

    • Solution: Headless mode (gzserver only), limit test duration, use cloud compute
  2. Non-determinism: Physics simulation can have subtle variations between runs

    • Solution: Use fixed random seeds, set real_time_factor=0 (run as fast as possible), increase tolerance in assertions
  3. Long test times: Complex scenarios can take minutes-hours

    • Solution: Parallelize tests, run subset on every commit and full suite nightly
  4. Setup complexity: Requires ROS 2, Gazebo, custom packages installed on CI server

    • Solution: Use Docker containers with pre-installed dependencies

Despite challenges, automated simulation testing is essential for production robotics development.