ROS 2 Architecture: Understanding Distributed Robotics Systems

Introduction

Welcome to the world of distributed robotics! In this section, you'll learn the fundamental architecture that powers ROS 2 (Robot Operating System 2), the leading middleware for building modern robotic systems. Unlike traditional monolithic programs where all functionality exists in a single process, ROS 2 embraces a distributed architecture where independent components communicate over a network.

This design philosophy isn't arbitrary—it mirrors the inherent complexity of robotics. A humanoid robot doesn't have a single "brain" that controls everything. Instead, it has sensors (cameras, IMUs, force sensors), actuators (motors), decision-making systems (AI agents), and safety monitors—all working simultaneously. ROS 2 provides the infrastructure to coordinate these diverse components.

Why Distributed Architecture for Robotics?

The Monolithic Problem

Imagine writing a single Python script that:

Reads data from 20 cameras at 30 FPS
Processes sensor fusion from IMUs and force sensors
Runs AI inference for object detection
Controls 30+ motors with real-time constraints
Monitors battery levels and safety limits
Logs data to disk

This monolithic approach creates several problems:

Complexity Explosion: A single file with thousands of lines becomes unmaintainable
No Fault Isolation: If one component crashes, the entire system dies
Difficult Testing: Can't test individual components in isolation
Poor Resource Utilization: Can't distribute computation across multiple cores/machines
Rigid Deployment: Can't easily move components to different hardware

The ROS 2 Solution: Nodes as Building Blocks

ROS 2 solves this by breaking systems into nodes—independent processes that each handle a specific responsibility. For example:

Camera Node: Captures and publishes image data
Object Detector Node: Subscribes to images, runs AI inference, publishes detections
Motion Planner Node: Receives detections and current robot state, outputs motion commands
Motor Controller Node: Subscribes to motion commands, controls hardware

Each node is:

Independent: Runs in its own process with isolated memory
Single-Purpose: Does one thing well (Unix philosophy)
Replaceable: Can swap implementations without changing other nodes
Testable: Can be tested in isolation with mock data

Core Communication Paradigms

ROS 2 provides three primary communication patterns, each suited for different use cases:

Use Case: Continuous streaming data where multiple consumers might be interested.

How It Works:

Publishers send messages to named topics (e.g., /camera/image)
Subscribers listen to topics and receive messages via callbacks
Asynchronous: Publishers don't wait for subscribers
Many-to-Many: Multiple publishers and subscribers can share a topic

Real-World Examples:

Camera streaming images (1 publisher → many subscribers)
Sensor data (joint positions, IMU readings, battery status)
Robot telemetry (position, velocity, diagnostics)

Key Characteristics:

Fire-and-Forget: Publishers don't know if anyone is listening
Latest-Value Semantics: Subscribers typically process the most recent message
High Frequency: Ideal for data published at 10Hz, 100Hz, or faster

2. Services (Request-Response Pattern)

Use Case: Blocking request-response interactions where you need an answer.

How It Works:

Service Server: Waits for requests, processes them, returns responses
Service Client: Sends request, blocks until response is received
Synchronous: Client waits for server
One-to-One: Each request gets exactly one response

Real-World Examples:

Triggering a calibration routine (request: "calibrate", response: "success/failure")
Querying robot state (request: "get current position", response: position data)
Resetting a component (request: "reset", response: "done")

Key Characteristics:

Blocking: Client waits for server to respond
Transactional: Guaranteed request-response pairing
Lower Frequency: Ideal for occasional interactions

3. Actions (Long-Running Tasks)

Use Case: Tasks that take significant time and need progress feedback.

How It Works:

Action Server: Accepts goals, executes them asynchronously, sends feedback and results
Action Client: Sends goals, receives periodic feedback, can cancel goals
Asynchronous with Feedback: Client doesn't block but gets progress updates
Cancelable: Client can abort in-progress actions

Real-World Examples:

Navigation: "Go to position X" (feedback: current distance, result: "arrived" or "failed")
Grasping: "Pick up object" (feedback: gripper position, result: "grasped" or "failed")
Motion execution: "Execute trajectory" (feedback: % complete, result: success/failure)

Key Characteristics:

Goal-Oriented: Represents a task to accomplish
Preemptable: Can be canceled mid-execution
Stateful: Provides feedback during execution

Comparison Table: Topics vs Services vs Actions

Feature	Topics	Services	Actions
Communication	Pub-Sub	Request-Response	Goal-Feedback-Result
Blocking	No	Yes	No (async)
Frequency	High (continuous)	Low (on-demand)	Low (tasks)
Feedback	No	No	Yes (progress)
Cancelable	N/A	N/A	Yes
Use Case	Streaming data	Quick queries	Long-running tasks
Example	Sensor data	Get battery %	Navigate to waypoint

The Data Distribution Service (DDS) Layer

Unlike ROS 1, which used a custom TCP-based protocol, ROS 2 is built on top of DDS (Data Distribution Service)—an industry-standard middleware used in aerospace, defense, and industrial automation.

What is DDS?

DDS is a peer-to-peer communication standard that provides:

Discovery: Nodes automatically find each other on the network (no central master)
Quality of Service (QoS): Fine-grained control over reliability, latency, durability
Real-Time Capable: Designed for systems with strict timing requirements
Security: Built-in authentication, encryption, access control

Benefits for Robotics

No Single Point of Failure: Unlike ROS 1's master node, ROS 2 has no central coordinator
Multi-Robot Systems: Easier to build systems with multiple robots communicating
Network Flexibility: Works across Ethernet, Wi-Fi, shared memory, etc.
Quality of Service: Can prioritize critical data (e.g., safety messages) over telemetry

DDS in Action (Simplified)

When you publish a message in ROS 2:

Your Code → rclpy → DDS Implementation → Network → DDS Implementation → rclpy → Subscriber Code

The DDS layer handles:

Serialization (converting Python objects to bytes)
Network transport (UDP, TCP, or shared memory)
Discovery (finding subscribers)
Reliability (resending lost packets if QoS requires it)

ROS 1 vs ROS 2: Key Improvements

If you've encountered ROS 1 (used in many older tutorials), here's what changed in ROS 2:

Aspect	ROS 1	ROS 2
Architecture	Master-based (single point of failure)	Peer-to-peer (distributed discovery)
Middleware	Custom TCPROS protocol	DDS (industry standard)
Real-Time	Limited support	Real-time capable with DDS QoS
Security	None (plaintext, no auth)	DDS security (encryption, access control)
Platforms	Linux only (practically)	Linux, Windows, macOS
Python	Python 2.7 (obsolete)	Python 3.6+
Build System	catkin	ament/colcon
Lifecycle	Simple start/stop	Managed lifecycle nodes

Key Takeaway: ROS 2 was redesigned from the ground up to be production-ready, not just for research.

Node Lifecycle (Managed Nodes)

ROS 2 introduces lifecycle management for nodes, allowing fine-grained control over startup and shutdown. While not required for basic nodes, managed lifecycle nodes have explicit states:

Unconfigured: Node exists but isn't ready
Inactive: Node is configured but not processing data
Active: Node is fully operational
Finalized: Node is shutting down

Benefits:

Controlled initialization (e.g., "configure" then "activate")
Graceful shutdown (clean up resources)
Runtime reconfiguration (deactivate → reconfigure → activate)

When to Use:

Production systems where startup order matters
Nodes that manage hardware (need clean shutdown)
Systems requiring dynamic reconfiguration

For Learning: We'll start with simple nodes and introduce lifecycle management later.

Quality of Service (QoS) Profiles

QoS allows you to tune communication behavior for different needs:

Common QoS Settings:

Reliability: Best-effort (fast, may drop messages) vs Reliable (guarantees delivery)
Durability: Volatile (only current subscribers get messages) vs Transient-Local (new subscribers get last message)
History: Keep-Last-N (buffer N messages) vs Keep-All (unlimited buffer)

Example Use Cases:

Sensor Data: Best-effort, keep-last-10 (OK to drop old data, prioritize freshness)
Commands: Reliable, keep-last-1 (critical commands must arrive)
Initialization Data: Reliable, transient-local (new nodes get last configuration)

Default: Most beginners use default QoS (reliable, volatile, keep-last-10) which works well for learning.

Putting It All Together: A Simple System

Imagine a simple robot system:

[Camera Node] --/image--> [Object Detector] --/detections--> [Motion Planner] --/cmd_vel--> [Motor Controller]
                                                                     ^
                                                                     |
                                                              [Service: /reset_planner]

Communication Breakdown:

Camera → Detector: Topic (continuous image stream)
Detector → Planner: Topic (detections as they occur)
Planner → Motors: Topic (continuous velocity commands)
Reset Service: Service (one-time reset request)

Why This Design:

Each node is independently testable (can feed mock images to detector)
Fault isolation (if detector crashes, camera and motors keep running)
Replaceable components (can swap detector algorithm without touching planner)
Parallel execution (all nodes run simultaneously on different CPU cores)

Summary

ROS 2 provides a distributed architecture for building complex robotic systems by:

Decomposing systems into nodes: Independent, single-purpose processes
Three communication patterns: Topics (streaming), Services (request-response), Actions (tasks)
Built on DDS: Industry-standard middleware with discovery, QoS, and security
Improved from ROS 1: No master node, real-time capable, multi-platform, production-ready
Lifecycle management: Fine-grained control over node states (optional)
Quality of Service: Tunable reliability, durability, and history settings

In the next section, we'll move from concepts to practice: creating your first ROS 2 package.

Review Questions

What is the main advantage of ROS 2's distributed architecture over a monolithic program?

Details
Answer
Fault isolation (components can fail independently), easier testing (test nodes in isolation), better resource utilization (parallel execution), and flexibility in deployment.
When would you use a Service instead of a Topic?

Details
Answer
When you need a synchronous request-response interaction (e.g., querying data, triggering a one-time action) rather than continuous streaming. Services guarantee a response, while topics are fire-and-forget.
What is the role of DDS in ROS 2?

Details
Answer
DDS is the middleware layer that handles peer-to-peer discovery, network communication, serialization, and Quality of Service. It eliminates the need for ROS 1's central master node.
Give an example of when you would use an Action instead of a Service.

Details
Answer
For long-running tasks that need progress feedback and/or cancellation capability. Example: navigating a robot to a goal (you want periodic feedback on distance remaining and the ability to cancel mid-navigation).
What are the three key communication patterns in ROS 2?
Answer
1. Topics (publish-subscribe for streaming data), 2) Services (request-response for synchronous queries), 3) Actions (goal-feedback-result for long-running tasks).

Next: Package Development - Learn how to create your first ROS 2 package

Introduction​

Why Distributed Architecture for Robotics?​

The Monolithic Problem​

The ROS 2 Solution: Nodes as Building Blocks​

Core Communication Paradigms​

1. Topics (Publish-Subscribe Pattern)​

2. Services (Request-Response Pattern)​

3. Actions (Long-Running Tasks)​

Comparison Table: Topics vs Services vs Actions​

The Data Distribution Service (DDS) Layer​

What is DDS?​

Benefits for Robotics​

DDS in Action (Simplified)​

ROS 1 vs ROS 2: Key Improvements​

Node Lifecycle (Managed Nodes)​

Quality of Service (QoS) Profiles​

Putting It All Together: A Simple System​

Summary​

Review Questions​

Introduction

Why Distributed Architecture for Robotics?

The Monolithic Problem

The ROS 2 Solution: Nodes as Building Blocks

Core Communication Paradigms

1. Topics (Publish-Subscribe Pattern)

2. Services (Request-Response Pattern)

3. Actions (Long-Running Tasks)

Comparison Table: Topics vs Services vs Actions

The Data Distribution Service (DDS) Layer

What is DDS?

Benefits for Robotics

DDS in Action (Simplified)

ROS 1 vs ROS 2: Key Improvements

Node Lifecycle (Managed Nodes)

Quality of Service (QoS) Profiles

Putting It All Together: A Simple System

Summary

Review Questions