Architecture & Concepts

System overview

NetRL intercepts the observation path between a Gymnasium environment and an RL agent. Instead of the agent receiving the state directly, the observation is transmitted through a configurable channel model that introduces loss, delay, and (for ns-3 backends) realistic wireless contention:

┌─────────────────────────────────────────────────────┐
│  Gymnasium step loop                                │
│                                                     │
│  env.step(action)                                   │
│    │                                                │
│    ▼                                                │
│  raw_obs ──► CommChannel.transmit()                 │
│                     │  (loss / delay)               │
│                     ▼                               │
│  CommChannel.flush() ──► ObservationBuffer.add()    │
│                               │                     │
│                               ▼                     │
│  agent ◄── Dict{"observations", "recv_mask"}        │
└─────────────────────────────────────────────────────┘
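The per-step flow in the diagram can be sketched in plain Python. This is an illustration under assumptions, not NetRL's implementation: `ToyChannel`, `ToyBuffer`, and `networked_step` are hypothetical, the channel is a simple fixed-delay Bernoulli-loss model, and `raw_obs` is passed in directly instead of coming from a wrapped Gymnasium env.

```python
import random
from collections import deque

import numpy as np


class ToyChannel:
    """Hypothetical channel: drops with probability p_loss, delays by delay_steps."""

    def __init__(self, p_loss: float, delay_steps: int, seed: int = 0):
        self.p_loss = p_loss
        self.delay_steps = delay_steps
        self.rng = random.Random(seed)
        self.queue = deque()  # (arrival_step, obs) in send order

    def transmit(self, obs, step: int) -> None:
        if self.rng.random() >= self.p_loss:  # packet survives the loss draw
            self.queue.append((step + self.delay_steps, obs))

    def flush(self, step: int) -> list:
        delivered = []
        while self.queue and self.queue[0][0] <= step:
            delivered.append(self.queue.popleft()[1])
        return delivered


class ToyBuffer:
    """Hypothetical fixed-size window: one slot per step, None = nothing arrived."""

    def __init__(self, maxlen: int, obs_shape: tuple):
        self.obs_shape = obs_shape
        self.slots = deque([None] * maxlen, maxlen=maxlen)

    def add(self, obs_or_none) -> None:
        self.slots.append(obs_or_none)

    def get_padded(self):
        obs = np.stack([np.zeros(self.obs_shape) if o is None else o for o in self.slots])
        mask = np.array([o is not None for o in self.slots])
        return obs, mask


def networked_step(raw_obs, step, channel, buffer):
    """One pass through the diagram: transmit, flush, buffer, build the agent's dict."""
    channel.transmit(raw_obs, step=step)
    arrived = channel.flush(step=step)
    # Assumption: if several packets land in the same step, keep only the newest.
    buffer.add(arrived[-1] if arrived else None)
    observations, recv_mask = buffer.get_padded()
    return {"observations": observations, "recv_mask": recv_mask}
```

With one transmit per step and a fixed delay, at most one packet lands per step; the keep-newest rule in `networked_step` only matters for channels where several can arrive at once.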

For MultiViewNetworkedEnv, N independent paths run in parallel — one per observer — managed by a single CentralNode:

raw_obs ──► MultiViewModel.observe() ──► {obs_0, obs_1, ..., obs_N-1}
                                             │
            ┌──────────────────────────────┬─┴──────────────────────┐
            │ Observer 0                   │ Observer N-1           │
            │ CommChannel.transmit(obs_0)  │ CommChannel.transmit() │
            │       ↓                      │       ↓                │
            │ CommChannel.flush()          │ CommChannel.flush()    │
            │       ↓                      │       ↓                │
            │ ObservationBuffer.add()      │ ObservationBuffer.add()│
            └──────────────────────────────┴────────────────────────┘
                            │
                            ▼
            Dict{"obs_0": {...}, "obs_1": {...}, ...}

Class hierarchy

CommChannel  (ABC)
├── GEChannel           — Markov chain; C++ core (netcomm extension)
├── PerfectChannel      — lossless; zero-delay
├── NS3WiFiChannelFast  — 802.11a ad-hoc; pybind11 in-process (netrl_ext)
├── NS3WifiChannel      — 802.11a ad-hoc; ns-3 subprocess
├── NS3MmWaveChannel    — 5G mmWave EPC; ns-3 subprocess
├── NS3LenaChannel      — 5G NR; ns-3 subprocess
└── NS3WifiUEChannel    — per-UE proxy; shared NS3WifiMultiUEBackend

ObservationBuffer       — fixed-size circular buffer + recv_mask

CentralNode             — Dict[node_id → CommChannel + ObservationBuffer]

gym.Wrapper
├── NetworkedEnv        — single observer; owns one CentralNode
└── MultiViewNetworkedEnv — N observers; owns one CentralNode

Timing model

Time is discretised into integer env steps. Step t occupies ns-3 simulation time [t · step_ms, (t+1) · step_ms).

transmit(obs, step=t)

The packet carrying observation obs is scheduled to be sent at t · step_ms + ε (a tiny offset into the step window).

flush(step=t)

The ns-3 simulator is advanced to (t+1) · step_ms. Any packets whose receive callback fired during [t · step_ms, (t+1) · step_ms) are returned as the result.
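One consequence of this timing model: a packet sent at step t with one-way latency L (in ms) is reported by flush(step=⌊(t · step_ms + ε + L) / step_ms⌋). The sketch below computes that mapping under the simplifying assumption of a fixed latency; in the ns-3 backends the latency emerges from the simulated MAC/PHY, and `delivery_step`, `delay_in_steps`, and the default `eps_ms` value are hypothetical.

```python
import math


def delivery_step(send_step: int, latency_ms: float, step_ms: float, eps_ms: float = 1e-3) -> int:
    """Return the env step whose flush() reports a packet sent at send_step.

    The packet leaves at send_step * step_ms + eps_ms and is received
    latency_ms later; flush(step=t) covers receive times in
    [t * step_ms, (t + 1) * step_ms).
    """
    rx_time_ms = send_step * step_ms + eps_ms + latency_ms
    return math.floor(rx_time_ms / step_ms)


def delay_in_steps(latency_ms: float, step_ms: float, eps_ms: float = 1e-3) -> int:
    """Delay, in env steps, that the agent observes for a given fixed latency."""
    return delivery_step(0, latency_ms, step_ms, eps_ms)
```

For example, with step_ms = 10 a 3 ms latency is delivered within the same step (zero-step delay), while a 15 ms latency shows up one step later.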

For the Gilbert–Elliott backend there is no network simulation at all: transmit rolls a Markov state transition and samples a Bernoulli loss, and flush pops every queued packet whose arrival_step ≤ step from an in-memory deque.

Persisted simulation state

The ns-3 subprocess backends run continuously across steps: Simulator::Run() is called once per FLUSH with an increasing stop time. The pybind11 fast backend (NS3WiFiChannelFast) works the same way, except that the ns-3 simulator lives inside the Python process and is advanced in place on each step. In both cases:

  • MAC backoff counters, retry queues, and association state persist between steps.

  • A RESET (triggered by env.reset()) calls Simulator::Destroy() and rebuilds the topology from scratch.

Warm-up period

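The ns-3 backends simulate a warm-up period before the first READY. The infrastructure-mode backends (Multi-UE WiFi, mmWave, 5G-LENA) need the longest warm-up, because stations must associate or UEs must attach first: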

Backend                           Warm-up (simulated)          READY timeout (wall-clock)
802.11a (ad-hoc, single STA)      310 ms (3 beacon intervals)  30 s
802.11a (infrastructure, N STAs)  500 ms (association)         60 s
5G mmWave / 5G-LENA               500 ms (UE attach + bearer)  60 s

Gilbert–Elliott channel model

The GE channel is a two-state hidden Markov model:

       p_bg (Bad → Good)
  ┌─────────────────────────┐
  │                         │
  ▼    p_gb (Good → Bad)    │
GOOD ────────────────────► BAD
loss_good                  loss_bad

At each transmit() call:

  1. The Markov state is updated: from Good, the chain moves to Bad with probability p_gb; from Bad, it returns to Good with probability p_bg.

  2. The packet is dropped with probability loss_good (Good state) or loss_bad (Bad state).

  3. If not dropped, the packet is queued with arrival_step = step + delay_steps.

The C++ implementation uses a Mersenne Twister (std::mt19937_64) seeded at construction.
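The three steps above can be sketched in pure Python. This is a hedged illustration, not the netcomm extension's API: `GESketch` is hypothetical, it starts in the Good state (an assumption), and it uses `random.Random`, whose generator is the 32-bit MT19937 rather than the C++ core's `std::mt19937_64`, so seeds do not reproduce the same loss sequence.

```python
import random
from collections import deque


class GESketch:
    """Gilbert–Elliott loss plus fixed delay, following steps 1-3 above."""

    def __init__(self, p_gb, p_bg, loss_good, loss_bad, delay_steps, seed=0):
        self.p_gb, self.p_bg = p_gb, p_bg
        self.loss_good, self.loss_bad = loss_good, loss_bad
        self.delay_steps = delay_steps
        self.rng = random.Random(seed)  # Mersenne Twister (MT19937), seeded once
        self.bad = False                # assumption: start in the Good state
        self.queue = deque()            # (arrival_step, obs); fixed delay keeps it ordered

    def transmit(self, obs, step):
        # 1. Markov transition: Good -> Bad w.p. p_gb, Bad -> Good w.p. p_bg.
        if self.bad:
            if self.rng.random() < self.p_bg:
                self.bad = False
        elif self.rng.random() < self.p_gb:
            self.bad = True
        # 2. Bernoulli loss with the current state's loss probability.
        p_loss = self.loss_bad if self.bad else self.loss_good
        if self.rng.random() < p_loss:
            return
        # 3. Queue the surviving packet for delivery delay_steps later.
        self.queue.append((step + self.delay_steps, obs))

    def flush(self, step):
        """Pop every packet whose arrival_step <= step."""
        out = []
        while self.queue and self.queue[0][0] <= step:
            out.append(self.queue.popleft()[1])
        return out
```

Because the delay is constant and steps only increase, arrival steps are non-decreasing, so a plain deque with front pops is enough; a variable-delay channel would need a priority queue instead.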

ns-3 subprocess protocol

All ns-3 subprocess backends speak the same line-oriented stdin/stdout protocol; the multi-UE backend adds a ue_id field to TRANSMIT and RECV:

Python → subprocess

TRANSMIT <step_id> <pkt_size>          # single-UE backends
TRANSMIT <ue_id> <step_id> <pkt_size>  # multi-UE backend
FLUSH    <step_id>
RESET
QUIT

Subprocess → Python

READY                          # once, at startup
OK                             # ACK for TRANSMIT / RESET
RECV <id1> <id2> ...           # single-UE: space-separated step_ids
RECV <ue_id>:<step_id> ...     # multi-UE:  ue_id:step_id pairs
ERROR <message>

The Python side stores the observation in a _pending dict keyed by step_id (or (ue_id, step_id)). On a successful FLUSH, the received ids are looked up to retrieve the original NumPy arrays.
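The message formats and the _pending lookup can be sketched as a few helpers. Everything here is illustrative: the function names are hypothetical, removing entries from the pending dict once delivered is an assumption, and the sketch only formats and parses lines; it does not spawn or talk to an ns-3 process.

```python
from collections.abc import Hashable
from typing import Dict, List, Optional

import numpy as np


def transmit_cmd(step_id: int, pkt_size: int, ue_id: Optional[int] = None) -> str:
    """Format a TRANSMIT line (multi-UE form when ue_id is given)."""
    if ue_id is None:
        return f"TRANSMIT {step_id} {pkt_size}\n"
    return f"TRANSMIT {ue_id} {step_id} {pkt_size}\n"


def parse_recv(line: str, multi_ue: bool = False) -> List[Hashable]:
    """Parse a reply to FLUSH into step_ids or (ue_id, step_id) keys."""
    head, *tokens = line.split()
    if head == "ERROR":
        raise RuntimeError(line.partition(" ")[2].strip())
    if head != "RECV":
        raise ValueError(f"unexpected reply to FLUSH: {line!r}")
    if not multi_ue:
        return [int(tok) for tok in tokens]
    keys = []
    for tok in tokens:  # each token is "<ue_id>:<step_id>"
        ue, _, step = tok.partition(":")
        keys.append((int(ue), int(step)))
    return keys


def collect_received(pending: Dict[Hashable, np.ndarray], keys: List[Hashable]) -> List[np.ndarray]:
    """Look up (and remove) the original arrays for the delivered keys."""
    return [pending.pop(key) for key in keys]
```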

Observation buffer semantics

Each ObservationBuffer is a fixed-size circular window. add(obs_or_None) advances by one slot every step, whether or not a packet arrived:

step 0: transmit → arrives at step 2 (delay_steps=2)
step 1: transmit → arrives at step 3
step 2: flush → obs from step 0 arrives → buffer[-1] = obs_0
step 3: flush → obs from step 1 arrives → buffer[-1] = obs_1
                                          buffer[-2] = obs_0

get_padded() always returns (obs_array, recv_mask), with shapes (maxlen, *obs_shape) and (maxlen,) respectively. Slots that were never written, or whose packet was lost, contain zero arrays and have recv_mask == False.
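A minimal sketch of these semantics, assuming a deque-backed window ordered oldest to newest (so index -1 is the current step) and a float32 dtype. `ObservationBufferSketch` is hypothetical and is not NetRL's ObservationBuffer.

```python
from collections import deque
from typing import Optional, Tuple

import numpy as np


class ObservationBufferSketch:
    """Fixed-size circular window over the last maxlen steps, oldest first."""

    def __init__(self, maxlen: int, obs_shape: Tuple[int, ...], dtype=np.float32):
        self.maxlen = maxlen
        self.obs_shape = tuple(obs_shape)
        self.dtype = dtype
        # Pre-filled with None so the window always has exactly maxlen slots.
        self._slots = deque([None] * maxlen, maxlen=maxlen)

    def add(self, obs: Optional[np.ndarray]) -> None:
        """Advance by one slot; pass None when no packet arrived this step."""
        self._slots.append(None if obs is None else np.asarray(obs, dtype=self.dtype))

    def get_padded(self):
        """Return (obs_array, recv_mask); empty slots are zeros with mask False."""
        obs_array = np.zeros((self.maxlen, *self.obs_shape), dtype=self.dtype)
        recv_mask = np.zeros(self.maxlen, dtype=bool)
        for i, obs in enumerate(self._slots):
            if obs is not None:
                obs_array[i] = obs
                recv_mask[i] = True
        return obs_array, recv_mask
```

Replaying the delay_steps=2 example with maxlen=3 (add(None) at steps 0 and 1, then obs_0, then obs_1) gives recv_mask == [False, True, True], with obs_1 in the last slot and obs_0 just before it.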

Strategy pattern

CentralNode uses the Strategy pattern for channel selection. The channel_factory parameter is a Callable[[NetworkConfig], CommChannel] called once per node. To add a new backend:

  1. Subclass CommChannel and implement the four methods.

  2. Create a config dataclass with a validate() method.

  3. Pass channel_factory=YourChannel to CentralNode or either environment wrapper.
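The three steps can be sketched end to end under stated assumptions. `CommChannel` below is a stand-in that declares only transmit and flush (the real ABC has four abstract methods, not all named here), and `FixedDelayConfig`, `FixedDelayChannel`, and `build_channels` are hypothetical; `build_channels` only mimics the documented contract that the factory is called once per node.

```python
from abc import ABC, abstractmethod
from collections import deque
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterable, List


class CommChannel(ABC):
    """Stand-in for NetRL's ABC; only transmit/flush are declared here."""

    @abstractmethod
    def transmit(self, obs: Any, step: int) -> None:
        """Send one observation at env step `step`."""

    @abstractmethod
    def flush(self, step: int) -> List[Any]:
        """Return observations delivered by the end of env step `step`."""


@dataclass
class FixedDelayConfig:
    """Hypothetical config dataclass with a validate() method (step 2)."""
    delay_steps: int = 1

    def validate(self) -> None:
        if self.delay_steps < 0:
            raise ValueError("delay_steps must be >= 0")


class FixedDelayChannel(CommChannel):
    """Hypothetical backend (step 1): lossless, constant delay."""

    def __init__(self, config: FixedDelayConfig):
        config.validate()
        self.delay_steps = config.delay_steps
        self._queue = deque()  # (arrival_step, obs)

    def transmit(self, obs, step):
        self._queue.append((step + self.delay_steps, obs))

    def flush(self, step):
        out = []
        while self._queue and self._queue[0][0] <= step:
            out.append(self._queue.popleft()[1])
        return out


def build_channels(node_ids: Iterable[str], config: Any,
                   channel_factory: Callable[[Any], CommChannel]) -> Dict[str, CommChannel]:
    """Call the factory once per node, as described for CentralNode (step 3)."""
    return {node_id: channel_factory(config) for node_id in node_ids}
```

Because a class is itself a callable from config to instance, the class can be passed directly as the factory, e.g. `build_channels(["obs_0", "obs_1"], FixedDelayConfig(delay_steps=2), FixedDelayChannel)`, and each node gets its own independent channel.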