Single-Observer Environments
=============================

:class:`~netrl.NetworkedEnv` wraps any Gymnasium environment whose observation
space is a ``Box``. A single observation path (one channel, one buffer) is
simulated between the environment and the agent.

.. contents:: On this page
   :local:
   :depth: 2

Observation space
-----------------

The original ``Box(obs_shape)`` is replaced by a ``Dict``:

.. code-block:: text

   gymnasium.spaces.Dict({
       "observations": Box(shape=(buffer_size, *obs_shape), dtype=obs_dtype),
       "recv_mask":    MultiBinary(buffer_size),
   })

The agent sees the last ``buffer_size`` delivery slots. Slot ``[-1]`` is the
most recent; slot ``[0]`` is the oldest. Slots where no packet arrived (loss or
delay) are zero-padded; ``recv_mask[i] == True`` means a real observation
occupies slot ``i``.

Per-step sequence
-----------------

For each call to ``env.step(action)``:

1. The wrapped environment is stepped: ``raw_obs, reward, term, trunc, info``.
2. ``raw_obs`` is serialised into a UDP-payload packet and handed to the channel.
3. The channel is flushed: packets due at this step are delivered (or dropped).
4. The observation buffer is updated with the arrived packet (or ``None``).
5. The Dict observation (buffer + mask) is returned.

.. note::

   The channel simulates the *forward* path only (sensor → central node).
   The return path (controller → actuator) is instantaneous.

Construction
------------

.. code-block:: python

   from netrl import NetworkedEnv, NetworkConfig

   env = NetworkedEnv(
       base_env,  # any Gymnasium env with a Box observation space
       config=NetworkConfig(
           p_gb=0.10,        # Good → Bad
           p_bg=0.30,        # Bad → Good
           loss_good=0.01,
           loss_bad=0.20,
           delay_steps=2,
           buffer_size=10,
           seed=42,
       ),
       channel_config=None,  # None → Gilbert–Elliott (default)
   )

A different channel backend is selected via ``channel_config``:

.. code-block:: python

   from netrl import NS3WifiConfig

   # config is a NetworkConfig, built as in the previous example
   env = NetworkedEnv(base_env, config,
                      channel_config=NS3WifiConfig(distance_m=30.0))

See :doc:`channels` for details on all available backends.

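
To make the per-step sequence and the Gilbert–Elliott parameters above more
concrete, the following is a minimal standalone sketch of one observation path.
It is **not** netrl's implementation: ``ToyObservationPath`` is a hypothetical
class, it reuses the ``NetworkConfig`` parameter names only for readability, and
it assumes that losses are decided at send time and that the delay is a fixed
number of steps.

.. code-block:: python

   from collections import deque

   import numpy as np


   class ToyObservationPath:
       """Illustrative stand-in for one channel + buffer (assumed behaviour)."""

       def __init__(self, obs_shape, p_gb=0.10, p_bg=0.30, loss_good=0.01,
                    loss_bad=0.20, delay_steps=2, buffer_size=10, seed=42):
           self.rng = np.random.default_rng(seed)
           self.p_gb, self.p_bg = p_gb, p_bg
           self.loss_good, self.loss_bad = loss_good, loss_bad
           self.delay_steps = delay_steps
           self.bad = False        # Markov state, starting in GOOD
           self.step_idx = 0
           self.pending = deque()  # (due_step, payload), in send order
           self.observations = np.zeros((buffer_size, *obs_shape), np.float32)
           self.recv_mask = np.zeros(buffer_size, dtype=np.int8)

       def step(self, raw_obs):
           # Advance the two-state Markov chain (GOOD <-> BAD).
           flip = self.rng.random() < (self.p_bg if self.bad else self.p_gb)
           self.bad = self.bad != flip
           # Send: the packet is lost with the loss rate of the current state.
           p_loss = self.loss_bad if self.bad else self.loss_good
           if self.rng.random() >= p_loss:
               payload = np.asarray(raw_obs, dtype=np.float32)
               self.pending.append((self.step_idx + self.delay_steps, payload))
           # Flush: deliver the packet that is due at this step, if any.
           arrived = None
           if self.pending and self.pending[0][0] <= self.step_idx:
               arrived = self.pending.popleft()[1]
           # Shift the buffer so that slot [-1] is the newest delivery slot.
           self.observations = np.roll(self.observations, -1, axis=0)
           self.recv_mask = np.roll(self.recv_mask, -1)
           self.observations[-1] = 0 if arrived is None else arrived
           self.recv_mask[-1] = 0 if arrived is None else 1
           self.step_idx += 1
           obs = {"observations": self.observations.copy(),
                  "recv_mask": self.recv_mask.copy()}
           return obs, arrived is not None

With a lossless configuration and ``delay_steps=2``, the first two slots written
stay empty and later slots carry observations that are two steps old, which
matches the zero-padding and ``recv_mask`` semantics described under
*Observation space*.
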
Using ``step()``
----------------

.. code-block:: python

   obs, reward, term, trunc, info = env.step(action)

   # Optional: override the packet payload size for this step only
   obs, reward, term, trunc, info = env.step(action, packet_size=256)

The ``info`` dictionary is augmented with:

.. list-table::
   :header-rows: 1
   :widths: 30 70

   * - Key
     - Value
   * - ``"channel_info"``
     - ``dict`` from :meth:`CommChannel.get_channel_info`, including ``"state"``
       (``"GOOD"`` / ``"BAD"`` for GE), ``"pending_count"``, etc.
   * - ``"arrived_this_step"``
     - ``bool``: ``True`` if a packet arrived at the central node this step.

Resetting
---------

.. code-block:: python

   obs, info = env.reset()

This resets the wrapped environment **and** calls ``central_node.reset()``,
which clears the channel queues, resets the GE Markov state, and zeroes the
observation buffer. For ns-3 backends the subprocess is fully reinitialised.

Training with Stable-Baselines3
---------------------------------

:class:`~netrl.NetworkedEnv` is a standard ``gymnasium.Wrapper`` and works with
any SB3 algorithm that supports ``MultiInputPolicy``:

.. code-block:: python

   import gymnasium as gym
   from stable_baselines3 import PPO
   from netrl import NetworkedEnv, NetworkConfig

   env = NetworkedEnv(gym.make("CartPole-v1"), NetworkConfig(buffer_size=10))
   model = PPO("MultiInputPolicy", env, verbose=1)
   model.learn(total_timesteps=100_000)

Parallel environments
----------------------

Use ``gymnasium.vector.AsyncVectorEnv`` to run multiple independent
environments (each with its own channel subprocess) in parallel:

.. code-block:: python

   import gymnasium as gym
   from gymnasium.vector import AsyncVectorEnv
   from netrl import NetworkedEnv, NetworkConfig, NS3WifiConfig

   def make_env(seed):
       def _fn():
           return NetworkedEnv(
               gym.make("CartPole-v1"),
               NetworkConfig(buffer_size=10, seed=seed),
               channel_config=NS3WifiConfig(distance_m=30.0, step_duration_ms=2.0),
           )
       return _fn

   vec_env = AsyncVectorEnv([make_env(i) for i in range(4)])
   obs, info = vec_env.reset()
   # obs["observations"].shape == (4, 10, 4)
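
A batched observation like the one above still contains zero-padded slots, so
code that consumes it usually needs ``recv_mask`` to tell real data from
padding. As a rough illustration, the sketch below picks the newest received
observation for each environment using plain NumPy. ``latest_received`` is a
hypothetical helper, not part of netrl, and it assumes the batched layout
``(n_envs, buffer_size, *obs_shape)`` shown in the comment above.

.. code-block:: python

   import numpy as np


   def latest_received(observations, recv_mask):
       """Return (latest, has_any) for a batch of buffered observations.

       observations: array of shape (n_envs, buffer_size, *obs_shape)
       recv_mask:    array of shape (n_envs, buffer_size), nonzero = received
       Environments whose buffer holds no packet get zeros and has_any=False.
       """
       observations = np.asarray(observations)
       mask = np.asarray(recv_mask).astype(bool)
       n_envs, buffer_size = mask.shape
       # argmax on the reversed mask finds the first True counted from the
       # newest slot, i.e. the last True in the original slot order.
       last_idx = buffer_size - 1 - np.argmax(mask[:, ::-1], axis=1)
       has_any = mask.any(axis=1)
       latest = observations[np.arange(n_envs), last_idx]
       # Broadcast has_any over the trailing observation dimensions.
       keep = has_any.reshape((n_envs,) + (1,) * (latest.ndim - 1))
       return np.where(keep, latest, 0), has_any

For example, ``latest_received(obs["observations"], obs["recv_mask"])`` would
return an array of shape ``(4, 4)`` for the four CartPole environments, plus a
boolean vector marking which environments have received anything so far.
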