torchrl.envs

The environment is the world that the agent interacts with; it can be a game, a physics engine, or anything else you like. It receives an action, executes it, and returns the next observation and a reward to the agent.
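For example, a minimal agent-environment loop looks like the following. This is a sketch assuming the GymEnv wrapper documented below and that gym's CartPole-v1 environment is installed; the random action stands in for a real agent's policy:

import random

from torchrl.envs import GymEnv

# CartPole-v1 has a discrete action space, so actions are plain ints.
env = GymEnv('CartPole-v1')
state = env.reset()
done = False
episode_return = 0.0
while not done:
    action = random.randint(0, 1)  # placeholder for an agent's decision
    state, reward, done = env.step(action)
    episode_return += reward
print('Episode return:', episode_return)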

BaseEnv

class torchrl.envs.BaseEnv(env_name)[source]

Bases: abc.ABC

Abstract base class used for implementing new environments.

Includes some basic functionality, such as the option to use a running mean and standard deviation for normalizing states.

Parameters:
  • env_name (str) – The environment name.
  • fixed_normalize_states (bool) – If True, use the state min and max values to normalize the states (Default is False).
  • running_normalize_states (bool) – If True, use the running mean and std to normalize the states (Default is False).
  • scale_reward (bool) – If True, use the running std to scale the rewards (Default is False).
get_state_info()[source]

Returns a dict containing information about the state space.

The dict should contain two keys: shape indicating the state shape, and dtype indicating the state type.

Example

State space containing 4 continuous dimensions:

return dict(shape=(4,), dtype='continuous')
get_action_info()[source]

Returns a dict containing information about the action space.

The dict should contain two keys: shape indicating the action shape, and dtype indicating the action type.

If dtype is int, the action space is assumed to be discrete.

Example

Action space containing 4 floats:

return dict(shape=(4,), dtype='float')
simulator

Returns the name of the simulator being used as a string.

_create_env()[source]

Creates and returns an environment.

Returns:The created environment.
Return type:Environment object.
reset()[source]

Resets the environment to an initial state.

Returns:A numpy array with the state information.
Return type:numpy.ndarray
step(action)[source]

Receives an action and executes it in the environment.

Parameters:action (int or float or numpy.ndarray) – The action to be executed in the environment. It should be an int for discrete environments and a float for continuous ones. If the environment supports executing multiple actions at once, the action should be a numpy.ndarray.
Returns:
  • next_state (numpy.ndarray) – A numpy array with the state information.
  • reward (float) – The reward.
  • done (bool) – Flag indicating the termination of the episode.
  • info (dict) – Dict containing additional information about the state.
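In code, a single transition therefore unpacks as below. A sketch, where env stands for a concrete BaseEnv subclass instance and action for a valid action:

# The four return values follow the documented order.
next_state, reward, done, info = env.step(action)
if done:
    next_state = env.reset()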
update_config(config)[source]

Updates a Config object to include information about the environment.

Parameters:config (Config) – Object used for storing configuration.
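Putting the pieces together, a new environment subclasses BaseEnv and implements the methods documented above. The sketch below is illustrative only: every name outside the documented BaseEnv API is hypothetical, and treating simulator as a property is an assumption about the base class:

import numpy as np

from torchrl.envs import BaseEnv

class RandomWalkEnv(BaseEnv):
    """Toy 1-D random walk, used purely to illustrate the interface."""

    def _create_env(self):
        # No external simulator is needed; state is tracked directly.
        return None

    @property
    def simulator(self):
        return 'random_walk'

    def get_state_info(self):
        return dict(shape=(1,), dtype='continuous')

    def get_action_info(self):
        # An int dtype marks the action space as discrete.
        return dict(shape=(2,), dtype='int')

    def reset(self):
        self._pos = np.zeros(1, dtype='float32')
        return self._pos

    def step(self, action):
        # Action 0 moves left, action 1 moves right.
        self._pos += 1.0 if action == 1 else -1.0
        reward = -float(abs(self._pos[0]))
        done = bool(abs(self._pos[0]) >= 10)
        return self._pos, reward, done, {}

# The normalization flags documented above are passed at construction.
env = RandomWalkEnv('RandomWalk-v0', running_normalize_states=True)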

GymEnv

class torchrl.envs.GymEnv(env_name, **kwargs)[source]

Bases: torchrl.envs.base_env.BaseEnv

Creates and wraps a gym environment.

Parameters:
  • env_name (str) – The Gym ID of the env. For a list of available envs, check the Gym documentation.
  • wrappers (list) – List of wrappers to be applied on the env. Each wrapper should be a function that receives and returns the env.
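For example, a standard gym wrapper can be applied by passing a small function through wrappers. A sketch, assuming gym's TimeLimit wrapper and the Pendulum-v0 ID are available:

from gym.wrappers import TimeLimit

from torchrl.envs import GymEnv

# Each entry receives the env and must return the (wrapped) env.
env = GymEnv(
    'Pendulum-v0',
    wrappers=[lambda env: TimeLimit(env, max_episode_steps=200)],
)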
simulator

Returns the name of the simulator being used as a string.

reset()[source]

Calls the reset method on the gym environment.

Returns:state – A numpy array with the state information.
Return type:numpy.ndarray
step(action)[source]

Calls the step method on the gym environment.

Parameters:action (int or float or numpy.ndarray) – The action to be executed in the environment. It should be an int for discrete environments and a float for continuous ones. If the environment supports executing multiple actions at once, the action should be a numpy.ndarray.
Returns:
  • next_state (numpy.ndarray) – A numpy array with the state information.
  • reward (float) – The reward.
  • done (bool) – Flag indicating the termination of the episode.
get_state_info()[source]

Returns a dictionary containing the shape and type of the state space. If the space is continuous, the dictionary also contains its minimum and maximum values.

get_action_info()[source]

Returns a dictionary containing the shape and type of the action space. If the space is continuous, the dictionary also contains its minimum and maximum values.
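For a continuous control task, the two methods return dictionaries along the following lines. A sketch: only the shape and dtype keys are documented, so the exact key names for the bounds are an assumption:

env = GymEnv('Pendulum-v0')
print(env.get_state_info())   # e.g. shape (3,), dtype 'continuous', plus the state bounds
print(env.get_action_info())  # e.g. shape (1,), dtype 'continuous', plus the action bounds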

update_config(config)[source]

Updates a Config object to include information about the environment.

Parameters:config (Config) – Object used for storing configuration.
static get_space_info(space)[source]

Extracts shape and type information from a gym space.

Parameters:space (gym.spaces) – Space object that describes the valid actions and observations.
Returns:Dictionary containing the space shape and type
Return type:dict
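Because the method is static, it can be used to inspect a gym space directly. A sketch; the exact dict produced for a Discrete space is inferred from the shape/dtype conventions described above:

import gym

from torchrl.envs import GymEnv

space = gym.make('CartPole-v1').action_space  # Discrete(2)
info = GymEnv.get_space_info(space)
print(info)  # expected to report the shape and an 'int' dtype for a discrete space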

RoboschoolEnv

class torchrl.envs.RoboschoolEnv(*args, **kwargs)[source]

Bases: torchrl.envs.gym_env.GymEnv

Support for OpenAI Roboschool environments, accessed through the gym interface.
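Usage mirrors GymEnv, since Roboschool registers its environments under gym IDs when imported. A sketch, assuming the roboschool package is installed:

import roboschool  # registers the Roboschool gym IDs, in case the wrapper does not

from torchrl.envs import RoboschoolEnv

env = RoboschoolEnv('RoboschoolHopper-v1')
state = env.reset()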

get_action_info()

Returns a dictionary containing the shape and type of the action space. If the space is continuous, the dictionary also contains its minimum and maximum values.

static get_space_info(space)

Extracts shape and type information from a gym space.

Parameters:space (gym.spaces) – Space object that describes the valid actions and observations.
Returns:Dictionary containing the space shape and type
Return type:dict
get_state_info()

Returns a dictionary containing the shape and type of the state space. If the space is continuous, the dictionary also contains its minimum and maximum values.

reset()

Calls the reset method on the gym environment.

Returns:state – A numpy array with the state information.
Return type:numpy.ndarray
simulator

Returns the name of the simulator being used as a string.

step(action)

Calls the step method on the gym environment.

Parameters:action (int or float or numpy.ndarray) – The action to be executed in the environment. It should be an int for discrete environments and a float for continuous ones. If the environment supports executing multiple actions at once, the action should be a numpy.ndarray.
Returns:
  • next_state (numpy.ndarray) – A numpy array with the state information.
  • reward (float) – The reward.
  • done (bool) – Flag indicating the termination of the episode.
update_config(config)

Updates a Config object to include information about the environment.

Parameters:config (Config) – Object used for storing configuration.