Agents

The agent is the bridge between the model and the environment.
It implements the high-level functions that the user interacts with directly, such as the training loop.
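
A rough picture of how the pieces fit together, as a hypothetical usage sketch (the batcher and policy objects are placeholders whose constructors are not documented on this page; only the agent-facing calls mirror the classes below):

    # Illustrative sketch only -- `batcher` and `policy` are placeholders
    # standing in for a torchrl batcher and a torchrl policy model.
    from torchrl.agents import PGAgent

    agent = PGAgent(batcher, policy_model=policy)

    # Run the training loop until one of the limits is reached.
    agent.train(max_steps=1_000_000, log_freq=10)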

BaseAgent

class torchrl.agents.BaseAgent(batcher, optimizer, *, gamma=0.99, log_dir='runs')[source]

Bases: abc.ABC

Basic TorchRL agent. Encapsulates an environment and a model.

Parameters:
  • batcher (torchrl.batchers) – A torchrl batcher, which wraps the environment and generates batches of experience.
  • optimizer – Optimizer used to update the registered models.
  • gamma (float) – Discount factor on future rewards (Default is 0.99).
  • log_dir (string) – Directory where logs will be written (Default is 'runs').
step()[source]

This method is called at each iteration of the training loop and defines the training procedure.
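
Concrete agents are expected to override this method. A minimal subclass sketch (the body shown is only a placeholder outline, not the actual implementation of any torchrl agent):

    from torchrl.agents import BaseAgent

    class MyAgent(BaseAgent):
        """Hypothetical agent -- BaseAgent only requires step() to be defined."""

        def step(self):
            # 1. collect a batch of experience from the environment/batcher
            # 2. compute the losses for the registered models
            # 3. take one optimization step
            ...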

_check_termination()[source]

Check whether the training loop has reached one of its stopping conditions.

Returns: True if done, False otherwise.
Return type: bool
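
For intuition, a check consistent with the limits accepted by train() could look like the sketch below; the counter attribute names (num_iters, num_episodes, num_steps) are assumptions made for illustration:

    def _check_termination(self):
        # -1 means the corresponding limit is disabled.
        if self.max_iters != -1 and self.num_iters >= self.max_iters:
            return True
        if self.max_episodes != -1 and self.num_episodes >= self.max_episodes:
            return True
        if self.max_steps != -1 and self.num_steps >= self.max_steps:
            return True
        return False
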
_register_model(name, model)[source]

Save a torchrl model to the agent's internal memory.

Parameters:
  • name (str) – Desired name for the model.
  • model (torchrl.models) – The model to register.
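
Registration typically happens when a concrete agent is constructed, for example (the model variables are placeholders):

    # Inside a hypothetical agent __init__ -- `policy_model` and
    # `value_model` are placeholders for torchrl models.
    self._register_model('policy', policy_model)
    if value_model is not None:
        self._register_model('value', value_model)
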
train(*, max_iters=-1, max_episodes=-1, max_steps=-1, log_freq=1, eval_env=None, eval_freq=None)[source]

Defines the training loop of the algorithm, calling step() at every iteration.

Parameters:
  • max_iters (int) – Maximum number of training iterations (Default is -1, meaning no limit).
  • max_episodes (int) – Maximum number of episodes (Default is -1, meaning no limit).
  • max_steps (int) – Maximum number of environment steps (Default is -1, meaning no limit).
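
For example, to train for at most one million environment steps, logging every 10 iterations and periodically evaluating on a separate environment (the evaluation-related semantics below are assumptions based on the argument names):

    # `agent` is any concrete torchrl agent; `eval_env` is a separate
    # evaluation environment (placeholder).
    agent.train(
        max_steps=1_000_000,  # stop after 1M environment steps
        log_freq=10,          # write logs every 10 iterations
        eval_env=eval_env,    # assumed: environment used only for evaluation
        eval_freq=50,         # assumed: evaluate every 50 iterations
    )
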
select_action(state, step)[source]

Receive a state and use the model to select an action.

Parameters:
  • state (numpy.ndarray) – The environment state.
  • step (int) – The current training step.
Returns: action – The selected action.
Return type: int or numpy.ndarray
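
select_action can also be called directly, for instance to roll out a trained policy for evaluation. The Gym-style reset/step interface below is an assumption about the environment, not something documented here:

    # Evaluation rollout sketch, assuming a Gym-style environment API.
    state = env.reset()
    done, total_reward, step = False, 0.0, 0
    while not done:
        action = agent.select_action(state, step)
        state, reward, done, _ = env.step(action)
        total_reward += reward
        step += 1
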
write_logs()[source]

Use the logger to write general information about the training process.

PGAgent

class torchrl.agents.PGAgent(batcher, *, policy_model, value_model=None, normalize_advantages=True, advantage=<torchrl.utils.estimators.advantage.estimators.GAE object>, vtarget=<torchrl.utils.estimators.value.estimators.FromAdvantage object>, **kwargs)[source]

Bases: torchrl.agents.base_agent.BaseAgent

Policy Gradient Agent, compatible with all PG models.

This agent encapsulates a policy_model and, optionally, a value_model. It defines the steps needed for the training loop (see step()) and calculates all the values necessary to train the model(s).

Parameters:
  • batcher (torchrl.batchers) – A torchrl batcher, which wraps the environment and generates batches of experience.
  • policy_model (torchrl.models) – Should be a subclass of torchrl.models.BasePGModel
  • value_model (torchrl.models) – Should be an instance of torchrl.models.ValueModel (Default is None)
  • normalize_advantages (bool) – If True, normalize the advantages per batch (Default is True).
  • advantage (torchrl.utils.estimators.advantage) – Class used for calculating the advantages (Default is GAE).
  • vtarget (torchrl.utils.estimators.value) – Class used for calculating the states' target values (Default is FromAdvantage).
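
A typical construction, with the batcher and model objects left as placeholders (their constructors are not part of this page):

    from torchrl.agents import PGAgent

    agent = PGAgent(
        batcher,                    # a torchrl batcher (placeholder)
        policy_model=policy,        # subclass of torchrl.models.BasePGModel
        value_model=value,          # instance of torchrl.models.ValueModel
        normalize_advantages=True,
        gamma=0.99,                 # forwarded to BaseAgent through **kwargs
    )
    agent.train(max_steps=1_000_000)
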
step()[source]

This method is called at each iteration of the training loop and defines the training procedure.
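
Conceptually, one policy-gradient step collects a batch, estimates advantages and value targets, and then updates the model(s). The sketch below is illustrative only; the helper and method names on the batcher and models are assumptions, while the quantities involved (advantages, value targets, normalization) follow the parameters above:

    def step(self):
        batch = self.generate_batch()             # assumed helper: collect experience
        batch.advantage = self.advantage(batch)   # e.g. the GAE estimator
        batch.vtarget = self.vtarget(batch)       # e.g. the FromAdvantage estimator
        if self.normalize_advantages:
            mean, std = batch.advantage.mean(), batch.advantage.std()
            batch.advantage = (batch.advantage - mean) / (std + 1e-8)
        self.policy_model.train(batch)            # assumed method: one policy update
        if self.value_model is not None:
            self.value_model.train(batch)         # assumed method: one value update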