core.networks

The networks module is a collection of neural network architectures used in RL.

As long as hyperparameters stay within reasonable regimes and the networks are mostly dense, the architecture usually matters little for agent performance. Users may still want to experiment with different network styles, such as noisy nets for exploration.

soulsai.core.networks.layer_init(layer: Linear, std: float = 1.4142135623730951, bias_const: float = 0.0) → Linear#

Initialize a linear layer with orthogonal weights and constant bias.

Note

This is an in-place function. The returned reference is just for convenience.

Parameters:

layer – The network layer.
std – The standard deviation of the orthogonal weights.
bias_const – The constant bias value.

Returns:

The initialized layer.
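The initialization described above could be sketched as follows. This is a hypothetical re-implementation, not the module's actual code: the name `layer_init_sketch` is made up, and it assumes that `std` is passed to PyTorch's orthogonal initializer as its gain.

```python
import math

import torch
from torch import nn


def layer_init_sketch(
    layer: nn.Linear, std: float = math.sqrt(2), bias_const: float = 0.0
) -> nn.Linear:
    """Initialize ``layer`` in-place and return the same object (hypothetical sketch)."""
    # Assumption: ``std`` acts as the gain of the orthogonal initializer.
    nn.init.orthogonal_(layer.weight, gain=std)
    # Fill the bias with a constant value.
    nn.init.constant_(layer.bias, bias_const)
    return layer


layer = nn.Linear(8, 4)
same = layer_init_sketch(layer, std=1.0)
```

Because the function works in-place, `same` and `layer` refer to the same object, so the call can be wrapped directly around a layer constructor when building a network.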

soulsai.core.networks.polyak_update(target_network: Module, network: Module, tau: float)#

Perform a soft parameter update (also called polyak update).

Soft update the weights of a target network from a source network by calculating the weighted average theta_target_net = tau * theta_net + (1-tau) * theta_target_net.

Parameters:

target_network – The target network. Parameters get updated in-place.
network – The source network.
tau – Polyak factor controlling the weighted average.
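A minimal sketch of the weighted average above, assuming both networks have identical architectures so that their parameters can be iterated in lockstep. The function name `polyak_update_sketch` is hypothetical and the module's own implementation may differ.

```python
import torch
from torch import nn


@torch.no_grad()
def polyak_update_sketch(target_network: nn.Module, network: nn.Module, tau: float) -> None:
    """Blend the source parameters into the target parameters in-place."""
    for p_target, p in zip(target_network.parameters(), network.parameters()):
        # theta_target_net = tau * theta_net + (1 - tau) * theta_target_net
        p_target.mul_(1.0 - tau).add_(p, alpha=tau)


target, source = nn.Linear(3, 2), nn.Linear(3, 2)
for p in target.parameters():
    nn.init.zeros_(p)  # target starts at 0
for p in source.parameters():
    nn.init.ones_(p)  # source is fixed at 1
polyak_update_sketch(target, source, tau=0.1)
```

With tau = 0.1, every target parameter moves 10% of the way from 0 towards 1. A tau of 1 copies the source network, and a tau of 0 leaves the target unchanged.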

class soulsai.core.networks.DQN(input_dims: int, output_dims: int, layer_dims: int)#

Deep Q network class.

The network has four layers with variable input, layer and output dimensions. Uses ReLU as non-linearities.

forward(x: Tensor) → Tensor#

Compute the forward pass of the network.

Parameters: x – Network input.
Returns: The network output.

class soulsai.core.networks.AdvantageDQN(input_dims: int, output_dims: int, layer_dims: int, nlayers: int = 2)#

Advantage deep Q network class.

The network has a configurable number of hidden layers with variable input, layer and output dimensions. Uses ReLU as non-linearities. Calculates a baseline value as well as advantage values, adds the values and subtracts the mean of the advantage values. Layers are initialized with orthogonal weights.

forward(x: Tensor) → Tensor#

Compute the forward pass of the network.

Parameters: x – Network input.
Returns: The network output.
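The combination of baseline value and advantage values described for this class can be illustrated in isolation. The sketch below assumes a value tensor of shape [B, 1] and an advantage tensor of shape [B, A]; the helper name `dueling_q_values` is hypothetical.

```python
import torch


def dueling_q_values(value: torch.Tensor, advantage: torch.Tensor) -> torch.Tensor:
    """Combine a state value [B, 1] and advantages [B, A] into Q-values [B, A]."""
    # Add value and advantages, then subtract the mean advantage per sample.
    return value + advantage - advantage.mean(dim=-1, keepdim=True)


value = torch.tensor([[1.0]])
advantage = torch.tensor([[1.0, 2.0, 3.0]])
q = dueling_q_values(value, advantage)
```

Subtracting the mean makes the advantages sum to zero for each sample, so the mean of the resulting Q-values equals the baseline value.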

class soulsai.core.networks.CNNAdvantageDQN(input_shape: tuple[int, ...], output_dims: int)#

CNN Advantage DQN network.

forward(x: Tensor) → Tensor#

Compute the forward pass of the network.

Parameters: x – Network input.
Returns: The network output. Note that the output is a distribution tensor of shape [B A N] instead of [B A], where B is the batch dimension, A is the action dimension, and N is the number of bins (here 32).

class soulsai.core.networks.NoisyDQN(input_dims: int, output_dims: int, layer_dims: int)#

Noisy deep Q network class.

The network has two noisy hidden layers with variable input, layer and output dimensions. Uses ReLU as non-linearities.

See https://arxiv.org/abs/1706.10295.

forward(x: Tensor) → Tensor#

Compute the forward pass of the network.

Parameters: x – Network input.
Returns: The network output.

reset_noise()#: Reset the noise in all network layers.

class soulsai.core.networks.DistributionalDQN(input_dims: int, output_dims: int, layer_dims: int, n_quantiles: int = 32)#

QR-DQN network.

The network estimates N quantiles of the Q-value distribution for each action, each with a probability of 1/N.

forward(x: Tensor) → Tensor#

Compute the forward pass of the network.

Parameters: x – Network input.
Returns: The network output. Note that the output is a distribution tensor of shape [B A N] instead of [B A], where B is the batch dimension, A is the action dimension, and N is the number of bins.
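Since the output has shape [B A N] rather than [B A], callers that need scalar Q-values have to reduce the last dimension first. The sketch below uses a random tensor as a stand-in for the network output and assumes that all N entries carry equal probability 1/N, so the expected Q-value is the plain mean.

```python
import torch

B, A, N = 4, 3, 32

# Stand-in for the output of a distributional network, shape [B, A, N].
quantiles = torch.randn(B, A, N)

# Every entry has probability 1/N, so the expectation is the mean over N.
q_values = quantiles.mean(dim=-1)  # shape [B, A]

# Greedy action selection on the expected Q-values.
greedy_actions = q_values.argmax(dim=-1)  # shape [B]
```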

class soulsai.core.networks.CNNDistributionalDQN(input_shape: tuple[int, ...], output_dims: int, n_quantiles: int = 32)#

CNN QR-DQN network.

The network estimates N quantiles of the Q-value distribution for each action, each with a probability of 1/N.

forward(x: Tensor) → Tensor#

Compute the forward pass of the network.

Parameters: x – Network input. Must be 4-dimensional (BxCxHxW).
Returns: The network output. Note that the output is a distribution tensor of shape [B A N] instead of [B A], where B is the batch dimension, A is the action dimension, and N is the number of bins.

class soulsai.core.networks.ResidualCNNBlock(n_channels: int)#

Residual CNN block from the Impala paper.

forward(x: Tensor) → Tensor#

Compute the forward pass of the residual block.

Parameters: x – Network input.
Returns: The network output.

class soulsai.core.networks.ImpalaBlock(channel_in: int, channel_out: int)#

CNN block of the deep Impala architecture.

Link: https://arxiv.org/pdf/1802.01561.pdf

forward(x: Tensor) → Tensor#

Compute the forward pass of the Impala block.

Parameters: x – Network input.
Returns: The network output.
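For orientation, the residual block and the convolutional block can be sketched as described in the Impala paper linked above: a 3x3 convolution, a 3x3 max pooling with stride 2, and two residual blocks. The class names and the padding choices below are assumptions, and the classes in this module may differ in detail.

```python
import torch
from torch import nn


class ResidualBlockSketch(nn.Module):
    """Hypothetical residual block: ReLU -> conv -> ReLU -> conv, plus skip."""

    def __init__(self, n_channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(n_channels, n_channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(n_channels, n_channels, kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.conv1(torch.relu(x))
        out = self.conv2(torch.relu(out))
        return x + out  # skip connection keeps the input shape


class ImpalaBlockSketch(nn.Module):
    """Hypothetical Impala block: conv -> max pool -> two residual blocks."""

    def __init__(self, channel_in: int, channel_out: int):
        super().__init__()
        self.conv = nn.Conv2d(channel_in, channel_out, kernel_size=3, padding=1)
        self.pool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.res1 = ResidualBlockSketch(channel_out)
        self.res2 = ResidualBlockSketch(channel_out)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.pool(self.conv(x))  # halves the spatial resolution
        return self.res2(self.res1(x))


x = torch.zeros(2, 3, 64, 64)  # [B, C, H, W]
y = ImpalaBlockSketch(3, 16)(x)
r = ResidualBlockSketch(16)(y)
```

With these padding choices, each block halves the height and width while changing the channel count, and the residual block leaves the shape untouched.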

class soulsai.core.networks.ImpalaDistributionalDQN(input_shape: tuple[int, ...], output_dims: int, n_quantiles: int = 32)#

CNN QR-DQN network with residual blocks.

The network estimates N quantiles of the Q-value distribution for each action, each with a probability of 1/N.

forward(x: Tensor) → Tensor#

Compute the forward pass of the network.

Parameters: x – Network input.
Returns: The network output. Note that the output is a distribution tensor of shape [B A N] instead of [B A], where B is the batch dimension, A is the action dimension, and N is the number of bins (here 32).

class soulsai.core.networks.NoisyAdvantageDQN(input_dims: int, output_dims: int, layer_dims: int)#

Noisy advantage deep Q network class.

The network has two noisy hidden layers with variable input, layer and output dimensions. Uses ReLU as non-linearities. Calculates a baseline value as well as advantage values, adds the values and subtracts the mean of the advantage values.

forward(x: Tensor) → Tensor#

Compute the forward pass of the network.

Parameters: x – Network input.
Returns: The network output.

reset_noise()#: Reset the noise in all network layers.

class soulsai.core.networks.NoisyAdvantageSkipDQN(input_dims: int, output_dims: int, layer_dims: int)#

Noisy advantage deep Q network class with skip connections.

The network has two noisy layers and a skip connection. Uses ReLU as non-linearities. Calculates a baseline value as well as advantage values, adds the values and subtracts the mean of the advantage values.

forward(x: Tensor) → Tensor#

Compute the forward pass of the network.

Parameters: x – Network input.
Returns: The network output.

reset_noise()#: Reset the noise in all network layers.

class soulsai.core.networks.PPOActor(input_dims: int, output_dims: int, layer_dims: int)#

PPOActor network parameterizing a stochastic policy from observation inputs.

forward(x: Tensor) → Tensor#

Compute the forward pass of the network.

Parameters: x – Network input.
Returns: The network output.

class soulsai.core.networks.PPOCritic(input_dims: int, layer_dims: int)#

PPO critic network to generate value estimates from observations.

forward(x: Tensor) → Tensor#

Compute the forward pass of the network.

Parameters: x – Network input.
Returns: The network output.

class soulsai.core.networks.NoisyLinear(input_dims: int, output_dims: int, std_init: float = 0.5)#

A noisy linear layer to create noisy Q networks.

For more details, see https://arxiv.org/pdf/1706.10295.pdf.

reset_parameters()#: Reset the mu and sigma parameter weights.

reset_noise()#: Resample the noise in the layer and update the weights.

forward(x: Tensor) → Tensor#

Compute the forward pass of the layer.

Parameters: x – Layer input.
Returns: The layer output.

static scale_noise(size: int) → Tensor#: Create a random tensor scaled by the square root of the absolute value of its elements.
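The noise scaling can be sketched with the transformation f(x) = sgn(x) * sqrt(|x|) from the noisy networks paper linked above. The sketch assumes standard normal samples and factorised noise, where the weight noise is the outer product of per-output and per-input noise vectors. Both function names are hypothetical, and the layer's actual implementation may differ.

```python
import torch


def scale_noise_sketch(size: int) -> torch.Tensor:
    """Sample standard normal noise and apply f(x) = sgn(x) * sqrt(|x|)."""
    x = torch.randn(size)
    return x.sign() * x.abs().sqrt()


def factorised_noise(input_dims: int, output_dims: int) -> tuple[torch.Tensor, torch.Tensor]:
    """Build weight noise [out, in] and bias noise [out] from two noise vectors."""
    eps_in = scale_noise_sketch(input_dims)
    eps_out = scale_noise_sketch(output_dims)
    # Assumption: factorised noise, i.e. an outer product instead of
    # one independent sample per weight.
    return torch.outer(eps_out, eps_in), eps_out


# Draw the same samples twice to compare raw and scaled noise.
torch.manual_seed(0)
raw = torch.randn(5)
torch.manual_seed(0)
scaled = scale_noise_sketch(5)

weight_eps, bias_eps = factorised_noise(input_dims=4, output_dims=3)
```

Factorised noise needs only input_dims + output_dims random samples per layer instead of input_dims * output_dims, which keeps resampling cheap.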
