core.replay_buffer

The replay buffer module offers performant implementations of replay buffers for DQN and PPO.

class soulsai.core.replay_buffer.AbstractBuffer#

Abstract replay buffer class.

abstract property size: int#

Get the buffer size.

Returns:: The buffer size.

abstract append(sample: dict)#

Append a sample to the buffer.

Parameters:: sample – Sample dictionary.

abstract clear()#: Remove all samples from the buffer.

abstract property filled: bool#

Check if the buffer is filled.

Returns:: True if the buffer is full, otherwise False.

abstract sample_batch(batch_size: int) → TensorDict#

Sample a single batch from the buffer.

Parameters:: batch_size – Number of samples in the batch.
Returns:: The sampled batch.

abstract sample_batches(batch_size: int, nbatches: int) → TensorDict#

Sample multiple batches from the buffer.

If sufficient samples are available, no sample appears more than once across the batches.

Parameters:

batch_size – Number of samples per batch.
nbatches – Number of batches.

Returns:

The sampled batches.

Raises:

RuntimeError – Asked to sample more samples per batch than currently available.

save(path: Path)#

Save the buffers to the specified path.

Uses the torch save function to save a dictionary of the tensors.

Parameters:: path – The save file path.

load(path: Path)#

Load the buffers from the file.

Parameters:: path – The save file path.

class soulsai.core.replay_buffer.ReplayBuffer(max_size: int, device: device = device(type='cpu'))#

Implementation of a replay buffer that lazily allocates storage.

Buffers for samples are allocated when the first sample is received. An internal index tracks the current size of the buffer, so sampling only draws from the parts of the buffers that are already filled with experience. If reproducible sampling is required, the seeds of random and torch must be set before using the buffer.
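The lazy allocation and index tracking described above can be illustrated with a small, plain-Python sketch. It is not the soulsai implementation: the class name LazyBuffer is hypothetical, Python lists stand in for tensors, and overwriting the oldest entry once the buffer is full is an assumption that the description above does not confirm.

```python
import random


class LazyBuffer:
    """Toy buffer that allocates per-key storage on the first append (hypothetical)."""

    def __init__(self, max_size: int):
        self.max_size = max_size
        self._storage = {}  # stays empty until the first sample reveals the keys
        self._idx = 0  # next write position
        self._size = 0  # number of valid entries

    def append(self, sample: dict):
        if not self._storage:
            # The first sample defines the keys; allocate one slot list per key.
            self._storage = {key: [None] * self.max_size for key in sample}
        for key, value in sample.items():
            self._storage[key][self._idx] = value
        # Assumption: wrap around and overwrite the oldest entry when full.
        self._idx = (self._idx + 1) % self.max_size
        self._size = min(self._size + 1, self.max_size)

    @property
    def size(self) -> int:
        return self._size

    @property
    def filled(self) -> bool:
        return self._size == self.max_size

    def sample_batch(self, batch_size: int) -> dict:
        if batch_size > self._size:
            raise RuntimeError("Asked to sample more samples than currently available")
        # Only draw indices from the part of the storage that is already filled.
        indices = random.sample(range(self._size), batch_size)
        return {key: [values[i] for i in indices] for key, values in self._storage.items()}


random.seed(0)  # seed before use if reproducible sampling is required
buffer = LazyBuffer(max_size=4)
for step in range(3):
    buffer.append({"obs": [step, step + 1], "reward": float(step)})
batch = buffer.sample_batch(2)
```

Because the valid range is tracked separately from the allocated storage, the unfilled slots are never returned by sample_batch.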

property size: int#

Get the buffer size.

Returns:: The buffer size.

append(sample: TensorDict[Tensor])#

Append a sample to the buffer.

Parameters:: sample – Sample dictionary containing the observation, action, reward etc.

clear()#: Remove all samples from the buffer.

property filled: bool#

Check if the buffer is filled.

Returns:: True if the buffer is full, otherwise False.

sample_batch(batch_size: int) → TensorDict[Tensor]#

Sample a single batch from the buffer.

Parameters:: batch_size – Number of samples in the batch.
Returns:: The sampled batch.
Raises:: RuntimeError – Asked to sample more samples than currently available.

sample_batches(batch_size: int, nbatches: int) → TensorDict[Tensor]#

Sample multiple batches from the buffer.

If sufficient samples are available, no sample appears more than once across the batches.

Parameters:

batch_size – Number of samples per batch.
nbatches – Number of batches.

Returns:

The sampled batches.

Raises:

RuntimeError – Asked to sample more samples per batch than currently available.
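One way to obtain batches without repeated samples is to draw all indices in a single pass without replacement and then split them. The sketch below works on indices only. The function name sample_batch_indices is hypothetical, and the fallback to independent per-batch sampling when too few samples are available is an assumption about the behaviour, not something this page states.

```python
import random


def sample_batch_indices(size: int, batch_size: int, nbatches: int) -> list[list[int]]:
    """Return nbatches index lists of length batch_size for a buffer holding size samples."""
    if batch_size > size:
        raise RuntimeError("Asked to sample more samples per batch than currently available")
    if batch_size * nbatches <= size:
        # Enough samples: draw everything without replacement, then split into batches.
        flat = random.sample(range(size), batch_size * nbatches)
        return [flat[i * batch_size:(i + 1) * batch_size] for i in range(nbatches)]
    # Assumed fallback: indices are unique within each batch but may repeat across batches.
    return [random.sample(range(size), batch_size) for _ in range(nbatches)]


unique_batches = sample_batch_indices(size=10, batch_size=3, nbatches=3)
overlapping_batches = sample_batch_indices(size=4, batch_size=3, nbatches=3)
```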

class soulsai.core.replay_buffer.PrioritizedReplayBuffer(max_size: int, beta: float = 0.5, device: device = device(type='cpu'))#

Implementation of a prioritized replay buffer.

We fix alpha to 0.5 and use sqrt instead of pow(alpha). See e.g. the Dopamine implementation at google/dopamine.
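With alpha fixed to 0.5, the sampling probability of a sample is proportional to the square root of its priority. The sketch below follows the standard prioritized experience replay formulas and uses plain Python floats. The function names are hypothetical, and normalizing the importance weights by their maximum is an assumption that this page does not confirm for soulsai.

```python
import math


def sampling_probabilities(priorities: list[float]) -> list[float]:
    """P(i) = sqrt(p_i) / sum_k sqrt(p_k), i.e. alpha = 0.5."""
    scaled = [math.sqrt(p) for p in priorities]
    total = sum(scaled)
    return [s / total for s in scaled]


def importance_weights(probabilities: list[float], beta: float = 0.5) -> list[float]:
    """w_i = (N * P(i)) ** -beta, divided by the largest weight (assumed normalization)."""
    n = len(probabilities)
    weights = [(n * p) ** -beta for p in probabilities]
    max_weight = max(weights)
    return [w / max_weight for w in weights]


probs = sampling_probabilities([1.0, 4.0, 9.0])
weights = importance_weights(probs, beta=0.5)
```

Samples with a low priority are drawn less often and therefore receive the largest weight, which compensates for the sampling bias in the loss.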

append(sample: TensorDict[Tensor])#

Append a sample to the buffer.

Note

The sample must not contain the ‘__priority__’ key.

Parameters:: sample – Sample dictionary containing the observation, action, reward etc.

sample_batch(batch_size: int) → TensorDict[Tensor]#

Sample a single batch from the buffer.

Parameters:

batch_size – Number of samples in the batch.

Returns:

The sampled batch. The index and weight of each sample can be accessed with the ‘__idx__’ and ‘__weight__’ keys.

Raises:

RuntimeError – Asked to sample more samples than currently available.

sample_batches(batch_size: int, nbatches: int) → TensorDict[Tensor]#

Sample multiple batches from the buffer.

Parameters:

batch_size – Number of samples per batch.
nbatches – Number of batches.

Returns:

The sampled batches including the weights and the indices of the samples.

Raises:

RuntimeError – Asked to sample more samples per batch than currently available.

update_priorities(batch: TensorDict[Tensor])#

Update the priorities of the samples at the specified indices.

Note

Samples must not be added to or removed from the buffer between sampling and the priority update.

Parameters:: batch – Batch of samples with updated priorities. The batch must contain the keys ‘__idx__’ and ‘__priority__’.
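The sample-then-update cycle can be sketched with a plain-Python stand-in that mimics the ‘__idx__’ and ‘__priority__’ key convention. ToyPrioritizedBuffer is hypothetical and is not the soulsai class. Giving new samples the current maximum priority, drawing indices with replacement, and raising ValueError for a forbidden key are all assumptions made for illustration.

```python
import random


class ToyPrioritizedBuffer:
    """Plain-Python stand-in for the '__idx__' / '__priority__' workflow (hypothetical)."""

    def __init__(self):
        self.samples = []
        self.priorities = []

    def append(self, sample: dict):
        if "__priority__" in sample:
            # The error type is an assumption; the page only states the key is not allowed.
            raise ValueError("The sample must not contain the '__priority__' key")
        self.samples.append(sample)
        # Assumption: new samples start with the current maximum priority.
        self.priorities.append(max(self.priorities, default=1.0))

    def sample_batch(self, batch_size: int) -> dict:
        if batch_size > len(self.samples):
            raise RuntimeError("Asked to sample more samples than currently available")
        # alpha = 0.5: draw proportionally to the square root of the priority.
        weights = [p ** 0.5 for p in self.priorities]
        indices = random.choices(range(len(self.samples)), weights=weights, k=batch_size)
        return {"__idx__": indices, "reward": [self.samples[i]["reward"] for i in indices]}

    def update_priorities(self, batch: dict):
        # The indices are only valid if the buffer was not modified since sampling.
        for i, priority in zip(batch["__idx__"], batch["__priority__"]):
            self.priorities[i] = priority


buffer = ToyPrioritizedBuffer()
for reward in (0.0, 1.0, 2.0):
    buffer.append({"reward": reward})
batch = buffer.sample_batch(2)
# Stand-in for the absolute TD error that a learner would compute.
batch["__priority__"] = [abs(r) + 0.1 for r in batch["reward"]]
buffer.update_priorities(batch)
```

The stored indices refer to buffer positions, which is why the buffer must stay unchanged between sampling and the update.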

class soulsai.core.replay_buffer.TrajectoryBuffer(n_trajectories: int, n_samples: int, device: device = device(type='cpu'))#

Experience buffer to hold samples of multiple trajectories while preserving their order.

The buffer holds a fixed number of trajectories, each with a fixed number of samples. Each sample has a trajectory ID and a step ID that together identify it uniquely. The buffer is full when the samples for all steps of all trajectories have been appended. Samples can be appended out of order, since the IDs are used to sort them into the correct position.

Note

This buffer is designed for PPO with categorical actions only!
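The ID-based placement and the completeness check described above can be sketched with a grid of slots and flags. ToyTrajectoryBuffer and the key names "trajectory_id" and "step_id" are hypothetical, since this page does not list the real keys, and nested Python lists stand in for tensors.

```python
class ToyTrajectoryBuffer:
    """Toy grid of n_trajectories x n_samples slots with complete flags (hypothetical)."""

    def __init__(self, n_trajectories: int, n_samples: int):
        self.n_trajectories = n_trajectories
        self.n_samples = n_samples
        self._slots = [[None] * n_samples for _ in range(n_trajectories)]
        self._complete = [[False] * n_samples for _ in range(n_trajectories)]

    def append(self, sample: dict):
        # The IDs select the slot, so the arrival order does not matter.
        traj, step = sample["trajectory_id"], sample["step_id"]
        self._slots[traj][step] = sample
        self._complete[traj][step] = True

    def clear(self):
        # Only the flags are reset; stale data is overwritten by later appends.
        self._complete = [[False] * self.n_samples for _ in range(self.n_trajectories)]

    @property
    def buffer_complete(self) -> bool:
        return all(all(flags) for flags in self._complete)


traj_buffer = ToyTrajectoryBuffer(n_trajectories=2, n_samples=2)
# Samples arrive out of order, e.g. from several workers.
for traj, step in [(1, 1), (0, 0), (0, 1)]:
    traj_buffer.append({"trajectory_id": traj, "step_id": step, "reward": 0.0})
complete_before = traj_buffer.buffer_complete
traj_buffer.append({"trajectory_id": 1, "step_id": 0, "reward": 0.0})
complete_after = traj_buffer.buffer_complete
```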

append(sample: dict)#

Append a PPO sample to the buffer.

Also sets the complete flag for the received sample.

Parameters:

sample – PPO sample consisting of the observation, the chosen action, the action probability, the reward, the terminated flag, the trajectory ID and the step ID.

clear()#: Clear the buffer's complete flags and reset the advantage values.

property buffer_complete: bool#: Flag to check if all required samples have been added to the buffer.
