training_node.ppo

The PPOTrainingNode implements the classic synchronous PPO algorithm with multiple workers.

It continually receives samples from the workers, trains the model, and broadcasts the updated network to all workers. The workers wait for the new model before they start sampling the next batch of trajectories. The algorithm requires all workers to stay connected and is therefore not resilient to failures such as network errors.

In our PPO implementation, we use Generalized Advantage Estimation (GAE) with the design decisions recommended in https://arxiv.org/pdf/2006.05990.pdf.
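To make the advantage computation concrete, the following is a minimal sketch of Generalized Advantage Estimation for a single trajectory. It is not taken from the soulsai source: the function name `compute_gae`, its arguments, and the default values for `gamma` and `lam` are assumptions chosen for illustration.

```python
def compute_gae(rewards, values, last_value, dones, gamma=0.99, lam=0.95):
    """Return (advantages, returns) for one trajectory.

    rewards[t], values[t] and dones[t] describe step t; last_value is the
    critic's estimate for the state after the final step. All names and
    defaults are hypothetical, not the soulsai API.
    """
    advantages = [0.0] * len(rewards)
    gae = 0.0
    next_value = last_value
    for t in reversed(range(len(rewards))):
        not_done = 0.0 if dones[t] else 1.0
        # One-step TD error: r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        # Exponentially weighted sum of TD errors, reset at episode ends
        gae = delta + gamma * lam * not_done * gae
        advantages[t] = gae
        next_value = values[t]
    # Value targets for the critic are advantage plus baseline
    returns = [a + v for a, v in zip(advantages, values)]
    return advantages, returns
```

With `lam=1` the estimate reduces to the discounted return minus the value baseline, and with `lam=0` it reduces to the one-step TD error.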

class soulsai.distributed.server.training_node.ppo.PPOTrainingNode(config: SimpleNamespace)

PPO training node for distributed, synchronized proximal policy optimization.

checkpoint(path: Path, options: dict = {})

Create a training checkpoint.

Parameters:

path – Path to the save folder.
options – Additional options dictionary to customize checkpointing.

load_checkpoint(path: Path)

Load a training checkpoint from the folder.

Parameters: path – Path to the save folder.

load_config(path: Path)

Load the training configuration from file.

Parameters: path – Path to the configuration file.

monitor_timing(prom_timer: Gauge)

Monitor the execution time of a code block and store it in the Prometheus Gauge.

Note

Only activates if Prometheus is enabled in the training config.

Parameters: prom_timer – A Prometheus Gauge object that is updated with the execution time.
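One plausible way to time a code block into a gauge is a context manager, sketched below. This is an assumption about the general pattern and not the soulsai implementation: `FakeGauge` is a stand-in that mimics only the `set()` method of a Prometheus Gauge so the example runs without extra dependencies, and the `enabled` flag is a hypothetical substitute for the Prometheus switch in the training config.

```python
import time
from contextlib import contextmanager


class FakeGauge:
    """Stand-in for a Prometheus Gauge; only set() is mimicked."""

    def __init__(self):
        self.value = None

    def set(self, value):
        self.value = value


@contextmanager
def monitor_timing(prom_timer, enabled=True):
    """Time the enclosed block and store the duration in seconds.

    `enabled` is a hypothetical flag standing in for the Prometheus
    switch in the training config.
    """
    if not enabled:
        # Monitoring is off: run the block without touching the gauge
        yield
        return
    start = time.perf_counter()
    try:
        yield
    finally:
        # Record the elapsed time even if the block raised
        prom_timer.set(time.perf_counter() - start)


gauge = FakeGauge()
with monitor_timing(gauge):
    sum(range(1000))  # placeholder for a training step
```

After the block exits, `gauge.value` holds the elapsed time in seconds.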

run()

Run the training node.

Derived classes modify the provided hooks in the loop to implement different learning algorithms. The main loop receives samples sent from worker nodes via Redis, verifies that the samples can be used, appends them to a buffer, checks if the training step condition is met, updates the agent and uploads the new parameters to Redis.
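The loop described above can be sketched with an in-memory queue in place of Redis. Every name in this example (`SketchTrainingNode`, the hook methods, the `model_id` field) is hypothetical and chosen for illustration; the real hooks and message format may differ.

```python
from collections import deque


class SketchTrainingNode:
    """Toy training loop; every name here is hypothetical."""

    def __init__(self, n_workers):
        self.n_workers = n_workers
        self.model_id = 0    # version of the current parameters
        self.buffer = []     # samples accepted for the next update
        self.published = []  # stands in for uploads to Redis

    # Hooks that a derived class would override
    def validate_sample(self, sample):
        # Reject samples produced with outdated parameters
        return sample["model_id"] == self.model_id

    def update_condition(self):
        # Synchronous PPO: wait for one batch from every worker
        return len(self.buffer) == self.n_workers

    def update_model(self):
        # A real node would run the PPO update on the buffer here
        self.buffer.clear()
        self.model_id += 1

    def publish_model(self):
        self.published.append(self.model_id)

    def run(self, incoming):
        # `incoming` replaces the Redis subscription in this sketch
        while incoming:
            sample = incoming.popleft()
            if not self.validate_sample(sample):
                continue
            self.buffer.append(sample)
            if self.update_condition():
                self.update_model()
                self.publish_model()


node = SketchTrainingNode(n_workers=2)
samples = deque([
    {"worker": 0, "model_id": 0},
    {"worker": 1, "model_id": 0},
    {"worker": 0, "model_id": 0},  # stale: model 1 is already out
    {"worker": 0, "model_id": 1},
    {"worker": 1, "model_id": 1},
])
node.run(samples)
```

In this run the third sample is dropped because it was generated with model 0 after model 1 had been published, so the node performs exactly two updates.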

Additionally, the training node runs a heartbeat service to detect node disconnects. This is primarily important for synchronous algorithms that do not support the dynamic addition and removal of worker nodes.
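The source does not describe how the heartbeat service is implemented. As a rough illustration of the idea, the sketch below records the last heartbeat time per worker and reports workers whose heartbeat is older than a timeout. The class `HeartbeatMonitor`, its methods, and the injectable `clock` are all assumptions.

```python
import time


class HeartbeatMonitor:
    """Track the last heartbeat per worker; names are hypothetical."""

    def __init__(self, timeout, clock=time.monotonic):
        self.timeout = timeout  # seconds without a heartbeat before a worker counts as lost
        self.clock = clock      # injectable so the example can use a fake clock
        self.last_seen = {}

    def beat(self, worker_id):
        # Called whenever a heartbeat message from a worker arrives
        self.last_seen[worker_id] = self.clock()

    def disconnected(self):
        # Workers whose last heartbeat is older than the timeout
        now = self.clock()
        return sorted(w for w, t in self.last_seen.items() if now - t > self.timeout)


now = [0.0]  # fake clock value, advanced manually below
monitor = HeartbeatMonitor(timeout=5.0, clock=lambda: now[0])
monitor.beat("worker-0")
monitor.beat("worker-1")
now[0] = 4.0
monitor.beat("worker-1")
now[0] = 7.0
late = monitor.disconnected()  # worker-0 was last seen 7 s ago
```

A synchronous node could use such a check to stop waiting for a batch that will never arrive.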

save_config(path: Path)

Save the training configuration to a file.

Parameters: path – Path to the configuration file.

shutdown(_: Any)

Shut down the training node.
