training_node.dqn
The DQNTrainingNode implements an Ape-X-like framework for deep Q-learning.
It continually receives samples from the clients, trains the model, and broadcasts the updated network to all workers. Because training is asynchronous, workers will send samples generated with earlier network iterations. We therefore admit samples from the last 3 network iterations into the replay buffer; older samples are rejected and discarded.
This approach also allows us to add and remove worker nodes dynamically, making the overall architecture more resilient against connection failures, client errors, and similar faults.
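The staleness rule described above can be illustrated with a minimal sketch. It is not the node's actual implementation: the names `is_fresh`, `MAX_MODEL_DELAY`, and the `model_id` field are hypothetical, and the sketch assumes that "the last 3 network iterations" means the current iteration plus the two before it.

```python
from collections import deque

# Assumption: "last 3 iterations" = the current iteration and the two before it.
MAX_MODEL_DELAY = 3


def is_fresh(sample_model_id: int, current_model_id: int,
             max_delay: int = MAX_MODEL_DELAY) -> bool:
    """Return True if the sample stems from one of the last `max_delay` iterations."""
    age = current_model_id - sample_model_id
    # Negative ages (samples tagged with a future iteration) are rejected as well.
    return 0 <= age < max_delay


# Hypothetical replay buffer and incoming samples, each tagged with the
# iteration of the network that generated it.
buffer = deque(maxlen=10_000)
current_model_id = 10
incoming = [
    {"model_id": 10, "reward": 1.0},  # current iteration: accepted
    {"model_id": 8, "reward": 0.5},   # two iterations old: accepted
    {"model_id": 7, "reward": 0.2},   # three iterations old: discarded
]
for sample in incoming:
    if is_fresh(sample["model_id"], current_model_id):
        buffer.append(sample)
```

Because acceptance depends only on the iteration tag carried by each sample, a worker that joins late or reconnects after a failure needs no special handling: its samples are accepted as soon as they are recent enough.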
- class soulsai.distributed.server.training_node.dqn.DQNTrainingNode(config: SimpleNamespace)
DQN training node for distributed Q learning.
- checkpoint(path: Path, options: dict = {})
Create a training checkpoint.
- Parameters:
path – Path to the save folder.
options – Additional options dictionary to customize checkpointing.
- load_checkpoint(path: Path)
Load a training checkpoint from the folder.
- Parameters:
path – Path to the save folder.
- load_config(path: Path)
Load the training configuration from file.
- Parameters:
path – Path to the configuration file.
- monitor_timing(prom_timer: Gauge)
Monitor the execution time of a code block and store it in the Prometheus Gauge.
Note
Only activates if Prometheus is enabled in the training config.
- Parameters:
prom_timer – A Prometheus Gauge object that is updated with the execution time.
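Since the method times a code block, it is presumably used as a context manager. The following is a minimal sketch of that pattern, not the library's code: it is written as a free function rather than a method, the `enabled` flag stands in for the Prometheus switch in the training config, and `StubGauge` is a hypothetical stand-in that mimics only the `set` method of a Prometheus Gauge so the example runs without extra dependencies.

```python
import time
from contextlib import contextmanager


class StubGauge:
    """Hypothetical stand-in that mimics only the `set` method of a Prometheus Gauge."""

    def __init__(self) -> None:
        self.value = None

    def set(self, value: float) -> None:
        self.value = value


@contextmanager
def monitor_timing(prom_timer, enabled: bool = True):
    """Time the enclosed block and store the duration (in seconds) in the gauge."""
    if not enabled:
        # Monitoring disabled: run the block without touching the gauge.
        yield
        return
    start = time.perf_counter()
    try:
        yield
    finally:
        # Record the duration even if the block raised an exception.
        prom_timer.set(time.perf_counter() - start)


gauge = StubGauge()
with monitor_timing(gauge):
    total = sum(range(1000))  # placeholder for a training step

disabled_gauge = StubGauge()
with monitor_timing(disabled_gauge, enabled=False):
    total = sum(range(1000))
```

After the first block, `gauge.value` holds the elapsed time in seconds, while `disabled_gauge.value` remains `None` because monitoring was switched off.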
- run()
Run the training node.
Derived classes modify the provided hooks in the loop to implement different learning algorithms. The main loop receives samples sent from worker nodes via Redis, verifies that the samples can be used, appends them to a buffer, checks whether the training step condition is met, updates the agent, and uploads the new parameters to Redis.
Additionally, the training node runs a heartbeat service to detect node disconnects. This is primarily important for synchronous algorithms that do not support the dynamic addition and removal of worker nodes.
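The control flow of the main loop can be sketched as follows. This is a simplified illustration under stated assumptions, not the actual implementation: an in-memory `queue.Queue` replaces Redis, a counter increment replaces the agent update, a `None` sentinel replaces the shutdown mechanism, the heartbeat service is omitted, and every name (`run_sketch`, `inbox`, `published`, `batch_size`, `max_delay`) is hypothetical.

```python
import queue
import random
from collections import deque


def run_sketch(inbox: queue.Queue, published: list,
               batch_size: int = 2, max_delay: int = 3):
    """Receive, validate, buffer, train, and publish until a None sentinel arrives."""
    buffer = deque(maxlen=1000)
    model_id = 0      # iteration counter of the current network
    new_samples = 0   # accepted samples since the last update
    while True:
        sample = inbox.get()                # 1. receive a sample
        if sample is None:                  # hypothetical shutdown signal
            break
        age = model_id - sample["model_id"]
        if not 0 <= age < max_delay:        # 2. verify the sample is usable
            continue
        buffer.append(sample)               # 3. append to the buffer
        new_samples += 1
        if new_samples >= batch_size:       # 4. training step condition
            batch = random.sample(list(buffer), batch_size)
            # 5. a real node would run a gradient step on `batch` here
            model_id += 1
            published.append(model_id)      # 6. stand-in for the Redis upload
            new_samples = 0
    return model_id, len(buffer)


inbox = queue.Queue()
for _ in range(8):
    # A worker that never refreshes its network keeps tagging samples with 0.
    inbox.put({"model_id": 0, "reward": 1.0})
inbox.put(None)

published = []
result = run_sketch(inbox, published)
```

In this run the first six samples are accepted and trigger three updates. The last two arrive when the trainer is already at iteration 3, so their age equals `max_delay` and they are discarded, which leaves `result == (3, 6)`.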
- save_config(path: Path)
Save the training configuration to a file.
- Parameters:
path – Path to the configuration file.
- shutdown(_: Any)
Shut down the training node.