🍒 Emote

Emote (Embark's Modular Training Engine) is a flexible framework for reinforcement learning written at Embark.

Installation

For package management and environment handling we use pants. Install it by following the pants installation instructions. After pants is installed, verify that it is set up correctly by running

pants tailor ::

Ideas and Philosophy

We wanted a reinforcement learning framework that was modular both in the sense that we could easily swap the algorithm we used and how data was collected but also in the sense that the different parts of various algorithms could be reused to build other algorithms.

📄 Coding standard

In emote we strive to maintain a consistent style, both visually and implementation-wise. In order to achieve this we rely on tools to check and validate our code as we work, and we require that all those tools are used for CI to pass.

To have a smooth developer experience, we suggest you integrate these with your editor. We'll provide some example configurations below; and we welcome contributions to these pages. However, we strive to avoid committing editor configurations to the repository, as that'll more easily lead to mismatch between different editors - the description below is authoritative, not any specific editor configuration.

We also require that all commits are made using LF-only line endings. Windows users will need to configure git using the command below, or set up their editor appropriately. This helps keep emote platform-generic, and reduces the risk of spurious diffs or tools misbehaving.

git config --global core.autocrlf true

Tools

To run the tools mentioned below on the whole repo, the easiest way is with

pants lint ::

black

Black is an auto-formatter for Python, which mostly matches the PEP8 rules. We use black because it doesn't support a lot of configuration, and will format for you instead of just complaining. We do not allow overrides to these styles, nor do we allow disabling of formatting anywhere.

isort

isort is another formatting tool, but deals only with sorting imports. Isort is configured to be consistent with Black from within pyproject.toml.

Example configurations

emacs

(use-package python-black
    :demand t
    :after python
    :hook (python-mode . python-black-on-save-mode-enable-dwim))

(use-package python-isort
    :demand t
    :after python
    :hook (python-mode . python-isort-on-save-mode))

📚 Documentation

To write documentation for emote we use mdBook. The documentation is written in Markdown (.md) files, which can reference each other and will be built into a book-like HTML bundle.

See the mdBook markdown docs for details about syntax and feature support.

Helpful commands

  • To build the docs: pants package //docs:book
  • To view the docs in your browser: pants run //docs:serve and then visit http://localhost:8000

🌡 Metrics

Emote can log metrics from two locations: inside the training loop, and outside the training loop. In both cases the base is the LoggingMixin class, which adds logging functionality to anything. However, it doesn't do any actual logging.

On the training side, the second part of the puzzle is a LogWriter, for example TensorboardLogger. We also provide a built-in TerminalLogger. These accept a list of objects derived from LoggingMixin, and will execute the actual writing of the previously recorded values. This makes implementing log-data-providers easier, as they do not have to care about when to write, only how often they can record data.

logger = SystemLogger()
tensorboard_log_writer = TensorboardLogger([logger], SummaryWriter("/tmp/output_dir"), 2000)
trainer = Trainer([logger, tensorboard_log_writer])

Things behave slightly differently on the data-generation side. Our suggested (and only supported) method is to wrap the memory with a LoggingProxyWrapper. Since all data going into the training loop passes through the memory, and all data has associated metadata, this will capture most metrics.

Our suggestion is that users primarily rely on this mechanism for logging data associated with the agents, as it will get smoothed across all agents to reduce noise.

env = DictGymWrapper(AsyncVectorEnv(10 * [HitTheMiddle]))
table = DictObsMemoryTable(spaces=env.dict_space, maxlen=1000, device="cpu")
table_proxy = MemoryTableProxy(table, 0, True)
table_proxy = LoggingProxyWrapper(table_proxy, SummaryWriter("/tmp/output_dir"), 2000)

🔥 Getting Started

In the /experiments folder, example runs can be found for different Gymnasium environments.

For example, you can run the cartpole example using DQN with the following command:

pants run //experiments/gym/train_dqn_cartpole.py@resolve=base


This comes with a lot of predefined arguments, such as the learning rate, the number of hidden layers, the batch size, etc. You can find all the arguments in the experiments/gym/train_dqn_cartpole.py file.

📊 Tensorboard

To visualize the training process, you can use Tensorboard. To do so, run the following command:

pants run //:tensorboard -- --logdir ./mllogs

This will start a Tensorboard server on localhost:6006. You can now open your browser and go to http://localhost:6006 to follow the training process, with plots of the rewards over time, the loss over time, etc.


Callback system

In this module you'll find the callback framework used by Emote. Those who have used FastAI before will recognize it, as it's heavily inspired by that system - but adapted for RL and our use-cases.

The Callback interface

The callback is the core interface used to hook into the Emote framework. You can think of these as events - when the training loop starts, we'll invoke begin_training on all callback objects. Then we'll start a new cycle, and call Callback.begin_cycle for those that need it.

All in all, the flow of callbacks is like this:

Dot Graph of Callback flow
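
As a rough sketch, a custom callback that hooks into a few of these events could look like the following. The class name and metric key are made up for the example, and the imports assume the module layout documented in the API reference below.

from emote.callback import Callback
from emote.callbacks import LoggingMixin


class BatchCounter(LoggingMixin, Callback):
    # A made-up callback: counts sampled batches and exposes the count to
    # whichever LogWriter (e.g. TensorboardLogger) is registered.

    def __init__(self):
        super().__init__()
        self.batches_seen = 0

    def begin_training(self):
        # Called once when training starts, also when restoring from a checkpoint.
        self.batches_seen = 0

    def begin_batch(self):
        # Called right after a batch has been sampled.
        self.batches_seen += 1

    def end_cycle(self):
        # Called when this callback's cycle completes; record a scalar for the writers.
        self.log_scalar("example/batches_seen", self.batches_seen)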

package emote

Emote

In order to do reinforcement learning we need to have two things: A learning protocol that specifies which losses to use, which network architectures, which optimizers, and so forth. We also need some kind of data collector that interacts with the world and stores the experiences from that in a way which makes them accessible to the learning protocol.

In Emote, data collection is done by Collectors, the protocol for the learning algorithm is built up of Callbacks, and they are tied together by a Trainer.

Classes

class Callback:

The principal modular building block of emote. Callbacks are modular pieces of code that together build up the training loop. They contain hooks that are executed at different points during training. These can consume values from other callbacks, and generate their own for others to consume. This allows a very loosely coupled flow of data between different parts of the code. The most important examples of callbacks in emote are the Losses.

The concept has been borrowed from Keras and FastAI.

Methods

def __init__(self, cycle) -> None

Arguments:

  • cycle(int | None)
def restore_state(self) -> None

Called before training starts to allow loader modules to import state.

At this point, no assumptions can be made about other modules' state.

def begin_training(self) -> None

Called when training starts, both from scratch and when restoring from a checkpoint.

def begin_cycle(self) -> None

Called at the start of each cycle.

def begin_batch(self) -> None

Called at the start of each batch, immediately after data has been sampled.

def backward(self) -> None

The main batch processing should happen here.

def end_batch(self) -> None

Called when the backward pass has been completed.

def end_cycle(self) -> None

Called when a callback's cycle is completed.

def end_training(self) -> None

Called right before shutdown, if possible.

def state_dict(self) -> Dict[str, Any]

Called by checkpointers primarily to capture state for on-disk saving.

def load_state_dict(
    self,
    state_dict,
    load_network,
    load_optimizer,
    load_hparams
) -> None

Called from checkpoint-loaders during the restore_state phase, primarily.

Arguments:

  • state_dict(Dict[str, Any])
  • load_network(bool) (default: True)
  • load_optimizer(bool) (default: True)
  • load_hparams(bool) (default: True)

class Trainer:

The Trainer class manages the main training loop in emote. It does so by invoking a bunch of callbacks in a number of different places.

Fields

  • state: StateDict

  • callbacks: List[Callback]

  • dataloader: Iterable

  • cycle_length: int

Methods

def __init__(self, callbacks, dataloader, batch_size_key) -> None

Arguments:

  • callbacks(List[Callback])
  • dataloader(Iterable)
  • batch_size_key(str) (default: batch_size)
def train(self, shutdown_signal) -> None

The main training loop. This method will wait until the memory is full enough to start sampling, and then start running cycles of backprops on batches sampled from the memory.

Arguments:

  • shutdown_signal(Callable): A function that returns True if training should end, False otherwise.
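
A rough usage sketch, assuming a loss callback and a MemoryLoader have been built elsewhere (TerminalLogger and BackPropStepsTerminator are documented further down; the argument values are placeholders):

from emote import Trainer
from emote.callbacks import BackPropStepsTerminator
from emote.callbacks.logging import TerminalLogger


def run_training(q_loss, dataloader):
    # q_loss is a LossCallback and dataloader a MemoryLoader, built elsewhere.
    logger = TerminalLogger([q_loss], log_interval=100)
    callbacks = [q_loss, logger, BackPropStepsTerminator(bp_steps=10_000)]
    trainer = Trainer(callbacks, dataloader)
    # train() waits until the memory has enough data, then runs cycles of
    # backprops until the shutdown signal returns True or a terminator ends training.
    trainer.train(shutdown_signal=lambda: False)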

package emote.algorithms

module emote.algorithms.action_symmetry

Classes

class ActionSymmetryDiscriminatorLoss(LossCallback):

This loss is used to train a discriminator for adversarial training.

Methods

def __init__(
    self,
    discriminator,
    right_action_map_fn,
    left_action_map_fn,
    grad_loss_weight,
    optimizer,
    lr_schedule,
    max_grad_norm,
    data_group,
    name
) -> None

Arguments:

  • discriminator(Discriminator)
  • right_action_map_fn(Callable[[Tensor], Tensor])
  • left_action_map_fn(Callable[[Tensor], Tensor])
  • grad_loss_weight(float)
  • optimizer(torch.optim.Optimizer)
  • lr_schedule(torch.optim.lr_scheduler._LRScheduler)
  • max_grad_norm(float)
  • data_group(str)
  • name(str)
def loss(self, actions) -> Tensor

Computing the loss to train a discriminator to classify right-side from left-side action values.

Arguments:

  • actions

class ActionSymmetryAMPReward(LoggingMixin, Callback):

Adversarial rewarding with AMP.

Methods

def __init__(
    self,
    discriminator,
    right_action_map_fn,
    left_action_map_fn,
    confusion_reward_weight,
    data_group
) -> None

Arguments:

  • discriminator(Discriminator)
  • right_action_map_fn(Callable[[Tensor], Tensor])
  • left_action_map_fn(Callable[[Tensor], Tensor])
  • confusion_reward_weight(float)
  • data_group(str)
def begin_batch(self, actions, rewards) -> None

Updating the reward by adding the weighted AMP reward

Arguments:

  • actions(Tensor): batch of actions
  • rewards(Tensor): task reward

module emote.algorithms.amp

Functions

def gradient_loss_function(model_output, model_input) -> Tensor

Given inputs and outputs of an nn.Module, computes the sum of squared derivatives of the outputs with respect to the inputs and returns it as the loss.

Arguments:

  • model_output(Tensor)
  • model_input(Tensor)
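
A sketch of how such a gradient penalty can be computed with torch.autograd; this mirrors the description above and is not necessarily the exact implementation:

import torch


def gradient_penalty(model_output, model_input):
    # model_input must have requires_grad=True and model_output must be computed
    # from it, otherwise autograd cannot provide the derivatives.
    grads = torch.autograd.grad(
        outputs=model_output,
        inputs=model_input,
        grad_outputs=torch.ones_like(model_output),
        create_graph=True,  # keep the graph so the penalty itself can be trained on
    )[0]
    # Sum of squared derivatives of the outputs with respect to the inputs.
    return grads.square().sum()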

Classes

class DiscriminatorLoss(LossCallback):

This loss is used to train a discriminator for adversarial training.

Methods

def __init__(
    self,
    discriminator,
    imitation_state_map_fn,
    policy_state_map_fn,
    grad_loss_weight,
    optimizer,
    lr_schedule,
    max_grad_norm,
    input_key,
    name
) -> None

Arguments:

  • discriminator(nn.Module)
  • imitation_state_map_fn(Callable[[Tensor], Tensor])
  • policy_state_map_fn(Callable[[Tensor], Tensor])
  • grad_loss_weight(float)
  • optimizer(torch.optim.Optimizer)
  • lr_schedule(torch.optim.lr_scheduler._LRScheduler)
  • max_grad_norm(float)
  • input_key(str) (default: features)
  • name(str) (default: Discriminator)
def loss(self, imitation_batch, policy_batch) -> Tensor

Computing the loss

Arguments:

  • imitation_batch(dict): a batch of data from the reference animation. the discriminator is trained to classify data from this batch as positive samples
  • policy_batch(dict): a batch of data from the RL buffer. the discriminator is trained to classify data from this batch as negative samples.

Returns:

  • loss (Tensor): the loss tensor

class AMPReward(LoggingMixin, Callback):

Adversarial rewarding with AMP.

Methods

def __init__(
    self,
    discriminator,
    state_map_fn,
    style_reward_weight,
    rollout_length,
    observation_key,
    data_group
) -> None

Arguments:

  • discriminator(nn.Module)
  • state_map_fn(Callable[[Tensor], Tensor])
  • style_reward_weight(float)
  • rollout_length(int)
  • observation_key(str)
  • data_group(str)
def begin_batch(self, observation, next_observation, rewards) -> None

Updating the reward by adding the weighted AMP reward

Arguments:

  • observation(dict[str, Tensor]): current observation
  • next_observation(dict[str, Tensor]): next observation
  • rewards(Tensor): task reward

module emote.algorithms.dqn

Classes

class QTarget(LoggingMixin, Callback):

Methods

def __init__(
    self,
    *,
    q_net,
    target_q_net,
    gamma,
    reward_scale,
    target_q_tau,
    data_group,
    roll_length
) -> None

Compute and manage the target Q-values for Q-Learning algorithms.

Arguments:

  • q_net(nn.Module): The Q-network.
  • target_q_net(Optional[nn.Module]): The target Q-network. Defaults to a copy of q_net. (default: a copy of q_net)
  • gamma(float): Discount factor for future rewards.
  • reward_scale(float): A scaling factor for the reward values.
  • target_q_tau(float): A soft update rate for target Q-network.
  • data_group(str): The data group to store the computed Q-target.
  • roll_length(int): The rollout length for a batch.
def begin_batch(self, next_observation, rewards, masks) -> None
def end_batch(self) -> None
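
Conceptually, the value computed in begin_batch is the usual Q-learning target. A one-step sketch, ignoring roll_length and terminal masking:

import torch


def dqn_target(next_q_values, rewards, gamma=0.99, reward_scale=1.0):
    # next_q_values: [batch, num_actions] from the target network.
    # rewards:       [batch, 1] task rewards.
    max_next_q = next_q_values.max(dim=1, keepdim=True).values
    # r + gamma * max_a Q_target(s', a)
    return reward_scale * rewards + gamma * max_next_q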

class QLoss(LossCallback):

Compute the Q-Learning loss.

Methods

def __init__(
    self,
    *,
    name,
    q,
    opt,
    lr_schedule,
    max_grad_norm,
    data_group,
    log_per_param_weights,
    log_per_param_grads
) -> None

Arguments:

  • name(str): Identifier for this loss component.
  • q(nn.Module): The Q-network.
  • opt(optim.Optimizer): The optimizer to use for the Q-network.
  • lr_schedule(Optional[optim.lr_scheduler._LRScheduler]): Learning rate scheduler.
  • max_grad_norm(float): Maximum gradient norm for gradient clipping.
  • data_group(str): The data group from which to pull data.
  • log_per_param_weights(bool): Whether to log weights per parameter.
  • log_per_param_grads(bool): Whether to log gradients per parameter.
def loss(self, observation, q_target, actions) -> None

package emote.algorithms.genrl

module emote.algorithms.genrl.proxies

Classes

class MemoryProxyWithEncoder(MemoryTableProxy):

Methods

def __init__(
    self,
    memory_table,
    encoder,
    minimum_length_threshold,
    use_terminal,
    input_key,
    action_key
) -> None
def add(self, observations, responses) -> None

module emote.algorithms.genrl.vae

Classes

class VariationalAutoencoder(nn.Module):

Methods

def __init__(self, encoder, decoder, device, beta) -> None
def forward(self, x, condition) -> None
def loss(self, x, x_hat, mu, log_std) -> None

class VAELoss(LossCallback):

Methods

def __init__(
    self,
    *,
    vae,
    opt,
    lr_schedule,
    max_grad_norm,
    name,
    data_group,
    input_key,
    conditioning_func
) -> None
def loss(self, observation, actions) -> None

module emote.algorithms.genrl.wrappers

Classes

class DecoderWrapper(nn.Module):

Methods

def __init__(self, decoder, condition_fn, latent_multiplier) -> None
def forward(self, latent, observation) -> torch.Tensor

Running decoder.

Arguments:

  • latent(torch.Tensor): batch x latent_size
  • observation(torch.Tensor): batch x obs_size

Returns:

  • the sample (batch x data_size)
def load_state_dict(self, state_dict, strict) -> None

class EncoderWrapper(nn.Module):

Methods

def __init__(self, encoder, condition_fn) -> None
def forward(self, action, observation) -> torch.Tensor

Running encoder.

Arguments:

  • action(torch.Tensor): batch x data_size
  • observation(torch.Tensor): batch x obs_size

Returns:

  • the mean (batch x data_size)
def load_state_dict(self, state_dict, strict) -> None

class PolicyWrapper(nn.Module):

Methods

def __init__(self, decoder, policy) -> None
def forward(self, obs, epsilon) -> None

module emote.algorithms.hlgauss

Classes

class LogitNet(nn.Module):

The QNet assumes that the input network has a num_bins property.

Methods

def __init__(self, num_bins) -> None

Arguments:

  • num_bins

class QNet(nn.Module):

The HL Gauss QNet needs to output both the q-value based on the input and to convert logits to q.

Methods

def __init__(self, logit_net, min_value, max_value) -> None

Arguments:

  • logit_net(LogitNet)
  • min_value(float)
  • max_value(float)
def forward(self) -> Tensor
def q_from_logit(self, logits) -> Tensor

class HLGaussLoss(nn.Module):

A HLGauss loss as described by Imani and White. Code from Google Deepmind's https://arxiv.org/pdf/2403.03950v1.pdf.

Methods

def __init__(self, min_value, max_value, num_bins, sigma) -> None

Arguments:

  • min_value(float): Minimal value of the range of target bins.
  • max_value(float): Maximal value of the range of target bins.
  • num_bins(int): Number of bins.
  • sigma(float): Standard deviation of the Gaussian used to convert regression targets to distributions.
def forward(self, logits, target) -> torch.Tensor
def transform_to_probs(self, target) -> torch.Tensor
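
For reference, a sketch of the histogram construction used by this loss family: the regression target is smeared into bin probabilities by integrating a Gaussian centred on the target over each bin. This follows the Imani and White formulation; the emote implementation may differ in details such as truncation handling.

import math

import torch


def target_to_probs(target, min_value, max_value, num_bins, sigma):
    # target: [batch] regression targets -> [batch, num_bins] probabilities.
    edges = torch.linspace(min_value, max_value, num_bins + 1, device=target.device)
    # CDF of a Gaussian centred on each target, evaluated at every bin edge.
    cdf = 0.5 * (1.0 + torch.erf((edges[None, :] - target[:, None]) / (sigma * math.sqrt(2.0))))
    probs = cdf[:, 1:] - cdf[:, :-1]
    # Renormalise so the truncated distribution sums to one per sample.
    return probs / probs.sum(dim=-1, keepdim=True)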

class QLoss(LossCallback):

A classification loss between the action value net and the target q. The target q values are not calculated here and need to be added to the state before the loss of this module runs.

Methods

def __init__(
    self,
    *,
    name,
    q,
    opt,
    lr_schedule,
    max_grad_norm,
    smoothing_ratio,
    data_group,
    log_per_param_weights,
    log_per_param_grads
) -> None

Arguments:

  • name(str): The name of the module. Used e.g. while logging.
  • q(QNet): A deep neural net that outputs the discounted loss given the current observations and a given action.
  • opt(optim.Optimizer): An optimizer for q.
  • lr_schedule(Optional[optim.lr_scheduler._LRScheduler]): Learning rate schedule for the optimizer of q.
  • max_grad_norm(float): Clip the norm of the gradient during backprop using this value.
  • smoothing_ratio(float): The HL Gauss smoothing ratio is the standard deviation of the Gaussian divided by the bin size.
  • data_group(str): The name of the data group from which this Loss takes its data.
  • log_per_param_weights((bool)): If true, log each individual policy parameter that is optimized (norm and value histogram).
  • log_per_param_grads((bool)): If true, log the gradients of each individual policy parameter that is optimized (norm and histogram).
def loss(self, observation, actions, q_target) -> None

module emote.algorithms.sac

Functions

def soft_update_from_to(source, target, tau) -> None
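
soft_update_from_to performs the usual polyak averaging of target network parameters; roughly as sketched below (not necessarily the exact code):

import torch


@torch.no_grad()
def polyak_update(source, target, tau):
    # target <- (1 - tau) * target + tau * source, parameter by parameter.
    for src_p, tgt_p in zip(source.parameters(), target.parameters()):
        tgt_p.mul_(1.0 - tau).add_(src_p, alpha=tau)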

Classes

class QLoss(LossCallback):

A MSE loss between the action value net and the target q. The target q values are not calculated here and need to be added to the state before the loss of this module runs.

Methods

def __init__(
    self,
    *,
    name,
    q,
    opt,
    lr_schedule,
    max_grad_norm,
    data_group,
    log_per_param_weights,
    log_per_param_grads
) -> None

Arguments:

  • name(str): The name of the module. Used e.g. while logging.
  • q(nn.Module): A deep neural net that outputs the discounted loss given the current observations and a given action.
  • opt(optim.Optimizer): An optimizer for q.
  • lr_schedule(Optional[optim.lr_scheduler._LRScheduler]): Learning rate schedule for the optimizer of q.
  • max_grad_norm(float): Clip the norm of the gradient during backprop using this value.
  • data_group(str): The name of the data group from which this Loss takes its data.
  • log_per_param_weights((bool)): If true, log each individual policy parameter that is optimized (norm and value histogram).
  • log_per_param_grads((bool)): If true, log the gradients of each individual policy parameter that is optimized (norm and histogram).
def loss(self, observation, actions, q_target) -> None

class QTarget(LoggingMixin, Callback):

Creates rolling averages of the Q nets, and predicts q values using these.

The module is responsible both for keeping the averages correct in the target q networks and supplying q-value predictions using the target q networks.

Methods

def __init__(
    self,
    *,
    pi,
    ln_alpha,
    q1,
    q2,
    q1t,
    q2t,
    gamma,
    reward_scale,
    target_q_tau,
    data_group,
    roll_length,
    use_terminal_masking
) -> None

Arguments:

  • pi(nn.Module): A deep neural net that outputs actions and their log probability given a state.
  • ln_alpha(torch.tensor): The current weight for the entropy part of the soft Q.
  • q1(nn.Module): A deep neural net that outputs the discounted loss given the current observations and a given action.
  • q2(nn.Module): A deep neural net that outputs the discounted loss given the current observations and a given action.
  • q1t(Optional[nn.Module]): Target Q network. (default: None)
  • q2t(Optional[nn.Module]): Target Q network. (default: None)
  • gamma(float): Discount factor for the rewards in time. (default: 0.99)
  • reward_scale(float): Scale factor for the rewards. (default: 1.0)
  • target_q_tau(float): The weight given to the latest network in the exponential moving average, so NewTargetQ = OldTargetQ * (1 - tau) + Q * tau. (default: 0.005)
  • data_group(str): The name of the data group from which this Loss takes its data. (default: "default")
  • roll_length(int): Rollout length. (default: 1)
  • use_terminal_masking(bool): Whether to use terminal masking for the next values. (default: False)
def begin_batch(self, next_observation, rewards, masks) -> None
def end_batch(self) -> None

class PolicyLoss(LossCallback):

Maximize the soft Q-value for the policy. This loss modifies the policy to select the action that gives the highest soft q-value.

Methods

def __init__(
    self,
    *,
    pi,
    ln_alpha,
    q,
    opt,
    lr_schedule,
    q2,
    max_grad_norm,
    name,
    data_group,
    log_per_param_weights,
    log_per_param_grads
) -> None

Arguments:

  • pi(nn.Module): A deep neural net that outputs actions and their log probability given a state.
  • ln_alpha(torch.tensor): The current weight for the entropy part of the soft Q.
  • q(nn.Module): A deep neural net that outputs the discounted loss given the current observations and a given action.
  • opt(optim.Optimizer): An optimizer for pi.
  • lr_schedule(Optional[optim.lr_scheduler._LRScheduler]): Learning rate schedule for the optimizer of policy.
  • q2(Optional[nn.Module]): A second deep neural net that outputs the discounted loss given the current observations and a given action. This is not necessary since it is fine if the policy isn't pessimistic, but can be nice for symmetry with the Q-loss.
  • max_grad_norm(float): Clip the norm of the gradient during backprop using this value.
  • name(str): The name of the module. Used e.g. while logging.
  • data_group(str): The name of the data group from which this Loss takes its data.
  • log_per_param_weights((bool)): If true, log each individual policy parameter that is optimized (norm and value histogram).
  • log_per_param_grads((bool)): If true, log the gradients of each individual policy parameter that is optimized (norm and histogram).
def loss(self, observation) -> None

class AlphaLoss(LossCallback):

Tweaks the alpha so that a specific target entropy is kept. The target entropy is scaled with the number of actions and a provided entropy scaling factor.
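
This corresponds to the standard SAC temperature objective. A minimal sketch of the underlying loss; the emote implementation additionally handles max_alpha clamping, entropy schedules, and logging:

import torch


def alpha_loss(ln_alpha, log_pi, t_entropy):
    # Increase alpha when the policy's entropy (-log_pi) is below the target,
    # and decrease it when the policy is already more random than the target.
    return -(ln_alpha * (log_pi + t_entropy).detach()).mean()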

Methods

def __init__(
    self,
    *,
    pi,
    ln_alpha,
    opt,
    lr_schedule,
    n_actions,
    max_grad_norm,
    max_alpha,
    name,
    data_group,
    t_entropy
) -> None

Arguments:

  • pi(nn.Module): A deep neural net that outputs actions and their log probability given a state.
  • ln_alpha(torch.tensor): The current weight for the entropy part of the soft Q.
  • opt(optim.Optimizer): An optimizer for ln_alpha.
  • lr_schedule(optim.lr_scheduler._LRScheduler | None): Learning rate schedule for the optimizer of alpha.
  • n_actions(int): The dimension of the action space. Scales the target entropy.
  • max_grad_norm(float): Clip the norm of the gradient during backprop using this value.
  • max_alpha(float)
  • name(str): The name of the module. Used e.g. while logging.
  • data_group(str): The name of the data group from which this Loss takes its data.
  • t_entropy(float | Schedule | None): Value or schedule for the target entropy.
def loss(self, observation) -> None
def end_batch(self) -> None
def state_dict(self) -> None
def load_state_dict(
    self,
    state_dict,
    load_weights,
    load_optimizer,
    load_hparams
) -> None

class AgentProxyWrapper:

Methods

def __init__(self, *inner) -> None
def __call__(self) -> None
def input_names(self) -> None
def output_names(self) -> None
def policy(self) -> None

class FeatureAgentProxy(GenericAgentProxy):

An agent proxy for basic MLPs. This AgentProxy assumes that the observations will contain a single flat array of features.

Methods

def __init__(self, policy, device, input_key) -> None

Create a new proxy.

Arguments:

  • policy(nn.Module): The policy to execute for actions.
  • device(torch.device): The device to run on.
  • input_key(str): The name of the features. (default: "obs")

class VisionAgentProxy(FeatureAgentProxy):

This AgentProxy assumes that the observations will contain image observations 'obs'.

Methods

def __init__(self, policy, device) -> None

Arguments:

  • policy(nn.Module)
  • device(torch.device)

class MultiKeyAgentProxy(GenericAgentProxy):

Handles multiple input keys. Observations are dicts that contain multiple input keys (e.g. both "features" and "images").

Methods

def __init__(self, policy, device, input_keys, spaces) -> None

Create a new proxy.

Arguments:

  • policy(nn.Module): The policy to execute for actions.
  • device(torch.device): The device to run on.
  • input_keys(tuple): The names of the input.
  • spaces(MDPSpace)

module emote.callback

Classes

class CallbackMeta(ABCMeta):

The CallbackMeta metaclass modifies the callbacks so that they accept data groups.

Methods

def __init__(self, cls, bases, fields) -> None

Arguments:

  • cls
  • bases
  • fields
def __call__(self) -> None
def extend(self, func) -> None
def keys_from_member(self) -> None

class Callback:

The principal modular building block of emote. Callbacks are modular pieces of code that together build up the training loop. They contain hooks that are executed at different points during training. These can consume values from other callbacks, and generate their own for others to consume. This allows a very loosely coupled flow of data between different parts of the code. The most important examples of callbacks in emote are the Losses.

The concept has been borrowed from Keras and FastAI.

Methods

def __init__(self, cycle) -> None

Arguments:

  • cycle(int | None)
def restore_state(self) -> None

Called before training starts to allow loader modules to import state.

At this point, no assumptions can be made about other modules' state.

def begin_training(self) -> None

Called when training starts, both from scratch and when restoring from a checkpoint.

def begin_cycle(self) -> None

Called at the start of each cycle.

def begin_batch(self) -> None

Called at the start of each batch, immediately after data has been sampled.

def backward(self) -> None

The main batch processing should happen here.

def end_batch(self) -> None

Called when the backward pass has been completed.

def end_cycle(self) -> None

Called when a callback's cycle is completed.

def end_training(self) -> None

Called right before shutdown, if possible.

def state_dict(self) -> Dict[str, Any]

Called by checkpointers primarily to capture state for on-disk saving.

def load_state_dict(
    self,
    state_dict,
    load_network,
    load_optimizer,
    load_hparams
) -> None

Called from checkpoint-loaders during the restore_state phase, primarily.

Arguments:

  • state_dict(Dict[str, Any])
  • load_network(bool) (default: True)
  • load_optimizer(bool) (default: True)
  • load_hparams(bool) (default: True)

class BatchCallback(Callback):

Methods

def __init__(self, cycle) -> None
def get_batch(self) -> None

package emote.callbacks

Classes

class Checkpointer(Callback):

Checkpointer writes out a checkpoint every n steps. Exactly what is written to the checkpoint is determined by the restorees supplied in the constructor.

Methods

def __init__(
    self,
    *,
    restorees,
    run_root,
    checkpoint_interval,
    checkpoint_index,
    storage_subdirectory
) -> None

Arguments:

  • restorees(list[Restoree]): A list of restorees that should be saved.
  • run_root(str): The root path to where the run artifacts should be stored.
  • checkpoint_interval(int): Number of backprops between checkpoints.
  • checkpoint_index(int)
  • storage_subdirectory(str): The subdirectory where the checkpoints are stored.
def begin_training(self) -> None
def end_cycle(self, bp_step, bp_samples) -> None

class CheckpointLoader(Callback):

CheckpointLoader loads a checkpoint like the one created by Checkpointer.

This is intended for resuming training given a specific checkpoint index. It also enables you to load network weights, optimizer, or other callback hyper-params independently. If you want to do something more specific, like only restore a specific network (outside a callback), it is probably easier to just do it explicitly when the network is constructed.

Methods

def __init__(
    self,
    *,
    restorees,
    run_root,
    checkpoint_index,
    load_weights,
    load_optimizers,
    load_hparams,
    storage_subdirectory
) -> None

Arguments:

  • restorees(list[Restoree]): A list of restorees that should be restored.
  • run_root(str): The root path to where the run artifacts should be stored.
  • checkpoint_index(int): Which checkpoint to load.
  • load_weights(bool): If True, it loads the network weights
  • load_optimizers(bool): If True, it loads the optimizer state
  • load_hparams(bool): If True, it loads other callback hyperparams
  • storage_subdirectory(str): The subdirectory where the checkpoints are stored.
def restore_state(self) -> None

class BackPropStepsTerminator(Callback):

Terminates training after a given number of backprops.

Methods

def __init__(self, bp_steps) -> None

Arguments:

  • bp_steps(int): The total number of backprops that the trainer should run for.
def end_cycle(self) -> None

class LoggingMixin:

A Mixin that accepts logging calls. Logged data is saved on this object and gets written by a Logger. This therefore doesn't care how the data is logged, it only provides a standard interface for storing the data to be handled by a Logger.

Methods

def __init__(self, *default_window_length) -> None

Arguments:

  • default_window_length(int)
def log_scalar(self, key, value) -> None

Use log_scalar to periodically log scalar data.

Arguments:

  • key(str)
  • value(float | torch.Tensor)
def log_windowed_scalar(self, key, value) -> None

Log scalars using a moving window average. By default this will use default_window_length from the constructor as the window length. It can also be overridden on a per-key basis using the format windowed[LENGTH]:foo/bar. Note that this cannot be changed between multiple invocations - whichever length is found first will be permanent.

Arguments:

  • key(str)
  • value(float | torch.Tensor | Iterable[torch.Tensor | float])
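
As an illustration, a callback inheriting LoggingMixin might use it like this (the key names are only examples):

from emote.callback import Callback
from emote.callbacks import LoggingMixin


class EpisodeRewardLogger(LoggingMixin, Callback):
    def record_episode(self, reward):
        # Uses the default window length from LoggingMixin's constructor.
        self.log_windowed_scalar("episode/reward", reward)
        # Per-key override: a 100-sample window, encoded in the key itself.
        self.log_windowed_scalar("windowed[100]:episode/reward_smoothed", reward)
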
def log_image(self, key, value) -> None

Use log_image to periodically log image data.

Arguments:

  • key(str)
  • value(torch.Tensor)
def log_video(self, key, value) -> None

Use log_video to periodically log video data.

Arguments:

  • key(str)
  • value(Tuple[np.ndarray, int])
def log_histogram(self, key, value) -> None
def state_dict(self) -> None
def load_state_dict(
    self,
    state_dict,
    load_network,
    load_optimizer,
    load_hparams
) -> None

class TensorboardLogger(Callback):

Logs the provided loggable callbacks to tensorboard.

Methods

def __init__(self, loggables, writer, log_interval, log_by_samples) -> None

Arguments:

  • loggables(List[LoggingMixin])
  • writer(SummaryWriter)
  • log_interval(int)
  • log_by_samples(bool)
def begin_training(self, bp_step, bp_samples) -> None
def end_cycle(self, bp_step, bp_samples) -> None

class LossCallback(LoggingMixin, Callback):

Losses are callbacks that implement a loss function.

Methods

def __init__(
    self,
    lr_schedule,
    *,
    name,
    network,
    optimizer,
    max_grad_norm,
    data_group,
    log_per_param_weights,
    log_per_param_grads
) -> None

Arguments:

  • lr_schedule(Optional[optim.lr_scheduler._LRScheduler])
  • name(str)
  • network(Optional[nn.Module])
  • optimizer(Optional[optim.Optimizer])
  • max_grad_norm(float)
  • data_group(str)
  • log_per_param_weights
  • log_per_param_grads
def backward(self) -> None
def log_per_param_weights_and_grads(self) -> None
def state_dict(self) -> None
def load_state_dict(
    self,
    state_dict,
    load_weights,
    load_optimizers,
    load_hparams
) -> None
def loss(self) -> Tensor

The loss method needs to be overwritten to implement a loss.

Returns:

  • A PyTorch tensor of shape (batch,).

module emote.callbacks.checkpointing

Classes

class Restoree(Protocol):

Fields

  • name: str

Methods

def state_dict(self) -> dict[str, Any]
def load_state_dict(
    self,
    state_dict,
    load_network,
    load_optimizer,
    load_hparams
) -> None

class Checkpointer(Callback):

Checkpointer writes out a checkpoint every n steps. Exactly what is written to the checkpoint is determined by the restorees supplied in the constructor.

Methods

def __init__(
    self,
    *,
    restorees,
    run_root,
    checkpoint_interval,
    checkpoint_index,
    storage_subdirectory
) -> None

Arguments:

  • restorees(list[Restoree]): A list of restorees that should be saved.
  • run_root(str): The root path to where the run artifacts should be stored.
  • checkpoint_interval(int): Number of backprops between checkpoints.
  • checkpoint_index(int)
  • storage_subdirectory(str): The subdirectory where the checkpoints are stored.
def begin_training(self) -> None
def end_cycle(self, bp_step, bp_samples) -> None

class CheckpointLoader(Callback):

CheckpointLoader loads a checkpoint like the one created by Checkpointer.

This is intended for resuming training given a specific checkpoint index. It also enables you to load network weights, optimizer, or other callback hyper-params independently. If you want to do something more specific, like only restore a specific network (outside a callback), it is probably easier to just do it explicitly when the network is constructed.

Methods

def __init__(
    self,
    *,
    restorees,
    run_root,
    checkpoint_index,
    load_weights,
    load_optimizers,
    load_hparams,
    storage_subdirectory
) -> None

Arguments:

  • restorees(list[Restoree]): A list of restorees that should be restored.
  • run_root(str): The root path to where the run artifacts should be stored.
  • checkpoint_index(int): Which checkpoint to load.
  • load_weights(bool): If True, it loads the network weights
  • load_optimizers(bool): If True, it loads the optimizer state
  • load_hparams(bool): If True, it loads other callback hyperparams
  • storage_subdirectory(str): The subdirectory where the checkpoints are stored.
def restore_state(self) -> None

class InvalidCheckpointLocation(ValueError):

module emote.callbacks.generic

Classes

class BackPropStepsTerminator(Callback):

Terminates training after a given number of backprops.

Methods

def __init__(self, bp_steps) -> None

Arguments:

  • bp_steps(int): The total number of backprops that the trainer should run for.
def end_cycle(self) -> None

module emote.callbacks.logging

Classes

class TensorboardLogger(Callback):

Logs the provided loggable callbacks to tensorboard.

Methods

def __init__(self, loggables, writer, log_interval, log_by_samples) -> None

Arguments:

  • loggables(List[LoggingMixin])
  • writer(SummaryWriter)
  • log_interval(int)
  • log_by_samples(bool)
def begin_training(self, bp_step, bp_samples) -> None
def end_cycle(self, bp_step, bp_samples) -> None

class TerminalLogger(Callback):

Logs the provided loggable callbacks to the python logger.

Methods

def __init__(self, callbacks, log_interval) -> None

Arguments:

  • callbacks(List[LoggingMixin])
  • log_interval(int)
def log_scalars(self, step, suffix) -> None

Logs scalar values, adding an optional suffix on the first level of the key. Example: If k='training/loss' and suffix='bp_step', k will be renamed to 'training_bp_step/loss'.

Arguments:

  • step
  • suffix
def end_cycle(self, bp_step) -> None

module emote.callbacks.loss

Classes

class LossCallback(LoggingMixin, Callback):

Losses are callbacks that implement a loss function.

Methods

def __init__(
    self,
    lr_schedule,
    *,
    name,
    network,
    optimizer,
    max_grad_norm,
    data_group,
    log_per_param_weights,
    log_per_param_grads
) -> None

Arguments:

  • lr_schedule(Optional[optim.lr_scheduler._LRScheduler])
  • name(str)
  • network(Optional[nn.Module])
  • optimizer(Optional[optim.Optimizer])
  • max_grad_norm(float)
  • data_group(str)
  • log_per_param_weights
  • log_per_param_grads
def backward(self) -> None
def log_per_param_weights_and_grads(self) -> None
def state_dict(self) -> None
def load_state_dict(
    self,
    state_dict,
    load_weights,
    load_optimizers,
    load_hparams
) -> None
def loss(self) -> Tensor

The loss method needs to be overwritten to implement a loss.

Returns:

  • A PyTorch tensor of shape (batch,).

module emote.callbacks.testing

Classes

class FinalLossTestCheck(Callback):

Checks that the final losses of the provided loss callbacks are below the given cutoffs.

Methods

def __init__(self, callbacks, cutoffs, test_length) -> None

Arguments:

  • callbacks(List[LossCallback])
  • cutoffs(List[float])
  • test_length(int)
def end_cycle(self) -> None

class FinalRewardTestCheck(Callback):

Methods

def __init__(self, callback, cutoff, test_length, key, use_windowed) -> None
def end_cycle(self) -> None

module emote.callbacks.wb_logger

Classes

class WBLogger(Callback):

Logs the provided loggable callbacks to Weights&Biases.

Methods

def __init__(self, callbacks, config, log_interval) -> None

Arguments:

  • callbacks(List[LoggingMixin])
  • config(Dict)
  • log_interval(int)
def begin_training(self, bp_step, bp_samples) -> None
def end_cycle(self, bp_step, bp_samples) -> None
def end_training(self) -> None

package emote.env

package emote.env.box2d

Functions

def make_vision_box2d_env(
    environment_id,
    rank,
    seed,
    frame_stack,
    use_float_scaling
) -> None

Arguments:

  • environment_id(str): the environment ID
  • rank(int): an integer offset for the random seed
  • seed(int): the initial seed for RNG
  • frame_stack(int): stacks this many frames. (default: 3)
  • use_float_scaling(bool): scales the observations from char to normalised float (default: True)

Returns:

  • the env creator function

module emote.env.wrappers

Classes

class WarpFrame(gymnasium.ObservationWrapper):

Methods

def __init__(self, env, width, height) -> None

Warp frames to width x height.

Arguments:

  • env: (Gym Environment) the environment
  • width(int) (default: 84)
  • height(int) (default: 84)
def observation(self, frame) -> None

Returns the current observation from a frame.

Arguments:

  • frame: ([int] or [float]) environment frame

Returns:

  • ([int] or [float]) the observation

class FrameStack(gymnasium.Wrapper):

Methods

def __init__(self, env, n_frames) -> None

Stack n_frames last frames. Returns lazy array, which is much more memory efficient.

See Also

LazyFrames (Below)

Arguments:

  • env: (Gym Environment) the environment
  • n_frames(int): (int) the number of frames to stack
def reset(self) -> None
def step(self, action) -> None

class ScaledFloatFrame(gymnasium.ObservationWrapper):

Methods

def __init__(self, env) -> None
def observation(self, observation) -> None

class LazyFrames(object):

Methods

def __init__(self, frames) -> None

This object ensures that common frames between the observations are only stored once. It exists purely to optimize memory usage which can be huge for DQN's 1M frames replay buffers.

This object should only be converted to np.ndarray before being passed to the model.

Arguments:

  • frames: ([int] or [float]) environment frames

package emote.extra

module emote.extra.crud_storage

Generic CRUD-based storage on disk.

Classes

class StorageItemHandle(Generic[T]):

A handle that represents a storage item. Can be safely exposed to users. Not cryptographically safe: handles are guessable.

You can convert this handle from and to strings using str(handle) and StorageItemHandle.from_string(string).

Fields

  • handle: int

Methods

def from_string(value) -> Optional['StorageItemHandle']

Parses a handle from its string representation. Returns None if the handle is invalid.

Arguments:

  • value(str)

class StorageItem(Generic[T]):

Fields

  • handle: StorageItemHandle[T]

  • timestamp: datetime

  • filepath: str

class CRUDStorage(Generic[T]):

Manages a set of files on disk in a simple CRUD way. All files will be stored to a single directory with a name of the form {prefix}{timestamp}_{index}.{extension}.

This class is thread-safe.
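
A small usage sketch built only from the methods documented below (the directory and file contents are made up):

from emote.extra.crud_storage import CRUDStorage

storage = CRUDStorage("/tmp/exports", prefix="model_", extension="onnx")

item = storage.create_with_data(bytearray(b"first version"))
storage.update(item.handle, bytearray(b"second version"))

print(storage.latest().filepath)   # path of the most recently added item
storage.delete(item.handle)        # True if something was removed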

Methods

def __init__(self, directory, prefix, extension) -> None

Arguments:

  • directory(str)
  • prefix(str)
  • extension(str) (default: bin)
def create_with_data(self, data) -> StorageItem[T]

Creates a new file with the given data.

Arguments:

  • data(bytearray)
def create_from_filepath(self, filepath) -> StorageItem[T]

Creates a new entry for an existing file. The file must already be in the directory that this storage manages. It does not need to conform to the naming convention that the CRUDStorage normally uses.

Arguments:

  • filepath(str)
def create_with_saver(self, saver) -> StorageItem[T]

Creates a new file by saving it via the provided function. The function will be called with the path at which the file should be saved.

Arguments:

  • saver(Callable[[str], None])
def update(self, handle, data) -> None

Updates an existing file with the given contents.

Arguments:

  • handle(StorageItemHandle[T])
  • data(bytearray)
def items(self) -> Sequence[StorageItem[T]]

Returns:

  • a sequence of all files owned by this storage.
def delete(self, handle) -> bool

Deletes an existing file owned by this storage.

Arguments:

  • handle(StorageItemHandle[T])

Returns:

  • True if a file was deleted, and false if the file was not owned by this storage.
def get(self, handle) -> Optional[StorageItem[T]]

Arguments:

  • handle(StorageItemHandle[T])

Returns:

  • The storage item corresponding to the handle, or None if it was not found
def latest(self) -> Optional[StorageItem[T]]

The last storage item that was added to the storage. If items have been deleted, this is the last item of the ones that remain.

module emote.extra.onnx_exporter

Classes

class QueuedExport:

Methods

def __init__(self, metadata) -> None
def process(self, storage) -> None
def block_until_complete(self) -> None

class OnnxExporter(LoggingMixin, Callback):

Handles onnx exports of an ML policy. Call export whenever you want to save an onnx version of the current model, or export_threadsafe if you're outside the training loop.

Methods

def __init__(
    self,
    agent_proxy,
    spaces,
    requires_epsilon,
    directory,
    interval,
    prefix,
    device
) -> None

Arguments:

  • agent_proxy(AgentProxy): the agent API to export
  • spaces(MDPSpace): The spaces describing the model inputs and outputs
  • requires_epsilon(bool): If true, the API should accept an input epsilon per action
  • directory(str): path to the directory where the files should be created. If it does not exist it will be created.
  • interval(int | None): if provided, will automatically export ONNX files at this cadence.
  • prefix(str): all file names will have this prefix. (default: savedmodel_)
  • device(torch.device | None): if provided, will transfer the model inputs to this device before exporting.
def add_metadata(self, key, value) -> None
def end_batch(self) -> None
def end_cycle(self) -> None
def process_pending_exports(self) -> None

If you are using export_threadsafe the main thread must call this method regularly to make sure things are actually exported.

def export_threadsafe(self, metadata) -> StorageItem

Same as export, but it can be called in threads other than the main thread.

This method relies on the main thread calling process_pending_exports from time to time. You cannot call this method from the main thread; it would block indefinitely.

Arguments:

  • metadata
def export(self, metadata) -> StorageItem

Serializes a model to onnx and saves it to disk. This must only be called from the main thread. That is, the thread which has ownership over the model and that modifies it. This is usually the thread that has the training loop.

Arguments:

  • metadata
def delete(self, handle) -> bool
def get(self, handle) -> bool
def items(self) -> Sequence[StorageItem]
def latest(self) -> Optional[StorageItem]

module emote.extra.schedules

Classes

class BPStepScheduler:

Fields

  • bp_step_begin: float

  • bp_step_end: float

  • value_min: float

  • value_max: float

Methods

def evaluate_at(self, bp) -> None

class Schedule:

Methods

def __init__(self, initial, final, steps) -> None
def value(self) -> None
def step(self) -> None

class ConstantSchedule(Schedule):

Constant value that doesn't change over time.

Methods

def __init__(self, value) -> None

Arguments:

  • value(float): Value of the schedule.

class LinearSchedule(Schedule):

Linear interpolation between initial and final over steps timesteps. After this many timesteps, final is returned.

Methods

def __init__(self, initial, final, steps, use_staircase, staircase_steps) -> None

Arguments:

  • initial(float): Initial value.
  • final(float): Final value.
  • steps(int): Number of steps.
  • use_staircase(bool): Use step like decay. Defaults to False. (default: False)
  • staircase_steps(int): The number of discrete steps. Defaults to 5. (default: 5)
def step(self) -> None
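
A usage sketch, reading and stepping the schedule as listed above (value() is listed as a method here; in some versions it may be a property):

from emote.extra.schedules import LinearSchedule

# Anneal from 1.0 down to 0.1 over 10 000 steps, then stay at 0.1.
epsilon = LinearSchedule(initial=1.0, final=0.1, steps=10_000)

for _ in range(20_000):
    current = epsilon.value()  # read the current value
    # ... use `current`, e.g. as an exploration rate ...
    epsilon.step()             # advance the schedule by one step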

class CyclicSchedule(Schedule):

Cyclic schedule.

* triangular: A basic triangular cycle without amplitude scaling.
* triangular2: A basic triangular cycle that scales initial amplitude by half each cycle.

Note: for triangular2, the final value is the boundary that is scaled down at each cycle iteration, meaning that the value of the scheduled parameter will settle around initial.

Methods

def __init__(self, initial, final, half_period_steps, mode) -> None

Arguments:

  • initial(float): Initial value.
  • final(float): Final value.
  • half_period_steps(int): Number of steps in one half of the cycle.
  • mode(str): One of {triangular, triangular2}. (default: triangular)
def step(self) -> None

class CosineAnnealing(Schedule):

Cosine annealing schedule.

Methods

def __init__(self, initial, final, steps) -> None

Arguments:

  • initial(float): Initial value.
  • final(float): Final value.
  • steps(int): Number of steps.
def step(self) -> None

class CosineAnnealingWarmRestarts(Schedule):

Cosine annealing schedule with warm restarts.

Methods

def __init__(self, initial, final, steps) -> None

Arguments:

  • initial(float): Initial value.
  • final(float): Final value.
  • steps(int): Number of steps.
def step(self) -> None

module emote.extra.system_logger

Logger that logs the memory consumption and memory consumption growth rate.

Classes

class SystemLogger(LoggingMixin, Callback):

Methods

def __init__(self) -> None
def end_cycle(self, bp_step, bp_samples) -> None

package emote.memory

This module contains all the major building blocks for our memory implementation. The memory was developed in the same time period as DeepMind's Reverb (https://www.deepmind.com/open-source/reverb), and shares naming with it, which in turn is borrowed from databases. Unlike Reverb, we do not have RateSamplers (although they could be added). We also do not share data between ArrayTables.

The goal of the memory is to provide a unified interface for all types of machine learning tasks. This is achieved by focusing on configuration and pluggability over code-driven functionality.

Currently, there are three main points of customization:

  • Shape and type of data
  • Insertion, sampling, and eviction
  • Data transformation and generation

High-level parts

ArrayTable

A table is a datastructure containing a specific type of data that shares the same high-level structure.

Columns and Virtual Columns

A column is a storage for a specific type of data where each item is the same shape and type. A virtual column is like a column, but it references another column and does data synthesization or modification w.r.t that. For example, dones and masks are synthetic data based only on indices.

Adaptors

Adaptors are another approach to virtual columns, but are more suited for transforming the whole batch, such as scaling or reshaping specific data. Since this step occurs after the data has already been converted to tensors, the full power of PyTorch is available here and gradients will be correctly tracked.

Strategies, Samplers and Ejectors

Strategies are based on the delegate pattern, where we can inject implementation details through objects instead of using inheritance. Strategies define the API for sampling and ejection from memories, and are queried from the table upon sampling and insertion.

Samplers and Ejectors track the data (but do not own it!). They are used by the table for sampling and ejection based on the policy they implement. Currently we have Fifo and Uniform samplers and ejectors, but one could have prioritized samplers/ejectors, etc.

Proxy Wrappers

Wrappers live around the memory proxy and extend functionality. This is a great point for data conversion, validation, and logging.
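
Putting these parts together, a rough sketch of a memory setup (mirroring the example in the Metrics section; keyword names follow the API reference below, and the sizes are made up):

from emote.memory import MemoryLoader, MemoryTableProxy
from emote.memory.builder import DictObsMemoryTable


def build_memory(spaces, device="cpu"):
    # spaces is an MDPSpace describing observations and actions.
    table = DictObsMemoryTable(spaces=spaces, maxlen=100_000, device=device)
    # The proxy turns per-agent [identity, observation] data into full sequences.
    proxy = MemoryTableProxy(table, minimum_length_threshold=1, use_terminal=True)
    # The loader samples batches of rollouts from the table for the Trainer.
    loader = MemoryLoader(
        table,
        rollout_count=64,      # sequences per batch
        rollout_length=1,      # frames per sequence
        size_key="batch_size",
        data_group="default",
    )
    return proxy, loader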

Classes

class MemoryTable(Protocol):

Fields

  • adaptors: List[Adaptor]

Methods

def sample(self, count, sequence_length) -> SampleResult

Sample COUNT traces from the memory, each consisting of SEQUENCE_LENGTH frames.

The data is transposed in a SoA fashion (since this is both easier to store and easier to consume).

Arguments:

  • count(int)
  • sequence_length(int)
def size(self) -> int

Query the number of elements currently in the memory.

def full(self) -> bool

Query whether the memory is filled.

def add_sequence(self, identity, sequence) -> None

Add a fully terminated sequence to the memory.

Arguments:

  • identity(int)
  • sequence
def store(self, path, version) -> bool

Persist the whole table and all metadata into the designated name.

Arguments:

  • path(str)
  • version(TableSerializationVersion)
def restore(self, path, override_version) -> bool

Restore the data table from the provided path. This also clears the data stores.

Arguments:

  • path(str)
  • override_version(TableSerializationVersion | None)

class MemoryTableProxy:

The sequence builder wraps a sequence-based memory to build full episodes from [identity, observation] data.

Not thread safe.

Methods

def __init__(
    self,
    memory_table,
    minimum_length_threshold,
    use_terminal,
    *,
    name
) -> None

Arguments:

  • memory_table(MemoryTable)
  • minimum_length_threshold(Optional[int])
  • use_terminal(bool)
  • name(str)
def name(self) -> None
def size(self) -> None
def resize(self, new_size) -> None
def store(self, path) -> None
def is_initial(self, identity) -> None

Returns true if identity is not already used in a partial sequence. Does not validate if the identity is associated with a complete episode.

Arguments:

  • identity(int)
def add(self, observations, responses) -> None
def timers(self) -> None

class MemoryLoader:

Methods

def __init__(
    self,
    memory_table,
    rollout_count,
    rollout_length,
    size_key,
    data_group
) -> None
def is_ready(self) -> None

True if the data loader has enough data to start providing data.

class MemoryExporterProxyWrapper(LoggingMixin, MemoryTableProxyWrapper):

Export the memory at regular intervals.

Methods

def __init__(
    self,
    memory,
    target_memory_name,
    inf_steps_per_memory_export,
    experiment_root_path,
    min_time_per_export
) -> None

Arguments:

  • memory(MemoryTableProxy | MemoryTableProxyWrapper)
  • target_memory_name
  • inf_steps_per_memory_export
  • experiment_root_path(str)
  • min_time_per_export(int) (default: 600)
def add(self, observations, responses) -> None

First add the new batch to the memory.

Arguments:

  • observations(Dict[AgentId, DictObservation])
  • responses(Dict[AgentId, DictResponse])

class MemoryImporterCallback(Callback):

Load and validate a previously exported memory.

Methods

def __init__(
    self,
    memory_table,
    target_memory_name,
    experiment_load_dir,
    load_fname_override
) -> None

Arguments:

  • memory_table(MemoryTable)
  • target_memory_name(str)
  • experiment_load_dir(str)
  • load_fname_override
def restore_state(self) -> None

class LoggingProxyWrapper(LoggingMixin, MemoryTableProxyWrapper):

Methods

def __init__(self, inner, writer, log_interval) -> None
def state_dict(self) -> dict[str, Any]
def load_state_dict(
    self,
    state_dict,
    load_network,
    load_optimizer,
    load_hparams
) -> None
def add(self, observations, responses) -> None
def report(self, metrics, metrics_lists) -> None
def get_report(
    self,
    keys
) -> Tuple[dict[str, int | float | list[float]], dict[str, list[float]]]

class MemoryWarmup(Callback):

A blocker to ensure memory has data. This ensures the memory has enough data when training starts, as the memory will panic otherwise. This is useful if you use an async data generator.

If you do not use an async data generator this can deadlock your training loop and prevent progress.

Methods

def __init__(self, loader, exporter, shutdown_signal) -> None

Arguments:

  • loader(MemoryLoader)
  • exporter(Optional[OnnxExporter])
  • shutdown_signal(Optional[Callable[[], bool]])
def begin_training(self) -> None

class JointMemoryLoader:

A memory loader capable of loading data from multiple MemoryLoaders.

Methods

def __init__(self, loaders, size_key) -> None

Arguments:

  • loaders(list[MemoryLoader])
  • size_key(str) (default: batch_size)
def is_ready(self) -> None

module emote.memory.adaptors

Classes

class DictObsAdaptor:

Converts multiple observation columns to a single dict observation.

Methods

def __init__(self, keys, output_keys, with_next) -> None

Arguments:

  • keys(List[str]): The dictionary keys to extract
  • output_keys(Optional[List[str]]): The output names for the extracted keys. Defaults to the same name.
  • with_next(bool): If True, adds an extra column called "next_{key}" for each key in keys. (default: True)
def __call__(self, result, count, sequence_length) -> SampleResult

class KeyScaleAdaptor:

An adaptor to apply scaling to a specified sampled key.

Methods

def __init__(self, scale, key) -> None

Arguments:

  • scale: The scale factor to apply
  • key: The key for which to scale data
def __call__(self, result, count, sequence_length) -> SampleResult

class KeyCastAdaptor:

An adaptor to cast a specified sampled key.

Methods

def __init__(self, dtype, key) -> None

Arguments:

  • dtype: The dtype to cast to.
  • key: The key for which to cast data
def __call__(self, result, count, sequence_length) -> SampleResult

class TerminalAdaptor:

An adaptor to apply tags from detailed terminal tagging.

Methods

def __init__(self, target_key, value_key) -> None

Arguments:

  • target_key(str): the default mask data to override
  • value_key(str): the key containing the terminal mask value to apply
def __call__(self, result, count, sequence_length) -> SampleResult

module emote.memory.builder

Classes

class DictMemoryTable(ArrayMemoryTable):

Methods

def __init__(
    self,
    *,
    use_terminal_column,
    obs_keys,
    columns,
    maxlen,
    length_key,
    sampler,
    device
) -> None

class DictObsMemoryTable(DictMemoryTable):

Create a memory suited for Reinforcement Learning Tasks with 1-Step Bellman Backup with a single bootstrap value, and using dictionary observations as network inputs.

Methods

def __init__(
    self,
    *,
    spaces,
    use_terminal_column,
    maxlen,
    device,
    dones_dtype,
    masks_dtype,
    sampler
) -> None

Arguments:

  • spaces(MDPSpace)
  • use_terminal_column(bool)
  • maxlen(int)
  • device(torch.device)
  • dones_dtype
  • masks_dtype
  • sampler(SampleStrategy)
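
A minimal construction sketch, assuming an MDPSpace describing the environment has been built elsewhere and that the omitted keyword arguments have defaults:

```python
# Hedged sketch: `mdp_space` stands in for an MDPSpace built for your
# environment; omitted keyword arguments are assumed to have defaults.
import torch
from emote.memory.builder import DictObsMemoryTable

mdp_space = ...  # an MDPSpace with state, actions and rewards spaces

table = DictObsMemoryTable(
    spaces=mdp_space,
    use_terminal_column=True,
    maxlen=100_000,
    device=torch.device("cpu"),
)
```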

class DictObsNStepMemoryTable(DictMemoryTable):

Create a memory suited for Reinforcement Learning Tasks with N-Step Bellman Backup with a single bootstrap value, and using dictionary observations as network inputs.

Methods

def __init__(self, *spaces, use_terminal_column, maxlen, sampler, device) -> None

Arguments:

  • spaces(MDPSpace)
  • use_terminal_column(bool)
  • maxlen(int)
  • sampler(SampleStrategy)
  • device(torch.device)

module emote.memory.callbacks

Classes

class MemoryImporterCallback(Callback):

Load and validate a previously exported memory.

Methods

def __init__(
    self,
    memory_table,
    target_memory_name,
    experiment_load_dir,
    load_fname_override
) -> None

Arguments:

  • memory_table(MemoryTable)
  • target_memory_name(str)
  • experiment_load_dir(str)
  • load_fname_override
def restore_state(self) -> None

module emote.memory.column

Classes

class Column:

A typed column for data storage.

Fields

  • name: str

  • shape: Tuple[int]

  • dtype: type

Methods

def state(self) -> None
def load_state(self, config) -> None

class TagColumn(Column):

A typed column for tag storage.

class VirtualColumn(Column):

A column providing fake or transformed data via Mapper.

Fields

  • target_name: str

  • mapper: Type[VirtualStorage]

Methods

def state(self) -> None
def load_state(self, config) -> None

module emote.memory.core_types

Supporting types used for working with the memory.

Classes

class Matrix(Generic[Number]):

module emote.memory.coverage_based_strategy

Classes

class CoverageBasedStrategy(Strategy):

A sampler intended to sample based on coverage of experiences, favoring less-visited states.

This base class can be used for implementing various coverage-based sampling strategies.

Methods

def __init__(self, alpha) -> None

Arguments:

  • alpha (default: 0.5)
def track(self, identity, sequence_length) -> None
def forget(self, identity) -> None

class CoverageBasedSampleStrategy(CoverageBasedStrategy, SampleStrategy):

Methods

def __init__(self, alpha) -> None
def sample(self, count, transition_count) -> Sequence[SamplePoint]

class CoverageBasedEjectionStrategy(CoverageBasedStrategy, EjectionStrategy):

Methods

def sample(self, count) -> Sequence[int]

module emote.memory.fifo_strategy

Classes

class FifoStrategyBase(Strategy):

A sampler intended to sample in a first-in-first-out style across the whole set of experiences.

This base class is used by both the fifo sample and ejection strategies.

Methods

def __init__(self) -> None

Create a FIFO-based strategy.

def track(self, identity, sequence_length) -> None
def forget(self, identity) -> None
def post_import(self) -> None
def state(self) -> dict

Serialize the strategy to a JSON-serializable dictionary.

def load_state(self, state) -> None

Load the strategy from a dictionary.

Arguments:

  • state(dict)

class FifoSampleStrategy(FifoStrategyBase, SampleStrategy):

Methods

def __init__(self, per_episode, random_offset) -> None

Create a FIFO-based sample strategy.

Arguments:

  • per_episode(bool): if true, will only sample each episode once in a single pass (default: True)
  • random_offset(bool): if true will sample at a random offset in each episode. Will be assumed true when sampling per episode (default: True)
def sample(self, count, transition_count) -> Sequence[SamplePoint]

class FifoEjectionStrategy(FifoStrategyBase, EjectionStrategy):

Methods

def sample(self, count) -> Sequence[int]

module emote.memory.loading

Utilities for loading files into memories.

Functions

def fill_table_from_legacy_file(
    memory_table,
    path,
    *,
    read_obs,
    read_actions,
    read_rewards
) -> None

Load a legacy memory dump into a new-style table memory.

Arguments:

  • memory_table(ArrayMemoryTable)
  • path(str): The path to load from. Must be a pickle file; the extension is optional. Raises OSError if the file does not exist, and KeyError if the table or file does not match the legacy format.
  • read_obs(bool)
  • read_actions(bool)
  • read_rewards(bool)
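
A minimal sketch of importing a legacy dump into a table built elsewhere; the path is illustrative:

```python
# Hedged sketch: `table` is an ArrayMemoryTable created elsewhere; the path is
# illustrative only. Raises OSError/KeyError as documented above.
from emote.memory.loading import fill_table_from_legacy_file

fill_table_from_legacy_file(
    table,
    "/tmp/legacy_memory",  # extension is optional
    read_obs=True,
    read_actions=True,
    read_rewards=True,
)
```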

module emote.memory.memory

Sequence builder collates observations into sequences stored in the memory.

The sequence builder is the API between "instant" based APIs such as the agent proxy and the episode-based functionality of the memory implementation. The goal of the sequence builder is to consume individual timesteps per agent and collate them into episodes before submission into the memory.

Classes

class Episode:

An episode of data being constructed.

Fields

  • data: Dict[str, List[Matrix]] = field(default_factory=lambda : defaultdict(list))

Methods

def append(self, observation) -> Tuple
def complete(self, observation) -> Mapping[str, Matrix]
def from_initial(observation) -> Episode

class MemoryTableProxy:

The sequence builder wraps a sequence-based memory to build full episodes from [identity, observation] data.

Not thread safe.

Methods

def __init__(
    self,
    memory_table,
    minimum_length_threshold,
    use_terminal,
    *,
    name
) -> None

Arguments:

  • memory_table(MemoryTable)
  • minimum_length_threshold(Optional[int])
  • use_terminal(bool)
  • name(str)
def name(self) -> None
def size(self) -> None
def resize(self, new_size) -> None
def store(self, path) -> None
def is_initial(self, identity) -> None

Returns true if identity is not already used in a partial sequence. Does not validate if the identity is associated with a complete episode.

Arguments:

  • identity(int)
def add(self, observations, responses) -> None
def timers(self) -> None

class MemoryProxyWrapper:

Base class for memory proxy wrappers. This class forwards non-existing method accesses to the inner MemoryProxy or MemoryProxyWrapper.

Methods

def __init__(self, inner) -> None

Arguments:

  • inner('MemoryProxyWrapper' | MemoryProxy)
def state_dict(self) -> dict[str, Any]
def load_state_dict(
    self,
    state_dict,
    load_network,
    load_optimizer,
    load_hparams
) -> None

class MemoryTableProxyWrapper(MemoryProxyWrapper):

Methods

def __init__(self, *inner) -> None
def store(self, path) -> None

class LoggingProxyWrapper(LoggingMixin, MemoryTableProxyWrapper):

Methods

def __init__(self, inner, writer, log_interval) -> None
def state_dict(self) -> dict[str, Any]
def load_state_dict(
    self,
    state_dict,
    load_network,
    load_optimizer,
    load_hparams
) -> None
def add(self, observations, responses) -> None
def report(self, metrics, metrics_lists) -> None
def get_report(
    self,
    keys
) -> Tuple[dict[str, int | float | list[float]], dict[str, list[float]]]

class MemoryExporterProxyWrapper(LoggingMixin, MemoryTableProxyWrapper):

Export the memory at regular intervals.

Methods

def __init__(
    self,
    memory,
    target_memory_name,
    inf_steps_per_memory_export,
    experiment_root_path,
    min_time_per_export
) -> None

Arguments:

  • memory(MemoryTableProxy | MemoryTableProxyWrapper)
  • target_memory_name
  • inf_steps_per_memory_export
  • experiment_root_path(str)
  • min_time_per_export(int) (default: 600)
def add(self, observations, responses) -> None

First add the new batch to the memory.

Arguments:

  • observations(Dict[AgentId, DictObservation])
  • responses(Dict[AgentId, DictResponse])
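
Because every wrapper forwards the proxy interface, the wrappers can be stacked. A minimal sketch, assuming a memory table built elsewhere and a TensorBoard SummaryWriter as the writer; import paths follow the module names documented here:

```python
# Hedged sketch: `table` is a MemoryTable built elsewhere; import paths follow
# the module names documented above.
from torch.utils.tensorboard import SummaryWriter

from emote.memory.memory import (
    LoggingProxyWrapper,
    MemoryExporterProxyWrapper,
    MemoryTableProxy,
)

proxy = MemoryTableProxy(memory_table=table, minimum_length_threshold=None, use_terminal=True)
proxy = LoggingProxyWrapper(proxy, SummaryWriter("/tmp/output_dir"), 1000)
proxy = MemoryExporterProxyWrapper(
    memory=proxy,
    target_memory_name="replay",
    inf_steps_per_memory_export=100_000,
    experiment_root_path="/tmp/experiment",
    min_time_per_export=600,
)
```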

class MemoryLoader:

Methods

def __init__(
    self,
    memory_table,
    rollout_count,
    rollout_length,
    size_key,
    data_group
) -> None
def is_ready(self) -> None

True if the data loader has enough data to start providing data.
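
A minimal construction sketch over a table built elsewhere; the rollout settings and import path are assumptions:

```python
# Hedged sketch: `table` is a memory table built elsewhere; the import path and
# the rollout settings are assumptions.
from emote.memory import MemoryLoader

loader = MemoryLoader(
    memory_table=table,
    rollout_count=32,      # sequences per batch
    rollout_length=8,      # transitions per sequence
    size_key="batch_size",
    data_group="default",
)
```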

class JointMemoryLoader:

A memory loader capable of loading data from multiple MemoryLoaders.

Methods

def __init__(self, loaders, size_key) -> None

Arguments:

  • loaders(list[MemoryLoader])
  • size_key(str) (default: batch_size)
def is_ready(self) -> None

class JointMemoryLoaderWithDataGroup(JointMemoryLoader):

A JointMemoryLoader that places its data inside of a user-specified datagroup.

Methods

def __init__(self, loaders, data_group, size_key) -> None

Arguments:

  • loaders(list[MemoryLoader])
  • data_group(str)
  • size_key(str) (default: batch_size)

class MemoryWarmup(Callback):

A blocker to ensure memory has data. This ensures the memory has enough data when training starts, as the memory will panic otherwise. This is useful if you use an async data generator.

If you do not use an async data generator this can deadlock your training loop and prevent progress.

Methods

def __init__(self, loader, exporter, shutdown_signal) -> None

Arguments:

  • loader(MemoryLoader)
  • exporter(Optional[OnnxExporter])
  • shutdown_signal(Optional[Callable[[], bool]])
def begin_training(self) -> None

module emote.memory.segment_tree

Classes

class SegmentTree:

Methods

def __init__(self, capacity, operation, neutral_element) -> None

Build a Segment Tree data structure. https://en.wikipedia.org/wiki/Segment_tree

Can be used as regular array, but with two important differences:

a) setting item's value is slightly slower.
   It is O(lg capacity) instead of O(1).
b) user has access to an efficient ( O(log segment size) )
   `reduce` operation which reduces `operation` over
   a contiguous subsequence of items in the array.

Arguments:

  • capacity: (int) Total size of the array - must be a power of two.
  • operation: (lambda (Any, Any): Any) operation for combining elements (eg. sum, max) must form a mathematical group together with the set of possible values for array elements (i.e. be associative)
  • neutral_element: (Any) neutral element for the operation above. eg. float('-inf') for max and 0 for sum.
def reduce(self, start, end) -> None

Returns result of applying self.operation to a contiguous subsequence of the array.

self.operation(arr[start], operation(arr[start+1], operation(... arr[end])))

Arguments:

  • start: (int) beginning of the subsequence
  • end: (int) end of the subsequences

Returns:

  • (Any) result of reducing self.operation over the specified range of array elements.

class SumSegmentTree(SegmentTree):

Methods

def __init__(self, capacity) -> None
def sum(self, start, end) -> None

Returns arr[start] + ... + arr[end]

Arguments:

  • start: (int) start position of the reduction (must be >= 0)
  • end: (int) end position of the reduction (must be < len(arr), can be None for len(arr) - 1)

Returns:

  • (Any) reduction of SumSegmentTree
def find_prefixsum_idx(self, prefixsum) -> None

Find the highest index i in the array such that arr[0] + arr[1] + ... + arr[i - 1] <= prefixsum

If array values are probabilities, this function allows sampling indices according to the discrete probability efficiently.

Arguments:

  • prefixsum: (float) upperbound on the sum of array prefix

Returns:

  • (int) highest index satisfying the prefixsum constraint
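
A minimal sketch of proportional (prefix-sum) sampling using the calls documented above; the priorities are illustrative:

```python
# Hedged sketch of prefix-sum (proportional) sampling with the API documented above.
import random

from emote.memory.segment_tree import SumSegmentTree

tree = SumSegmentTree(capacity=8)  # capacity must be a power of two
for i, priority in enumerate([0.1, 0.4, 0.2, 0.3]):
    tree[i] = priority  # the tree "can be used as a regular array"

mass = random.random() * tree.sum(0, None)  # draw a point in the total priority mass
idx = tree.find_prefixsum_idx(mass)         # index sampled proportionally to its priority
```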

class MinSegmentTree(SegmentTree):

Methods

def __init__(self, capacity) -> None
def min(self, start, end) -> None

Returns min(arr[start], ..., arr[end])

Arguments:

  • start: (int) start position of the reduction (must be >= 0)
  • end: (int) end position of the reduction (must be < len(arr), can be None for len(arr) - 1)

Returns:

  • (Any) reduction of MinSegmentTree

module emote.memory.storage

Classes

class BaseStorage(dict):

A simple dictionary-based storage with support for a temporary workspace for sampled data.

Methods

def __init__(self, shape, dtype) -> None

Arguments:

  • shape
  • dtype
def get_empty_storage(self, count, length) -> None

A workspace that can be reused to skip reallocating the same numpy buffer each time the memory is sampled.

Will not work if the memory is sampled from multiple threads.

Arguments:

  • count
  • length
def sequence_length_transform(self, length) -> None
def post_import(self) -> None

class TagProxy:

Methods

def shape(self) -> None
def __init__(self, shape, dtype) -> None
def get_empty_storage(self, count, length) -> None

A workspace that can be reused to skip reallocating the same numpy buffer each time the memory is sampled.

Will not work if the memory is sampled from multiple threads.

Arguments:

  • count
  • length
def sequence_length_transform(self, length) -> None
def post_import(self) -> None
def shape(self) -> None

class VirtualStorage:

A virtual storage uses a simple storage to generate data.

Methods

def __init__(self, storage, shape, dtype) -> None

Arguments:

  • storage
  • shape
  • dtype
def shape(self) -> None
def sequence_length_transform(self, length) -> None
def get_empty_storage(self, count, length) -> None
def post_import(self) -> None

class LastWrapper:

Methods

def __init__(self, item) -> None
def shape(self) -> None
def __init__(self, item) -> None
def shape(self) -> None
def __init__(self, storage, shape, dtype, only_last) -> None
def sequence_length_transform(self, length) -> None
def with_only_last(storage, shape, dtype) -> None

class Wrapper:

Methods

def __init__(self, item, n) -> None
def shape(self) -> None
def __init__(self, storage, shape, dtype) -> None
def sequence_length_transform(self, length) -> None
def with_n(n) -> None

class MaskWrapper(Wrapper):

Methods

def __init__(self, length, shape, dtype) -> None
def shape(self) -> None
def __init__(self, storage, shape, dtype, mask) -> None
def as_mask(storage, shape, dtype) -> None

module emote.memory.strategy

Classes

class Strategy(ABC):

A generalized strategy that may be specialized for sampling or ejection from a memory buffer.

Methods

def __init__(self) -> None
def track(self, identity, sequence_length) -> None

Track a sequence given by identity and sequence_length that exists in the memory.

Arguments:

  • identity(int): an identity that is globally unique
  • sequence_length(int): the number of transitions in the sequence identified by identity
def forget(self, identity) -> None

Forget the sequence of transitions given by identity.

Arguments:

  • identity(int)
def on_sample(self, ids_and_offsets, transition_count, advantages) -> None

Called after a sampling strategy has been invoked, to give the strategy a chance to update sampling weights in case it uses prioritized sampling.

Arguments:

  • ids_and_offsets(Sequence[SamplePoint])
  • transition_count(int)
  • advantages(Optional[Matrix])
def post_import(self) -> None

Post-import validation of invariants and cleanup. This has to forget any imported negative ids; anything else is implementation-defined.

def state(self) -> dict

Serialize the strategy state to a dictionary.

def load_state(self, state) -> None

Load the strategy state from a dictionary.

Arguments:

  • state(dict)
def clear(self) -> None

Clear the strategy's internal state.

def begin_simple_import(self) -> None

Called before a simple import, to allow the strategy to prepare itself.

def end_simple_import(self) -> None

Called after a simple import, to allow the strategy to cleanup.

class SampleStrategy(Strategy):

A strategy specialized for sampling.

Methods

def sample(self, count, transition_count) -> Sequence[SamplePoint]

Apply the sampling strategy to the memory metadata, returning count identities and offsets to use when sampling from the memory.

Arguments:

  • count(int)
  • transition_count(int)

class EjectionStrategy(Strategy):

A strategy specialized for ejection sampling.

Methods

def sample(self, count) -> Sequence[int]

Apply the sampling strategy to the memory metadata, returning a list of identities that shall be ejected from the memory to remove at least "count" transitions.

Arguments:

  • count(int)

module emote.memory.table

Classes

class TableSerializationVersion(enum.Enum):

The version of the memory serialization format.

class MemoryTable(Protocol):

Fields

  • adaptors: List[Adaptor]

Methods

def sample(self, count, sequence_length) -> SampleResult

Sample COUNT traces from the memory, each consisting of SEQUENCE_LENGTH frames.

The data is transposed in a SoA fashion (since this is both easier to store and easier to consume).

Arguments:

  • count(int)
  • sequence_length(int)
def size(self) -> int

Query the number of elements currently in the memory.

def full(self) -> bool

Query whether the memory is filled.

def add_sequence(self, identity, sequence) -> None

Add a fully terminated sequence to the memory.

Arguments:

  • identity(int)
  • sequence
def store(self, path, version) -> bool

Persist the whole table and all metadata to the designated path.

Arguments:

  • path(str)
  • version(TableSerializationVersion)
def restore(self, path, override_version) -> bool

Restore the data table from the provided path. This also clears the data stores.

Arguments:

  • path(str)
  • override_version(TableSerializationVersion | None)

class ArrayMemoryTable:

Methods

def __init__(
    self,
    *,
    columns,
    maxlen,
    sampler,
    ejector,
    length_key,
    adaptors,
    device
) -> None

Create the table with the specified configuration.

Arguments:

  • columns(Sequence[Column])
  • maxlen(int)
  • sampler(SampleStrategy)
  • ejector(EjectionStrategy)
  • length_key
  • adaptors(Optional[Adaptor])
  • device(torch.device)
def resize(self, new_size) -> None
def clear(self) -> None

Clear and reset all data.

def sample(self, count, sequence_length) -> SampleResult

Sample COUNT traces from the memory, each consisting of SEQUENCE_LENGTH transitions.

The transitions are returned in a SoA fashion (since this is both easier to store and easier to consume)

Arguments:

  • count(int)
  • sequence_length(int)
def size(self) -> int

Query the number of elements currently in the memory.

def full(self) -> bool

Returns true if the memory has reached saturation, e.g., where new adds may cause ejection.

Warning: This does not necessarily mean that size() == maxlen, as we store and eject full sequences. The memory only guarantees we will have fewer samples than maxlen.

def add_sequence(self, identity, sequence) -> None
def store(self, path, version) -> bool

Persist the whole table and all metadata to the designated path.

Arguments:

  • path(str): The path to store the data to.
  • version(TableSerializationVersion): The serialization version to use.
def restore(self, path, override_version) -> bool
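
A minimal construction sketch, assuming Column takes its documented fields as constructor arguments and using the FIFO strategies from this package; column shapes and the length_key are illustrative:

```python
# Hedged sketch: column shapes/dtypes are illustrative; Column construction
# assumes its documented fields (name, shape, dtype) are constructor arguments.
import numpy as np
import torch

from emote.memory.column import Column
from emote.memory.fifo_strategy import FifoEjectionStrategy, FifoSampleStrategy
from emote.memory.table import ArrayMemoryTable

table = ArrayMemoryTable(
    columns=[
        Column(name="obs", shape=(17,), dtype=np.float32),
        Column(name="actions", shape=(6,), dtype=np.float32),
        Column(name="rewards", shape=(1,), dtype=np.float32),
    ],
    maxlen=100_000,
    sampler=FifoSampleStrategy(),
    ejector=FifoEjectionStrategy(),
    length_key="actions",
    adaptors=None,
    device=torch.device("cpu"),
)
```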

module emote.memory.uniform_strategy

Classes

class UniformStrategyBase(Strategy):

A sampler intended to sample uniformly across the whole set of experiences.

This base class is used by both the uniform sample and ejection strategies.

Methods

def __init__(self) -> None
def track(self, identity, sequence_length) -> None
def forget(self, identity) -> None
def post_import(self) -> None

class UniformSampleStrategy(UniformStrategyBase, SampleStrategy):

Methods

def sample(self, count, transition_count) -> Sequence[SamplePoint]

class UniformEjectionStrategy(UniformStrategyBase, EjectionStrategy):

Methods

def sample(self, count) -> Sequence[int]

package emote.mixins

Mixins for emote.

Mixins are used to add functionality to other classes just like regular inheritance. The difference is that mixins are designed to work well with multiple inheritance, which requires extra care to avoid issues in initialization order.

module emote.mixins.logging

Classes

class LoggingMixin:

A Mixin that accepts logging calls. Logged data is saved on this object and gets written by a Logger. This class therefore doesn't care how the data is logged; it only provides a standard interface for storing the data to be handled by a Logger.

Methods

def __init__(self, *default_window_length) -> None

Arguments:

  • default_window_length(int)
def log_scalar(self, key, value) -> None

Use log_scalar to periodically log scalar data.

Arguments:

  • key(str)
  • value(float | torch.Tensor)
def log_windowed_scalar(self, key, value) -> None

Log scalars using a moving window average. By default this will use default_window_length from the constructor as the window length. It can also be overridden on a per-key basis using the format windowed[LENGTH]:foo/bar. Note that this cannot be changed between multiple invocations - whichever length is found first will be permanent.

Arguments:

  • key(str)
  • value(float | torch.Tensor | Iterable[torch.Tensor | float])
def log_image(self, key, value) -> None

Use log_image to periodically log image data.

Arguments:

  • key(str)
  • value(torch.Tensor)
def log_video(self, key, value) -> None

Use log_video to periodically log video data.

Arguments:

  • key(str)
  • value(Tuple[np.ndarray, int])
def log_histogram(self, key, value) -> None
def state_dict(self) -> None
def load_state_dict(
    self,
    state_dict,
    load_network,
    load_optimizer,
    load_hparams
) -> None
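
A minimal sketch of an object that mixes in LoggingMixin and records values through the documented calls, including the windowed[LENGTH]:key form; the class and keys are illustrative:

```python
# Hedged sketch: a plain object mixing in LoggingMixin; keys are illustrative.
from emote.mixins.logging import LoggingMixin


class EpisodeStats(LoggingMixin):
    def record(self, reward: float) -> None:
        self.log_scalar("episode/last_reward", reward)
        # Use a 100-sample moving window via the windowed[LENGTH]:key format:
        self.log_windowed_scalar("windowed[100]:episode/reward", reward)


stats = EpisodeStats(default_window_length=250)
stats.record(1.0)
```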

package emote.models

Classes

class DynamicModel(nn.Module):

Wrapper class for dynamics models. DynamicModel functions as a wrapper for models, including ensembles. It also provides data manipulations that are common when using dynamics models with observations and actions (e.g., predicting delta observations, input normalization).

Methods

def __init__(self, *model, learned_rewards, obs_process_fn, no_delta_list) -> None

Arguments:

  • model(nn.Module): the model to wrap.
  • learned_rewards(bool): if True, the wrapper considers the last output of the model to correspond to reward predictions.
  • obs_process_fn(Optional[nn.Module]): if provided, observations will be passed through this function before being given to the model.
  • no_delta_list(Optional[list[int]]): if provided, represents a list of dimensions over which the model predicts the actual observation and not just a delta.
def forward(self, x) -> tuple[torch.Tensor, ...]

Computes the output of the dynamics model.

Arguments:

  • x(torch.Tensor): input

Returns:

  • (tuple of tensors): predicted tensors
def loss(self, obs, next_obs, action, reward) -> tuple[torch.Tensor, dict[str, any]]

Computes the model loss over a batch of transitions.

Arguments:

  • obs(torch.Tensor): current observations
  • next_obs(torch.Tensor): next observations
  • action(torch.Tensor): actions
  • reward(torch.Tensor): rewards

Returns:

  • (tensor and optional dict): the loss tensor and optional info
def sample(
    self,
    action,
    observation,
    rng
) -> tuple[torch.Tensor, Optional[torch.Tensor]]

Samples a simulated transition from the dynamics model. The function first normalizes the inputs to the model, and then denormalizes the model output as the final output.

Arguments:

  • action(torch.Tensor): the action a_t.
  • observation(torch.Tensor): the observation/state s_t.
  • rng(torch.Generator): a random number generator.

Returns:

  • predicted observation and rewards.
def get_model_input(self, obs, action) -> torch.Tensor

The function prepares the input to the neural network model by concatenating observations and actions. If obs_process_fn is given, the observations are processed by that function prior to concatenation.

Arguments:

  • obs(torch.Tensor): observation tensor
  • action(torch.Tensor): action tensor

Returns:

  • the concatenation of obs and actions
def process_batch(
    self,
    obs,
    next_obs,
    action,
    reward
) -> tuple[torch.Tensor, torch.Tensor]

The function processes the given batch, normalizes inputs and targets, and prepares them for training.

Arguments:

  • obs(torch.Tensor): the observations tensor
  • next_obs(torch.Tensor): the next observation tensor
  • action(torch.Tensor): the actions tensor
  • reward(torch.Tensor): the rewards tensor

Returns:

  • (tuple[torch.Tensor, torch.Tensor]): the training input and target tensors
def save(self, save_dir) -> None

Saving the model.

Arguments:

  • save_dir(str): the directory to save the model
def load(self, load_dir) -> None

Loading the model.

Arguments:

  • load_dir(str): the directory to load the model

class ModelLoss(LossCallback):

Trains a dynamic model by minimizing the model loss.

Methods

def __init__(
    self,
    *,
    model,
    opt,
    lr_schedule,
    max_grad_norm,
    name,
    data_group,
    input_key
) -> None

Arguments:

  • model(DynamicModel): A dynamic model
  • opt(optim.Optimizer): An optimizer.
  • lr_schedule(Optional[optim.lr_scheduler._LRScheduler]): A learning rate scheduler
  • max_grad_norm(float): Clip the norm of the gradient during backprop using this value.
  • name(str): The name of the module. Used e.g. while logging.
  • data_group(str): The name of the data group from which this Loss takes its data.
  • input_key(str)
def loss(self, observation, next_observation, actions, rewards) -> None

class ModelEnv:

Wraps a dynamics model into a gym-like environment.

Methods

def __init__(
    self,
    *,
    num_envs,
    model,
    termination_fn,
    reward_fn,
    generator,
    input_key
) -> None

Arguments:

  • num_envs(int): the number of envs to simulate in parallel (batch_size).
  • model(DynamicModel): the dynamic model to wrap.
  • termination_fn(TermFnType): a function that receives observations, and returns a boolean flag indicating whether the episode should end or not.
  • reward_fn(Optional[RewardFnType]): a function that receives actions and observations and returns the value of the resulting reward in the environment.
  • generator(Optional[torch.Generator]): a torch random number generator
  • input_key(str)
def reset(self, initial_obs_batch, len_rollout) -> None

Resets the model environment.

Arguments:

  • initial_obs_batch(torch.Tensor): a batch of initial observations.
  • len_rollout(int): the max length of the model rollout
def step(self, actions) -> tuple[Tensor, Tensor, Tensor, dict[str, Tensor]]

Steps the model environment with the given batch of actions.

Arguments:

  • actions(np.ndarray): the actions for each "episode" to rollout. Shape must be batch_size x dim_actions. If a np.ndarray is given, it's converted to a torch.Tensor and sent to the model device.

Returns:

  • (tuple | dict): contains the predicted next observation, reward, done flag. The done flag and rewards are computed using the termination_fn and reward_fn passed in the constructor. The rewards can also be predicted by the model.
def dict_step(
    self,
    actions
) -> tuple[dict[AgentId, DictObservation], dict[str, float]]

The function to step the Gym-like model with dict_action.

Arguments:

  • actions(dict[AgentId, DictResponse]): the dict actions.

Returns:

  • (tuple[dict[AgentId, DictObservation], dict[str, float]]): the predicted next dict observation, reward, and done flag.
def dict_reset(self, obs, len_rollout) -> dict[AgentId, DictObservation]

Resets the model env.

Arguments:

  • obs(torch.Tensor): the initial observations.
  • len_rollout(int): the max rollout length

Returns:

  • the formatted initial observation.
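
A minimal rollout sketch against the documented reset/step interface, with the dynamics model, termination function, and initial observations assumed to come from elsewhere:

```python
# Hedged sketch: `dynamic_model`, `termination_fn` and `initial_obs` are
# placeholders created elsewhere; sizes are illustrative.
import torch

from emote.models.model_env import ModelEnv

num_envs, action_dim = 64, 6
model_env = ModelEnv(
    num_envs=num_envs,
    model=dynamic_model,            # a DynamicModel
    termination_fn=termination_fn,  # observations -> done flags
    reward_fn=None,                 # let the model predict rewards instead
    generator=torch.Generator(),
    input_key="obs",
)

model_env.reset(initial_obs_batch=initial_obs, len_rollout=5)
for _ in range(5):
    actions = torch.zeros(num_envs, action_dim)  # replace with policy output
    next_obs, rewards, dones, info = model_env.step(actions)
```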

class EnsembleOfGaussian(nn.Module):

Methods

def __init__(
    self,
    *,
    in_size,
    out_size,
    device,
    num_layers,
    ensemble_size,
    hidden_size,
    learn_logvar_bounds,
    deterministic
) -> None
def default_forward(self, x) -> tuple[torch.Tensor, torch.Tensor]
def forward(self, x) -> tuple[torch.Tensor, torch.Tensor]

Computes mean and logvar predictions for the given input.

Arguments:

  • x(torch.Tensor): the input to the model.

Returns:

  • (tuple of two tensors): the predicted mean and log variance of the output.
def loss(self, model_in, target) -> tuple[torch.Tensor, dict[str, any]]

Computes Gaussian NLL loss.

Arguments:

  • model_in(torch.Tensor): input tensor.
  • target(Optional[torch.Tensor]): target tensor.

Returns:

  • (a tuple of tensor and dict): a loss tensor and a dict which includes extra info.
def sample(self, model_input, rng) -> torch.Tensor

Samples next observation, reward and terminal from the model using the ensemble.

Arguments:

  • model_input(torch.Tensor): the observation and action.
  • rng(torch.Generator): a random number generator.

Returns:

  • predicted observation, rewards, terminal indicator and model state dictionary.
def save(self, save_dir) -> None

Saves the model to the given directory.

Arguments:

  • save_dir(str)
def load(self, load_dir) -> None

Loads the model from the given path.

Arguments:

  • load_dir(str)

class ModelBasedCollector(LoggingMixin, BatchCallback):

ModelBasedCollector class is used to sample rollouts from the trained dynamic model. The rollouts are stored in a replay buffer memory.

Arguments:

  • model_env: The Gym-like dynamic model
  • agent: The policy used to sample actions
  • memory: The memory to store the new synthetic samples
  • rollout_scheduler: A scheduler used to set the rollout length when unrolling the dynamic model
  • num_bp_to_retain_buffer: The number of BP steps to keep samples. Samples will be overwritten (first in, first out) for BP steps larger than this.
  • data_group: The data group to receive data from. This must be set to get real (Gym) samples.

Methods

def __init__(
    self,
    model_env,
    agent,
    memory,
    rollout_scheduler,
    num_bp_to_retain_buffer,
    data_group,
    input_key
) -> None

Arguments:

  • model_env(ModelEnv)
  • agent(AgentProxy)
  • memory(MemoryProxy)
  • rollout_scheduler(BPStepScheduler)
  • num_bp_to_retain_buffer (default: 1000000)
  • data_group(str) (default: default)
  • input_key(str) (default: obs)
def begin_batch(self) -> None
def get_batch(self, observation) -> None
def collect_sample(self) -> None

Collect a single rollout.

def update_rollout_size(self) -> None

class BatchSampler(BatchCallback):

BatchSampler class is used to provide batches of data for the RL training callbacks. In every BP step, it samples one batch from either the gym buffer or the model buffer based on a Bernoulli probability distribution. It outputs the batch to a separate data-group which will be used by other RL training callbacks.

Arguments:

  • dataloader (MemoryLoader): the dataloader to load data from the model buffer
  • prob_scheduler (BPStepScheduler): the scheduler to update the probability of data samples coming from the model vs. the Gym buffer
  • data_group (str): the data_group to receive data
  • rl_data_group (str): the data_group to upload data for RL training
  • generator (torch.Generator, optional): an optional random generator

Methods

def __init__(
    self,
    dataloader,
    prob_scheduler,
    data_group,
    rl_data_group,
    generator
) -> None

Arguments:

  • dataloader(MemoryLoader)
  • prob_scheduler(BPStepScheduler)
  • data_group(str) (default: default)
  • rl_data_group(str) (default: rl_buffer)
  • generator(Optional[torch.Generator])
def begin_batch(self) -> None

Generates a batch of data either by sampling from the model buffer or by cloning the input batch

Returns:

  • the batch of data
def sample_model_batch(self) -> None

Samples a batch of data from the model buffer

Returns:

  • batch samples
def use_model_batch(self) -> None

Decides if batch should come from the model-generated buffer

Returns:

  • True if model samples should be used, False otherwise.

class LossProgressCheck(LoggingMixin, BatchCallback):

Methods

def __init__(self, model, num_bp, data_group, input_key) -> None
def begin_batch(self) -> None
def end_cycle(self) -> None
def get_batch(self, observation, next_observation, actions, rewards) -> None

class DeterministicModel(nn.Module):

Methods

def __init__(self, in_size, out_size, device, hidden_size, num_hidden_layers) -> None
def forward(self, x) -> torch.Tensor
def loss(self, model_in, target) -> tuple[torch.Tensor, dict[str, any]]
def sample(self, model_input, rng) -> torch.Tensor

Samples next observation, reward and terminal from the model.

Arguments:

  • model_input(torch.Tensor): the observation and action.
  • rng(torch.Generator): a random number generator.

Returns:

  • predicted observation, rewards, terminal indicator and model state dictionary.

module emote.models.callbacks

Classes

class ModelLoss(LossCallback):

Trains a dynamic model by minimizing the model loss.

Methods

def __init__(
    self,
    *,
    model,
    opt,
    lr_schedule,
    max_grad_norm,
    name,
    data_group,
    input_key
) -> None

Arguments:

  • model(DynamicModel): A dynamic model
  • opt(optim.Optimizer): An optimizer.
  • lr_schedule(Optional[optim.lr_scheduler._LRScheduler]): A learning rate scheduler
  • max_grad_norm(float): Clip the norm of the gradient during backprop using this value.
  • name(str): The name of the module. Used e.g. while logging.
  • data_group(str): The name of the data group from which this Loss takes its data.
  • input_key(str)
def loss(self, observation, next_observation, actions, rewards) -> None

class LossProgressCheck(LoggingMixin, BatchCallback):

Methods

def __init__(self, model, num_bp, data_group, input_key) -> None
def begin_batch(self) -> None
def end_cycle(self) -> None
def get_batch(self, observation, next_observation, actions, rewards) -> None

class BatchSampler(BatchCallback):

BatchSampler class is used to provide batches of data for the RL training callbacks. In every BP step, it samples one batch from either the gym buffer or the model buffer based on a Bernoulli probability distribution. It outputs the batch to a separate data-group which will be used by other RL training callbacks.

Arguments:

  • dataloader (MemoryLoader): the dataloader to load data from the model buffer
  • prob_scheduler (BPStepScheduler): the scheduler to update the probability of data samples coming from the model vs. the Gym buffer
  • data_group (str): the data_group to receive data
  • rl_data_group (str): the data_group to upload data for RL training
  • generator (torch.Generator, optional): an optional random generator

Methods

def __init__(
    self,
    dataloader,
    prob_scheduler,
    data_group,
    rl_data_group,
    generator
) -> None

Arguments:

  • dataloader(MemoryLoader)
  • prob_scheduler(BPStepScheduler)
  • data_group(str) (default: default)
  • rl_data_group(str) (default: rl_buffer)
  • generator(Optional[torch.Generator])
def begin_batch(self) -> None

Generates a batch of data either by sampling from the model buffer or by cloning the input batch

Returns:

  • the batch of data
def sample_model_batch(self) -> None

Samples a batch of data from the model buffer

Returns:

  • batch samples
def use_model_batch(self) -> None

Decides if batch should come from the model-generated buffer

Returns:

  • True if model samples should be used, False otherwise.

class ModelBasedCollector(LoggingMixin, BatchCallback):

ModelBasedCollector class is used to sample rollouts from the trained dynamic model. The rollouts are stored in a replay buffer memory.

Arguments:

  • model_env: The Gym-like dynamic model
  • agent: The policy used to sample actions
  • memory: The memory to store the new synthetic samples
  • rollout_scheduler: A scheduler used to set the rollout length when unrolling the dynamic model
  • num_bp_to_retain_buffer: The number of BP steps to keep samples. Samples will be overwritten (first in, first out) for BP steps larger than this.
  • data_group: The data group to receive data from. This must be set to get real (Gym) samples.

Methods

def __init__(
    self,
    model_env,
    agent,
    memory,
    rollout_scheduler,
    num_bp_to_retain_buffer,
    data_group,
    input_key
) -> None

Arguments:

  • model_env(ModelEnv)
  • agent(AgentProxy)
  • memory(MemoryProxy)
  • rollout_scheduler(BPStepScheduler)
  • num_bp_to_retain_buffer (default: 1000000)
  • data_group(str) (default: default)
  • input_key(str) (default: obs)
def begin_batch(self) -> None
def get_batch(self, observation) -> None
def collect_sample(self) -> None

Collect a single rollout.

def update_rollout_size(self) -> None

module emote.models.ensemble

Functions

def truncated_normal_init(m) -> None

Initializes the weights of the given module using a truncated normal distribution.

Arguments:

  • m(nn.Module)

Classes

class EnsembleLinearLayer(nn.Module):

Linear layer for ensemble models.

Methods

def __init__(self, num_members, in_size, out_size) -> None

Arguments:

  • num_members(int): the ensemble size
  • in_size(int): the input size of the model
  • out_size(int): the output size of the model
def forward(self, x) -> None

class EnsembleOfGaussian(nn.Module):

Methods

def __init__(
    self,
    *,
    in_size,
    out_size,
    device,
    num_layers,
    ensemble_size,
    hidden_size,
    learn_logvar_bounds,
    deterministic
) -> None
def default_forward(self, x) -> tuple[torch.Tensor, torch.Tensor]
def forward(self, x) -> tuple[torch.Tensor, torch.Tensor]

Computes mean and logvar predictions for the given input.

Arguments:

  • x(torch.Tensor): the input to the model.

Returns:

  • (tuple of two tensors): the predicted mean and log variance of the output.
def loss(self, model_in, target) -> tuple[torch.Tensor, dict[str, any]]

Computes Gaussian NLL loss.

Arguments:

  • model_in(torch.Tensor): input tensor.
  • target(Optional[torch.Tensor]): target tensor.

Returns:

  • (a tuple of tensor and dict): a loss tensor and a dict which includes extra info.
def sample(self, model_input, rng) -> torch.Tensor

Samples next observation, reward and terminal from the model using the ensemble.

Arguments:

  • model_input(torch.Tensor): the observation and action.
  • rng(torch.Generator): a random number generator.

Returns:

  • predicted observation, rewards, terminal indicator and model state dictionary.
def save(self, save_dir) -> None

Saves the model to the given directory.

Arguments:

  • save_dir(str)
def load(self, load_dir) -> None

Loads the model from the given path.

Arguments:

  • load_dir(str)

module emote.models.model

Classes

class DynamicModel(nn.Module):

Wrapper class for dynamics models. DynamicModel functions as a wrapper for models, including ensembles. It also provides data manipulations that are common when using dynamics models with observations and actions (e.g., predicting delta observations, input normalization).

Methods

def __init__(self, *model, learned_rewards, obs_process_fn, no_delta_list) -> None

Arguments:

  • model(nn.Module): the model to wrap.
  • learned_rewards(bool): if True, the wrapper considers the last output of the model to correspond to reward predictions.
  • obs_process_fn(Optional[nn.Module]): if provided, observations will be passed through this function before being given to the model.
  • no_delta_list(Optional[list[int]]): if provided, represents a list of dimensions over which the model predicts the actual observation and not just a delta.
def forward(self, x) -> tuple[torch.Tensor, ...]

Computes the output of the dynamics model.

Arguments:

  • x(torch.Tensor): input

Returns:

  • (tuple of tensors): predicted tensors
def loss(self, obs, next_obs, action, reward) -> tuple[torch.Tensor, dict[str, any]]

Computes the model loss over a batch of transitions.

Arguments:

  • obs(torch.Tensor): current observations
  • next_obs(torch.Tensor): next observations
  • action(torch.Tensor): actions
  • reward(torch.Tensor): rewards

Returns:

  • (tensor and optional dict): the loss tensor and optional info
def sample(
    self,
    action,
    observation,
    rng
) -> tuple[torch.Tensor, Optional[torch.Tensor]]

Samples a simulated transition from the dynamics model. The function first normalizes the inputs to the model, and then denormalizes the model output as the final output.

Arguments:

  • action(torch.Tensor): the action a_t.
  • observation(torch.Tensor): the observation/state s_t.
  • rng(torch.Generator): a random number generator.

Returns:

  • predicted observation and rewards.
def get_model_input(self, obs, action) -> torch.Tensor

The function prepares the input to the neural network model by concatenating observations and actions. If obs_process_fn is given, the observations are processed by that function prior to concatenation.

Arguments:

  • obs(torch.Tensor): observation tensor
  • action(torch.Tensor): action tensor

Returns:

  • the concatenation of obs and actions
def process_batch(
    self,
    obs,
    next_obs,
    action,
    reward
) -> tuple[torch.Tensor, torch.Tensor]

The function processes the given batch, normalizes inputs and targets, and prepares them for training.

Arguments:

  • obs(torch.Tensor): the observations tensor
  • next_obs(torch.Tensor): the next observation tensor
  • action(torch.Tensor): the actions tensor
  • reward(torch.Tensor): the rewards tensor

Returns:

  • (tuple[torch.Tensor, torch.Tensor]): the training input and target tensors
def save(self, save_dir) -> None

Saving the model.

Arguments:

  • save_dir(str): the directory to save the model
def load(self, load_dir) -> None

Loading the model.

Arguments:

  • load_dir(str): the directory to load the model

class DeterministicModel(nn.Module):

Methods

def __init__(self, in_size, out_size, device, hidden_size, num_hidden_layers) -> None
def forward(self, x) -> torch.Tensor
def loss(self, model_in, target) -> tuple[torch.Tensor, dict[str, any]]
def sample(self, model_input, rng) -> torch.Tensor

Samples next observation, reward and terminal from the model.

Arguments:

  • model_input(torch.Tensor): the observation and action.
  • rng(torch.Generator): a random number generator.

Returns:

  • predicted observation, rewards, terminal indicator and model state dictionary.

class Normalizer:

Class that keeps a running mean and variance and normalizes data accordingly.

Methods

def __init__(self) -> None
def update_stats(self, data) -> None

Updates the stored statistics using the given data.

Arguments:

  • data(torch.Tensor): The data used to compute the statistics.
def normalize(self, val, update_state) -> torch.Tensor

Normalizes the value according to the stored statistics.

Arguments:

  • val(torch.Tensor): The value to normalize.
  • update_state(bool): whether to also update the stored statistics with this value.

Returns:

  • The normalized value.
def denormalize(self, val) -> torch.Tensor

De-normalizes the value according to the stored statistics.

Arguments:

  • val(torch.Tensor): The value to de-normalize.

Returns:

  • The de-normalized value.
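
A minimal sketch of the running-statistics workflow described above; shapes are illustrative:

```python
# Hedged sketch: data shapes are illustrative.
import torch

from emote.models.model import Normalizer

normalizer = Normalizer()
batch = torch.randn(128, 17)

normalizer.update_stats(batch)                            # accumulate running mean/variance
normed = normalizer.normalize(batch, update_state=False)  # normalize without updating stats
restored = normalizer.denormalize(normed)                 # approximately equal to `batch`
```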

module emote.models.model_env

Classes

class ModelEnv:

Wraps a dynamics model into a gym-like environment.

Methods

def __init__(
    self,
    *,
    num_envs,
    model,
    termination_fn,
    reward_fn,
    generator,
    input_key
) -> None

Arguments:

  • num_envs(int): the number of envs to simulate in parallel (batch_size).
  • model(DynamicModel): the dynamic model to wrap.
  • termination_fn(TermFnType): a function that receives observations, and returns a boolean flag indicating whether the episode should end or not.
  • reward_fn(Optional[RewardFnType]): a function that receives actions and observations and returns the value of the resulting reward in the environment.
  • generator(Optional[torch.Generator]): a torch random number generator
  • input_key(str)
def reset(self, initial_obs_batch, len_rollout) -> None

Resets the model environment.

Arguments:

  • initial_obs_batch(torch.Tensor): a batch of initial observations.
  • len_rollout(int): the max length of the model rollout
def step(self, actions) -> tuple[Tensor, Tensor, Tensor, dict[str, Tensor]]

Steps the model environment with the given batch of actions.

Arguments:

  • actions(np.ndarray): the actions for each "episode" to rollout. Shape must be batch_size x dim_actions. If a np.ndarray is given, it's converted to a torch.Tensor and sent to the model device.

Returns:

  • (tuple | dict): contains the predicted next observation, reward, done flag. The done flag and rewards are computed using the termination_fn and reward_fn passed in the constructor. The rewards can also be predicted by the model.
def dict_step(
    self,
    actions
) -> tuple[dict[AgentId, DictObservation], dict[str, float]]

The function to step the Gym-like model with dict_action.

Arguments:

  • actions(dict[AgentId, DictResponse]): the dict actions.

Returns:

  • (tuple[dict[AgentId, DictObservation], dict[str, float]]): the predicted next dict observation, reward, and done flag.
def dict_reset(self, obs, len_rollout) -> dict[AgentId, DictObservation]

Resets the model env.

Arguments:

  • obs(torch.Tensor): the initial observations.
  • len_rollout(int): the max rollout length

Returns:

  • the formatted initial observation.

package emote.nn

Functions

def ortho_init_(m, gain) -> None

Classes

class ActionValueMlp(nn.Module):

Methods

def __init__(self, observation_dim, action_dim, hidden_dims) -> None
def forward(self, action, obs) -> Tensor

class GaussianMlpPolicy(nn.Module):

Methods

def __init__(self, observation_dim, action_dim, hidden_dims) -> None
def forward(self, obs, epsilon) -> Tensor | Tuple[Tensor]

class GaussianPolicyHead(nn.Module):

Methods

def __init__(self, hidden_dim, action_dim) -> None
def forward(self, x, epsilon) -> Tensor | Tuple[Tensor]

Sample pre-actions and associated log-probabilities.

Arguments:

  • x(Tensor)
  • epsilon(Tensor | None)

Returns:

  • Direct samples (pre-actions) from the policy and the log-probabilities associated with those samples

module emote.nn.action_value_mlp

Classes

class ActionValueMlp(nn.Module):

Methods

def __init__(self, observation_dim, action_dim, hidden_dims) -> None
def forward(self, action, obs) -> Tensor

class SharedEncoderActionValueNet(nn.Module):

Methods

def __init__(self, shared_enc, encoder_out_dim, action_dim, hidden_dims) -> None
def forward(self, action, obs) -> None

module emote.nn.curl

Functions

def soft_update_from_to(source_params, target_params, tau) -> None
def rand_uniform(minval, maxval, shape) -> None

Classes

class ImageAugmentor:

Methods

def __init__(
    self,
    device,
    use_fast_augment,
    use_noise_aug,
    use_per_image_mask_size,
    min_mask_relative_size,
    max_mask_relative_size
) -> None
def __call__(self, image) -> None

class CurlLoss(LossCallback):

Contrastive Unsupervised Representations for Reinforcement Learning (CURL).

paper: https://arxiv.org/abs/2004.04136

Methods

def __init__(
    self,
    encoder_model,
    target_encoder_model,
    device,
    learning_rate,
    learning_rate_start_frac,
    learning_rate_end_frac,
    learning_rate_steps,
    max_grad_norm,
    data_group,
    desired_zdim,
    tau,
    use_noise_aug,
    temperature,
    use_temperature_variant,
    use_per_image_mask_size,
    use_fast_augment,
    use_projection_layer,
    augment_anchor_and_pos,
    log_images
) -> None

Arguments:

  • encoder_model(Conv2dEncoder): (Conv2dEncoder) The image encoder that will be trained using CURL.
  • target_encoder_model(Conv2dEncoder): (Conv2dEncoder) The target image encoder.
  • device(torch.DeviceObjType): (torch.device) The device to use for computation.
  • learning_rate(float): (float)
  • learning_rate_start_frac(float): (float) The start fraction for LR schedule. (default: 1.0)
  • learning_rate_end_frac(float): (float) The end fraction for LR schedule. (default: 1.0)
  • learning_rate_steps(float): (int) The number of steps to decay the LR over. (default: 1)
  • max_grad_norm(float): (float) The maximum gradient norm, use for gradient clipping. (default: 1.0)
  • data_group(str) (default: default)
  • desired_zdim(int): (int) The size of the latent. If the projection layer is not used this will default to the encoder output size. (default: 128)
  • tau(float): (float) The tau value that is used for updating the target encoder. (default: 0.005)
  • use_noise_aug(bool): (bool) Add noise during image augmentation.
  • temperature(float): (float) The value used for the temperature scaled cross-entropy calculation. (default: 0.1)
  • use_temperature_variant(bool): (bool) Use normalised temperature scaled cross-entropy variant. (default: True)
  • use_per_image_mask_size(bool): (bool) Use different mask sizes for every image in the batch.
  • use_fast_augment(bool): (bool) A gpu compatible image augmentation that uses a fixed cutout position and size per batch.
  • use_projection_layer(bool): (bool) Add an additional dense layer to the encoder that projects to zdim size. (default: True)
  • augment_anchor_and_pos(bool): (bool) Augment both the anchor and positive images. (default: True)
  • log_images(bool): (bool) Logs the augmented images. (default: True)
def parameters(self) -> None
def backward(self, observation) -> None
def end_batch(self) -> None

module emote.nn.gaussian_policy

Classes

class BasePolicy(nn.Module):

Methods

def __init__(self) -> None
def post_process(self, actions) -> None

Post-process a pre-action into a post-action.

Arguments:

  • actions
def infer(self, x) -> None

Samples pre-actions and associated post-actions (actual decisions) from the policy given the encoder input.

Only for use at inference time; defaults to identity transformation. Crucial to reimplement for discrete reparametrized policies.

Arguments:

  • x(Tensor)

class GaussianPolicyHead(nn.Module):

Methods

def __init__(self, hidden_dim, action_dim) -> None
def forward(self, x, epsilon) -> Tensor | Tuple[Tensor]

Sample pre-actions and associated log-probabilities.

Arguments:

  • x(Tensor)
  • epsilon(Tensor | None)

Returns:

  • Direct samples (pre-actions) from the policy and the log-probabilities associated with those samples

class GaussianMlpPolicy(nn.Module):

Methods

def __init__(self, observation_dim, action_dim, hidden_dims) -> None
def forward(self, obs, epsilon) -> Tensor | Tuple[Tensor]

module emote.nn.initialization

Functions

def ortho_init_(m, gain) -> None
def xavier_uniform_init_(m, gain) -> None
def normal_init_(m) -> None

module emote.nn.layers

Classes

class Conv2dEncoder(nn.Module):

Multi-layer 2D convolutional encoder.

Methods

def __init__(
    self,
    input_shape,
    channels,
    kernels,
    strides,
    padding,
    channels_last,
    activation,
    flatten
) -> None

Arguments:

  • input_shape(tuple[int, int, int]): (tuple[int, int, int]) The input image shape, this should be consistent with channels_last.
  • channels(list[int]): (list[int]) The number of channels for each conv layer.
  • kernels(list[int]): (list[int]) The kernel size for each conv layer.
  • strides(list[int]): (list[int]) The strides for each conv layer.
  • padding(list[int]): (list[int]]) The padding.
  • channels_last(bool): (bool) Whether the input image has channels as the last dim, else first. (default: True)
  • activation(torch.nn.Module): (torch.nn.Module) The activation function.
  • flatten(bool): (bool) Flattens the output into a vector. (default: True)
def forward(self, obs) -> None
def get_encoder_output_size(self) -> None
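
A minimal construction sketch for an 84x84 RGB input; the layer settings are chosen only for illustration:

```python
# Hedged sketch: the layer configuration is illustrative, not a recommendation.
import torch

from emote.nn.layers import Conv2dEncoder

encoder = Conv2dEncoder(
    input_shape=(84, 84, 3),  # channels_last=True, so channels come last
    channels=[32, 64, 64],
    kernels=[8, 4, 3],
    strides=[4, 2, 1],
    padding=[0, 0, 0],
    channels_last=True,
    activation=torch.nn.ReLU,
    flatten=True,
)

features = encoder(torch.zeros(1, 84, 84, 3))
print(encoder.get_encoder_output_size())
```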

class Conv1dEncoder(nn.Module):

Multi-layer 1D convolutional encoder.

Methods

def __init__(
    self,
    input_shape,
    channels,
    kernels,
    strides,
    padding,
    activation,
    flatten,
    name,
    channels_last
) -> None

Arguments:

  • input_shape(tuple[int, int]): (tuple[int, int]) The input shape
  • channels(list[int]): (list[int]) The number of channels for each conv layer.
  • kernels(list[int]): (list[int]) The kernel size for each conv layer.
  • strides(list[int]): (list[int]) The strides for each conv layer.
  • padding(list[int]): (list[int]) The padding.
  • activation(torch.nn.Module): (torch.nn.Module) The activation function.
  • flatten(bool): (bool) Flattens the output into a vector. (default: True)
  • name(str): (str) Name of the encoder. (default: conv1d)
  • channels_last(bool): (bool) Whether the input has channels as the last dim, else first. (default: True)
def forward(self, obs) -> None
def get_encoder_output_size(self) -> None

module emote.optimizers

Functions

def separate_modules_for_weight_decay(
    network,
    whitelist_weight_modules,
    blacklist_weight_modules,
    layers_to_exclude
) -> tuple[set[str], set[str]]

Separate the parameters of network into two sets: one set of parameters that will have weight decay, and one set that will not.

Arguments:

  • network(torch.nn.Module): Network whose modules we want to separate.
  • whitelist_weight_modules(tuple[Type[torch.nn.Module], ...]): Modules that should have weight decay applied to the weights.
  • blacklist_weight_modules(tuple[Type[torch.nn.Module], ...]): Modules that should not have weight decay applied to the weights.
  • layers_to_exclude(set[str] | None): Names of layers that should be excluded. Defaults to None. (default: None)

Returns:

  • Sets of modules with and without weight decay.

Classes

class ModifiedAdamW(torch.optim.AdamW):

Modifies AdamW (Adam with weight decay) to not apply weight decay on the bias and layer normalization weights, and optionally additional modules.

Methods

def __init__(
    self,
    network,
    lr,
    weight_decay,
    whitelist_weight_modules,
    blacklist_weight_modules,
    layers_to_exclude
) -> None

Arguments:

  • network(torch.nn.Module): network
  • lr(float): learning rate
  • weight_decay(float): weight decay coefficient
  • whitelist_weight_modules(tuple[Type[torch.nn.Module], ...]): params to get weight decay. Defaults to (torch.nn.Linear, ).
  • blacklist_weight_modules(tuple[Type[torch.nn.Module], ...]): params to not get weight decay. Defaults to (torch.nn.LayerNorm, ).
  • layers_to_exclude(set[str] | None): set of names of additional layers to exclude, e.g. last layer of Q-network. Defaults to None. (default: None)
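
A minimal sketch relying on the documented defaults, which apply weight decay to Linear weights but not to LayerNorm weights or biases; the network and hyperparameters are arbitrary illustrations:

```python
# Hedged sketch: the network and hyperparameters are purely illustrative.
import torch

from emote.optimizers import ModifiedAdamW

network = torch.nn.Sequential(
    torch.nn.Linear(17, 256),
    torch.nn.LayerNorm(256),
    torch.nn.ReLU(),
    torch.nn.Linear(256, 6),
)

# With the documented defaults, Linear weights get weight decay while
# LayerNorm weights and all biases do not.
optimizer = ModifiedAdamW(network=network, lr=3e-4, weight_decay=0.01)
```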

module emote.proxies

Proxies are bridges between the world the agent acts in and the algorithm training loop.

Classes

class AgentProxy(Protocol):

The interface between the agent in the game and the network used during training.

Methods

def __call__(self, observations) -> Dict[AgentId, DictResponse]

Take observations for the active agents and returns the relevant network output.

Arguments:

  • observations(Dict[AgentId, DictObservation])
def policy(self) -> nn.Module
def input_names(self) -> tuple[str, ...]
def output_names(self) -> tuple[str, ...]

class MemoryProxy(Protocol):

The interface between the agent in the game and the memory buffer the network trains from.

Methods

def add(self, observations, responses) -> None

Store episodes in the memory buffer used for training. This is useful e.g. if the data collection is running from a checkpointed model running on another machine.

Arguments:

  • observations(Dict[AgentId, DictObservation])
  • responses(Dict[AgentId, DictResponse])

class GenericAgentProxy(AgentProxy):

Observations are dicts that contain multiple input and output keys. For example, we might have a policy that takes in both "obs" and "goal" and outputs "actions". In order to be able to properly invoke the network it is the responsibility of this proxy to collate the inputs and decollate the outputs per agent.

Methods

def __init__(
    self,
    policy,
    device,
    input_keys,
    output_keys,
    uses_logprobs,
    spaces
) -> None

Handle multi-input multi-output policy networks.

Arguments:

  • policy(nn.Module): The neural network policy that takes observations and returns actions.
  • device(torch.device): The device to run the policy on.
  • input_keys(tuple): Keys specifying what fields from the observation to pass to the policy.
  • output_keys(tuple): Keys for the fields in the output dictionary that the policy is responsible for.
  • uses_logprobs(bool) (default: True)
  • spaces(MDPSpace | None): A utility for managing observation and action spaces, for validation.
def __call__(self, observations) -> dict[AgentId, DictResponse]

Runs the policy and returns the actions.

Arguments:

  • observations(dict[AgentId, DictObservation])
def input_names(self) -> None
def output_names(self) -> None
def policy(self) -> None

module emote.trainer

Classes

class StateDict(dict, MutableMapping[str, Any]):

A wrapper around a dict, allowing usage in a weakref.

Methods

def get_handle(self) -> WeakReference['StateDict']

Retrieve a weak handle to this state dict, with no promise of ownership or lifetime.

class TrainingShutdownException(Exception):

class Trainer:

The Trainer class manages the main training loop in emote. It does so by invoking a bunch of callbacks in a number of different places.

Fields

  • state: StateDict

  • callbacks: List[Callback]

  • dataloader: Iterable

  • cycle_length: int

Methods

def __init__(self, callbacks, dataloader, batch_size_key) -> None

Arguments:

  • callbacks(List[Callback])
  • dataloader(Iterable)
  • batch_size_key(str) (default: batch_size)
def train(self, shutdown_signal) -> None

The main training loop. This method will wait until the memory is full enough to start sampling, and then start running cycles of backprops on batches sampled from the memory.

Arguments:

  • shutdown_signal(Callable): A function that returns True if training should end, False otherwise.
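
A hedged sketch of driving the loop; the empty callback list, the dummy dataloader, and the time budget are placeholders for whatever your actual setup provides (a real setup samples batches from a replay memory and passes the algorithm's callbacks):

import time

from emote.trainer import Trainer

deadline = time.monotonic() + 60 * 60  # placeholder budget: train for at most an hour

def shutdown_signal() -> bool:
    return time.monotonic() > deadline

def dummy_dataloader():
    # Stand-in for a memory-backed dataloader, keyed by the default batch_size_key.
    while True:
        yield {"batch_size": 64}

trainer = Trainer(callbacks=[], dataloader=dummy_dataloader())
trainer.train(shutdown_signal)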

module emote.typing

emote.typing

Type Aliases

type RewardFnType: Callable[[torch.Tensor, torch.Tensor], torch.Tensor]

Classes

class EpisodeState(Enum):

class MetaData:

Fields

  • info: Dict[str, float]

  • info_lists: Dict[str, FloatList]

class DictObservation:

Fields

  • rewards: Dict[str, float]

  • episode_state: EpisodeState

  • array_data: Dict[str, SingleAgentData]

  • metadata: MetaData = None

class DictResponse:

Fields

  • list_data: Dict[str, FloatList]

  • scalar_data: Dict[str, float]

package emote.utils

Classes

class WeakReference(ReferenceType, Generic[T]):

A typed weak reference.

class LockedResource(Generic[T]):

Context manager for a lock and a resource, only giving access to the resource when locked. Works well when paired with [empyc.types.Ref] for primitive types as well.

Usage:

resource = LockedResource([])
with resource as inner_list:
     inner_list.append(1)

Methods

def __init__(self, data) -> None

Create a new LockedResource, with the provided data.

Arguments:

  • data(T): The data to lock
def swap(self, new_resource) -> T

Replace the contained resource with the provided new resource, returning the previous resource. This operation is atomic.

Arguments:

  • new_resource(T): The resource to lock after the swap

Returns:

  • The previously guarded data
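
Continuing the usage above, a small sketch of using swap to atomically hand off the accumulated data; the drain pattern here is just an illustration:

resource = LockedResource([])
with resource as inner_list:
    inner_list.append(1)

previous = resource.swap([])  # atomically replace the guarded list with a fresh one
print(previous)               # [1]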

class AtomicContainer:

Container that allows atomic set, get, take operations.

Methods

def __init__(self, initial_data) -> None

Arguments:

  • initial_data(Any)
def take(self) -> Any
def read(self) -> Any
def set(self, value) -> None

class AtomicInt:

Methods

def __init__(self, value) -> None
def swap(self, value) -> None
def increment(self, value) -> None

Increments the integer and returns the previous value.

Arguments:

  • value(int) (default: 1)
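
A small illustration, assuming the constructor takes the initial value and that increment returns the previous value as its docstring states:

counter = AtomicInt(0)
previous = counter.increment()   # returns 0, counter now holds 1
previous = counter.increment(5)  # returns 1, counter now holds 6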

class TimedBlock:

Used to track the performance statistics of a block of code, in terms of execution time.

Methods

def __init__(self, tracker_type) -> None

Create a new timed block instance.

Arguments:

  • tracker_type(Type[StatisticsAccumulator]): The statistics integrator to use. Defaults to MovingWindowAccumulator
def mean(self) -> float

Retrieve the mean execution time.

def var(self) -> None

Retrieve the variance of the execution time.

def stats(self) -> None

Retrieve the mean and the variance of execution time.

class BlockTimers:

Methods

def __init__(self, tracker_type) -> None
def scope(self, name) -> TimedBlock
def stats(self) -> None
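
BlockTimers groups several named TimedBlock instances. A brief sketch of how scope and stats might be used; the scope names are arbitrary, and whether the returned TimedBlock is entered directly (rather than called first, as in the TimedBlock example under emote.utils.timed_call) is an assumption:

from time import sleep

timers = BlockTimers()

for _ in range(3):
    with timers.scope("forward"):   # scope() returns the named TimedBlock
        sleep(0.01)

print(timers.stats())  # assumed: per-scope (mean, variance) statistics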

class MDPSpace:

Fields

  • rewards: BoxSpace

  • actions: BoxSpace

  • state: DictSpace

module emote.utils.deprecated

Functions

def deprecated(original_function, *reason, max_warn_count, version) -> Callable

Function decorator to deprecate an annotated function. Can be used both as a bare decorator, or with parameters to customize the display of the message. Writes to logging.warn.

Arguments:

  • original_function(Callable): Function to decorate. Automatically passed.
  • reason(str): Message to show. Function name is automatically added.
  • max_warn_count(int): How many times we will warn for the same function
  • version(str)

Returns:

  • the wrapped function
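
A hedged usage sketch; since reason is captured as positional arguments, the parameterized form below passes the message positionally (the exact calling convention is an assumption):

from emote.utils.deprecated import deprecated

@deprecated
def old_helper():
    ...

@deprecated("use new_helper instead", max_warn_count=5, version="23.0.0")
def older_helper():
    ...

older_helper()  # logs a deprecation warning, at most 5 times for this function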

module emote.utils.gamma_matrix

Functions

def make_gamma_matrix(gamma, roll_length) -> None
def discount(rewards, values, gamma_matrix) -> None
def split_rollouts(data, rollout_len) -> None

module emote.utils.math

Functions

def truncated_linear(min_x, max_x, min_y, max_y, x) -> float

Truncated linear function. Implements the following function:

\[ \begin{cases} f_1(x) = min_y + \frac{x - min_x}{max_x - min_x} \, (max_y - min_y) \\ f(x) = \min(max_y, \max(min_y, f_1(x))) \end{cases} \] If max_x - min_x < 1e-10, it behaves as the constant \(f(x) = max_y\).

Arguments:

  • min_x(float)
  • max_x(float)
  • min_y(float)
  • max_y(float)
  • x(float)
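
A quick worked example of the formula above: with min_x=0, max_x=10, min_y=0.1, max_y=1.0 and x=5, f1(x) = 0.1 + (5/10) * 0.9 = 0.55, which already lies inside [0.1, 1.0]:

from emote.utils.math import truncated_linear

truncated_linear(0.0, 10.0, 0.1, 1.0, 5.0)   # 0.55
truncated_linear(0.0, 10.0, 0.1, 1.0, 20.0)  # 1.0 (clamped to max_y)
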
def truncated_normal_(tensor, mean, std) -> torch.Tensor

Samples from a truncated normal distribution in-place.

Arguments:

  • tensor(torch.Tensor): the tensor in which sampled values will be stored.
  • mean(float): the desired mean (default: 0).
  • std(float): the desired standard deviation (default: 1).

Returns:

  • the tensor with the sampled values. Note that the input tensor is modified in place; the return value is the same object.
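
A small sketch of the typical use, e.g. for weight initialization (the exact truncation bounds are not documented here):

import torch
from emote.utils.math import truncated_normal_

weights = torch.empty(256, 64)
truncated_normal_(weights, mean=0.0, std=0.02)  # fills `weights` in place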

module emote.utils.model

Functions

def to_numpy(x) -> None
def normal_init(m) -> None

module emote.utils.spaces

Classes

class BoxSpace:

Fields

  • dtype: torch.dtype | np.dtype

  • shape: Tuple[int]

class DictSpace:

Fields

  • spaces: Dict[str, BoxSpace]

class MDPSpace:

Fields

  • rewards: BoxSpace

  • actions: BoxSpace

  • state: DictSpace
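
These appear to be simple dataclass-style containers; below is a sketch of describing a small environment with a 4-dimensional observation and a 2-dimensional continuous action (field-based construction is an assumption):

import numpy as np
from emote.utils.spaces import BoxSpace, DictSpace, MDPSpace

space = MDPSpace(
    rewards=BoxSpace(dtype=np.float32, shape=(1,)),
    actions=BoxSpace(dtype=np.float32, shape=(2,)),
    state=DictSpace(spaces={"obs": BoxSpace(dtype=np.float32, shape=(4,))}),
)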

module emote.utils.threading

Thread-related utilities and tools.

Classes

class LockedResource(Generic[T]):

Context manager for a lock and a resource, only giving access to the resource when locked. Works well when paired with [empyc.types.Ref] for primitive types as well.

Usage:

resource = LockedResource([])
with resource as inner_list:
     inner_list.append(1)

Methods

def __init__(self, data) -> None

Create a new LockedResource, with the provided data.

Arguments:

  • data(T): The data to lock
def swap(self, new_resource) -> T

Replace the contained resource with the provided new resource, returning the previous resource. This operation is atomic.

Arguments:

  • new_resource(T): The resource to lock after the swap

Returns:

  • The previously guarded data

class AtomicContainer:

Container that allows atomic set, get, take operations.

Methods

def __init__(self, initial_data) -> None

Arguments:

  • initial_data(Any)
def take(self) -> Any
def read(self) -> Any
def set(self, value) -> None

class AtomicInt:

Methods

def __init__(self, value) -> None
def swap(self, value) -> None
def increment(self, value) -> None

Increments the integer and returns the previous value.

Arguments:

  • value(int) (default: 1)

class TracedLock:

Methods

def __init__(self, lock_class) -> None

module emote.utils.timed_call

Simple block-based timers using Welford's Online Algorithm to approximate mean and variance.

Usage:


from time import sleep

timer = TimedBlock()

for _ in range(10):
    with timer():
        sleep(1)

print(timer.stats())
# (1.000013, 1.3e-5)

Classes

class StatisticsAccumulator(ABC):

Interface for a statistics integrator.

Methods

def add(self, value) -> None

Add the value to the running statistics.

Arguments:

  • value(float): the sample to integrate
def current(self) -> Tuple[float, float]

Returns the statistics of the observed samples so far.

Returns:

  • a tuple (mean, variance)

class WelfordAccumulator(StatisticsAccumulator):

Implements Welford's Online Algorithm for single-pass variance and mean.

Fields

  • count: int = 0

  • mean: float = 0.0

  • differences: float = 0.0

Methods

def add(self, value) -> None

Add the value to the running statistics.

Arguments:

  • value(float): the sample to integrate
def current(self) -> Tuple[float, float]

Returns the current values of the Welford algorithm.

Returns:

  • a tuple (mean, variance)
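
A tiny sketch of feeding samples through the accumulator and reading back the running statistics:

acc = WelfordAccumulator()
for sample in (1.0, 2.0, 3.0, 4.0):
    acc.add(sample)

mean, variance = acc.current()  # mean is 2.5, variance is the running estimate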

class MovingWindowAccumulator(StatisticsAccumulator):

Fields

  • values: deque = field(default_factory=lambda : deque(maxlen=100))

Methods

def add(self, value) -> None

Add the value to the running statistics.

Arguments:

  • value(float): the sample to integrate
def current(self) -> Tuple[float, float]

Returns the current statistics.

Returns:

  • a tuple (mean, variance)

class TimedBlock:

Used to track the performance statistics of a block of code, in terms of execution time.

Methods

def __init__(self, tracker_type) -> None

Create a new timed block instance.

Arguments:

  • tracker_type(Type[StatisticsAccumulator]): The statistics integrator to use. Defaults to MovingWindowAccumulator
def mean(self) -> float

Retrieve the mean execution time.

def var(self) -> None

Retrieve the variance of the execution time.

def stats(self) -> None

Retrieve the mean and the variance of execution time.

class BlockTimers:

Methods

def __init__(self, tracker_type) -> None
def scope(self, name) -> TimedBlock
def stats(self) -> None

module emote.utils.weak_reference

A class that contains a typed weak reference.

Classes

class WeakReference(ReferenceType, Generic[T]):

A typed weak reference.

ADRs

For development of Emote we use Architecture Decision Records. They are a type of RFC but smaller in scope and more exact in the decision. The goal of adding an ADR is to summarize a discussion or fact-gathering effort. An RFC is the start of a discussion and may occur before in-depth fact-finding occurs.

On the other hand, not every decision is an ADR. ADRs have to be significant. Things like naming, local APIs, or code structure rarely meet these criteria. Those may be better as open discussions or RFCs, which may not lead to an easily summarized conclusion. Instead, reach for an ADR when you can summarize the decision in a few sentences, at most. A good ADR should fit the format "When doing ..., we do ... because of ...".

An ADR should be written whenever a decision of significant impact is made; it is up to each team to align on what defines a significant impact.

ADR Process

The ADR process is meant to be very fast, with few fixed steps.

  1. Identify need for a decision
  2. Write an ADR using the below template
  3. Open a PR
  4. Once PR is accepted and merged, implement the decision.

Template

# SEQUENCE_NUMBER. TITLE

Date: DATE WHEN PROPOSED

## Status

<!-- all ADRs start their life as accepted - we don't merge ADRs without accepting them. -->
Accepted

## Context

Describe when this decision would be relevant and why.

## Decision

An exact decision of what we will do when the context applies.

## Consequences

The end result of applying the decision.

Accepted ADRs

1. "Nightly" continuous releases

Date: 2022-10-21

Status

Accepted

Context

It would be useful for CI purposes, testing, and local development to be able to install wheels that have gone through CI; rather than pulling the whole git repository and installing. This somewhat aligns with the git+ssh://.../owner/repo#egg=... syntax, but that is still a repo pull and not easily distributable.

Decision

Each night, a nightly build will be made from the latest main, if and only if there have been commits in the last 24 hours. It will be tagged as latest and released as a pre-release on GitHub.

Consequences

We'll need to maintain reasonably good stability and testing for everyday commits in order to support nightly builds; the nightly builds themselves don't need to be as thoroughly tested.

2. Versioning

Date: 2022-10-24

Status

Accepted

Context

We need to follow a PEP440 compatible versioning scheme. This is required to allow other tools to resolve versions and compatibility properly.

Decision

We will follow a versioning scheme on the pattern YY.compatibility.patch.

Consequences

  • The YY is always set to the last two digits of the current year. When increasing this field the other two fields are reset to 0.
  • The compatibility field is increased whenever we make API-incompatible changes.
  • Otherwise, the patch field is increased.

For example, going from 22.1.3 to the first release of 2023 gives 23.0.0; an API-incompatible change then gives 23.1.0, and a following bug-fix release 23.1.1.

3. Releases flow

Date: 2022-10-24

Status

Accepted

Context

In order to publish high-quality packages to PyPI as tagged releases, we need a consistent workflow that is easy to follow and reproducible for all users.

Decision

We will use tagged releases on GitHub to publish to PyPI. These releases will follow the versioning scheme described in 02-versioning.md.

Consequences

The flow will be as follows:

  • Upon needing a release, create a PR:

    • Update CHANGELOG.md to ensure it contains all relevant changes. You can base this off of the nightly changelog.
    • Based on the above changes, set a new version in pyproject.toml.
    • Replace the heading in the changelog
    • Add diff labels at the bottom.
  • Pull the new main, and tag it with git tag -a vNEW_VERSION COMMIT_HASH.

  • Push the tag with git push origin vNEW_VERSION

  • Make a new PR that adds back the "Unreleased" heading in the changelog.