RL (Reinforcement Learning) experiment

The Reinforcement Learning experiment is a special type of experiment, specifically tailored for integrating RL-ready AnyLogic models with platforms that specialize in training AI brains.

Currently, the RL experiment allows for exporting AnyLogic models to the Microsoft Project Bonsai platform. More options will be added in the future.

Please note that, as of now, the RL experiment is not designed for running ML-driven experiments directly within your AnyLogic installation. It is simply a tool for preparing AnyLogic models for export from AnyLogic and subsequent import into the appropriate platforms.

To learn more about how to integrate AnyLogic into the AI agent training process, visit the Artificial Intelligence page on the AnyLogic website.

Demo model: Activity Based Cost Analysis (RL)

Model prerequisites

To prepare a model for reinforcement learning, that is, make it a valid foundation for an RL experiment, you need to make sure it meets certain requirements:

Some platforms (for example, Microsoft Project Bonsai) assume that the model’s logic contains certain points in time at which the AI agent has to decide what action should be performed. If you decide to use such a platform for your RL process, keep in mind that the model must contain such decision points. These decision points need to be associated with events that trigger the AI agent to take action on the model. Various model events can be treated as decision points.

These events do not trigger the AI agent’s actions directly. For the AI agent to act on them, you have to create a decision point that is executed as a result of the event’s occurrence. See Creating a decision point below.

Understanding the logic behind the RL experiment

To configure your RL experiment properly, you need to understand the three primary concepts behind the implementation of the RL experiment in AnyLogic: Observation, Action, and Configuration. Each of them describes numeric variables of some kind that are used during the AI agent’s training.

In the internal structure of the RL experiment in AnyLogic, all of these values are represented as Java classes.

The RL experiment allows you to set up and manipulate these values within the model before starting the actual process of RL training.

Note: Due to platform restrictions, only numeric values are supported for now.
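Conceptually, each of these groups is a plain data holder with numeric fields. The sketch below is purely illustrative: AnyLogic generates its own internal classes, and all field names here are assumptions.

// Purely illustrative sketch: AnyLogic generates its own internal classes.
// Field names are assumptions; only numeric types (int, double, double[]) apply.
public class Observation {
    public double costPerProduct;   // a value read from the model on each step
    public double[] queueSizes;     // arrays of doubles are supported as well
}

public class Action {
    public double arrivalRate;      // a value the AI agent passes back to the model
}

public class Configuration {
    public int numberOfWorkers;     // a value applied before a simulation run starts
}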

To create an RL experiment

  1. In the Projects view, right-click (Mac OS: Ctrl + click) the model item and choose New > Experiment from the context menu.
  2. The New Experiment dialog box opens up.
    Select the Reinforcement Learning option in the Experiment Type list.
  3. Specify the experiment name in the Name edit box.
  4. Choose the top-level agent of the experiment from the Top-level agent drop-down list.
  5. If you want to apply model time settings from another experiment, leave the Copy model time settings from checkbox selected and choose the experiment in the drop-down list to the right.
  6. When you are done, click Finish.

The resulting experiment will appear in the Projects view. Click it to access its properties.

To export the RL experiment to Microsoft Project Bonsai

  1. Configure your RL experiment using the options available in the Observation, Action, and Configuration sections.
  2. Click Export to Microsoft Bonsai in the topmost section of the experiment’s properties.
  3. In the resulting dialog, use the Destination ZIP file edit box to specify the path where you want to save the ZIP file containing the RL experiment, or
    click Browse... and go to the desired folder. After that, specify the name of the resulting ZIP file in the File name edit box, then click Save.

    AnyLogic: The RL experiment export dialog

  4. Click Next.
  5. On the next page of the wizard, click the Bonsai platform link to open the Microsoft Project Bonsai website in the default browser of your system and follow the instructions provided there.
  6. When a ZIP file of your model is requested, click the Locate ZIP file link in the wizard to navigate to the file’s location on your computer.
  7. Click Finish to close the wizard.
  8. Proceed with the RL training on the Project Bonsai platform.

Properties

General

Name — The name of the experiment.
Since AnyLogic generates a Java class for each experiment, follow the Java naming guidelines and start the experiment’s name with an uppercase letter.

Ignore — If selected, the experiment is excluded from the model.

Export to Microsoft Bonsai — Click this link to start preparing the model for export to Microsoft Project Bonsai.
To learn more, see above.

Top-level agent — In this drop-down list, choose the top-level agent type for the experiment. The agent of this type will play the role of the root of the hierarchical tree of agents in your model.

Observation

Data fields passed to the Learning Platform on each step — Declares the variables that define the observation space.
The following types of variables are supported: int, double, double[]. To assign values to these variables, use the Fill 'observation' data fields from 'root' using code edit box below.

Fill 'observation' data fields from 'root' using code — Specifies the code that associates numeric values from the model with the data fields specified above. Allows for retrieving values from the top-level agent of your model (root). You can either point directly to quantitative values in the model or point to functions that return values.
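For example, assuming the data fields declared above are named costPerProduct and queueSize, and that the top-level agent has a costPerProduct() function and a productsInQueue variable (all of these names are assumptions made for illustration), the code might look like this:

// Hypothetical example: the data field and model element names are assumptions.
costPerProduct = root.costPerProduct();  // a function that returns a value
queueSize = root.productsInQueue;        // a quantitative value read directly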

Simulation run stop condition — Specifies the condition that stops the simulation run when it evaluates to true. This can be used, for example, to handle situations when the model starts in an undesired state, or to terminate a run when continuing the simulation would not add any value to the learning process. Upon triggering the stop condition, the simulation ends.
Allows for addressing the top-level agent’s contents by using root.
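For example, a stop condition might end the run once a hypothetical totalCost variable of the top-level agent exceeds a threshold (the variable name is an assumption):

root.totalCost > 100000  // the run ends when this expression evaluates to true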

Action

Action data returned from Learning Platform on each step — Specifies the variables that will be passed to the model after the AI agent performs some action.
The following types of variables are supported: int, double, double[]. To apply the values of these variables to the model, use the Apply 'action' data fields to the 'root' using code edit box below.

Apply 'action' data fields to the 'root' using code — Specifies the code that assigns the values calculated by the AI agent to the associated model elements. It can interact with the top-level agent of your model (root). You can either assign the values directly to model elements or pass them to functions that apply them.
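For example, assuming an action data field named arrivalRate and, on the top-level agent, a matching parameter and a setArrivalRate() function (all names are assumptions made for illustration), the code might look like this:

// Hypothetical example: the data field and model element names are assumptions.
root.arrivalRate = arrivalRate;       // assign the value directly to a model element
// root.setArrivalRate(arrivalRate);  // or pass it to a function that applies it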

Configuration

Configuration data returned from Learning Platform for new simulation run — Declares the variables that are passed to the model before the simulation starts.
The following types of variables are supported: int, double, double[]. To apply the values of these variables to the model, use the Apply 'configuration' data fields to the 'root' using code edit box below.

Apply 'configuration' data fields to the 'root' using code — Specifies the code that assigns the configuration values received from the learning platform to the model elements before the simulation run starts. It can interact with the top-level agent of your model (root). You can either assign the values directly to model elements or pass them to functions that apply them.
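For example, assuming a configuration data field named numberOfWorkers and a matching parameter on the top-level agent (both names are assumptions), the code might look like this:

// Hypothetical example: the data field and parameter names are assumptions.
root.numberOfWorkers = numberOfWorkers;  // applied before the simulation run starts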

Local training

This section contains download links to the software required for local RL training of AI agents.

Model time

Stop — Defines whether the model will Stop at specified time, Stop at specified date, or Never stop. In the first two cases, the stop time is specified using the Stop time / Stop date controls.

Start time — The initial time for the simulation time horizon.

Start date — The initial calendar date for the simulation time horizon.

Stop time — The final time for the simulation time horizon (the number of model time units for the model to run before it will be stopped).

Stop date — The final calendar date for the simulation time horizon.

Randomness

Random number generator — Here you specify whether you want to initialize the random number generator for this model randomly or with a fixed seed. This setting matters for stochastic models. With a random seed, the model’s random number generator is initialized with a different value for each run, so the model runs cannot be reproduced. With a fixed seed, the generator is initialized with the same value for each run, so the model runs are reproducible. You can also substitute the default AnyLogic RNG with your own RNG here.
In most RL training scenarios, it makes more sense to use a Random seed so that the AI agent gains experience from an environment that exhibits its random nature. A Fixed seed might be useful for testing in a simplified scenario, but it may lack the variability needed for training that reflects real-world conditions.

Random seed (unique simulation runs) — If selected, the seed value of the random number generator is random. In this case, the random number generator is initialized with a different value for each model run, and the model runs are unique (non-reproducible).

Fixed seed (reproducible simulation runs) — If selected, the seed value of the random number generator is fixed (specify it in the Seed value field). In this case, the random number generator is initialized with the same value for each model run, and the model runs are reproducible.

Custom generator (subclass of Random) — If for any reason you are not satisfied with the quality of the default random number generator Random, you can substitute it with your own. Prepare your custom RNG (it should be a subclass of the Java class Random, e.g. MyRandom), choose this option, and type an expression returning an instance of your RNG in the field on the right, for example: new MyRandom() or new MyRandom(1234).
You can find more information here.
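A minimal sketch of such a custom generator, assuming you simply want to extend Java's Random (the name MyRandom is just an example), could look like this:

import java.util.Random;

// A custom RNG must be a subclass of java.util.Random.
// Override next(int), nextDouble(), etc. to change the generation logic.
public class MyRandom extends Random {
    public MyRandom() {
        super();
    }

    public MyRandom(long seed) {
        super(seed);
    }
}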

Description
Use the edit box in this section to specify an arbitrary description of the experiment.

Creating a decision point

To declare that a certain event triggers a decision point for the AI agent (and that an experiment step should be performed), call the ExperimentReinforcementLearning.takeAction(agent) static function, passing any agent of the model as the agent argument. Through this argument, the function gets access to the top-level (Main) agent, so that all RL-related data processing (for example, retrieving the observation data) is done in the context of that agent.

For example, when the event is located within a certain agent, the following code, specified as the Action of this event, passes this as the reference to that agent:

ExperimentReinforcementLearning.takeAction(this);
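Similarly, if the decision point is, for example, the On enter action of a process flowchart block, you could pass the agent that is currently in the block (assuming the block's local variable agent refers to it):

ExperimentReinforcementLearning.takeAction(agent);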