Universitetet i Agder

Advances in Safe Deep Reinforcement Learning for Real-Time Strategy Games and Industry Applications

The dissertation presents four algorithms for model-based reinforcement learning, where two of the algorithms focus on improving safety during learning in mission-critical systems. The dissertation moves towards solving some of the core challenges of reinforcement learning, namely safety and sample efficiency.

Per-Arne Andersen

PhD Candidate

You may follow the disputation on campus or online. A registration link for online spectators is at the bottom of this page.

Per-Arne Andersen of the Faculty of Engineering and Science at the University of Agder has submitted his thesis entitled «Advances in Safe Deep Reinforcement Learning for Real-Time Strategy Games and Industry Applications» and will defend the thesis for the PhD degree on Friday 29 April 2022.

He has followed the PhD programme at the Faculty of Engineering and Science at the University of Agder, with specialisation in ICT, in the scientific field of Artificial Intelligence.

Summary of the thesis by Per-Arne Andersen:

Advances in Safe Deep Reinforcement Learning for Real-Time Strategy Games and Industry Applications

This dissertation advances cutting-edge techniques in artificial intelligence-based decision systems. These techniques are tested in real-time strategy games like StarCraft II and mission-critical industrial systems. The focus area is deep reinforcement learning, a combination of deep learning and reinforcement learning. The overarching goal of this research work is to enable computer systems to reach optimal decision sequences without making mistakes.

Game simulations

Games are frequently used to test the efficiency of reinforcement learning systems. For example, games can provide simulations of real-world industry applications that reduce experiment costs and improve reproducibility.

Reinforcement learning can eliminate manual and sometimes risky labor in industrial settings where expert systems dominate. Traditional reinforcement learning systems learn by trial and error. As a result, reinforcement learning agents risk harming people or damaging equipment while learning. Using games to train reinforcement learning agents to operate safely can therefore eliminate these risks. The key impact of solving these concerns is to enable highly efficient and safe autonomous systems, which exist in various forms in everyday life.

The complexity of real-time strategy games is compelling for artificial intelligence research. Tasks that require simultaneous operations, imperfect information, and system randomness are elements of real-time strategy games. With recent developments, reinforcement learning algorithms learn to achieve superhuman performance in games like StarCraft II. The disadvantage is that these algorithms are expensive and difficult to train, making them hard to apply in industrial applications.

Research Gap

Reinforcement learning is the process where the machine seeks to maximize a feedback signal through trial and error. Current cutting-edge reinforcement learning algorithms have essential limitations because they require a lot of exploration to learn good decision sequences. This exploratory approach can lead to undesirable outcomes in real-world systems.
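The trial-and-error process described above can be illustrated with a minimal sketch. It is not taken from the dissertation: the two-action task, the success probabilities, and the function name run_bandit are hypothetical, and epsilon-greedy is only one simple exploration strategy. The point is that the agent must occasionally try the worse action to learn which one is better, and each such try is the kind of mistake that could be costly in a real-world system.

```python
import random


def run_bandit(success_probs, steps, epsilon, rng):
    """Epsilon-greedy trial and error on a toy task with a few actions."""
    estimates = [0.0] * len(success_probs)  # running mean reward per action
    pulls = [0] * len(success_probs)        # how often each action was tried
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random action, even one that currently looks bad.
            arm = rng.randrange(len(success_probs))
        else:
            # Exploit: pick the action with the best estimate so far.
            arm = max(range(len(success_probs)), key=lambda i: estimates[i])
        # Feedback signal: 1 on success, 0 on failure (hypothetical task).
        reward = 1.0 if rng.random() < success_probs[arm] else 0.0
        pulls[arm] += 1
        # Incremental update of the mean reward observed for this action.
        estimates[arm] += (reward - estimates[arm]) / pulls[arm]
    return estimates, pulls


estimates, pulls = run_bandit([0.2, 0.8], steps=2000, epsilon=0.1,
                              rng=random.Random(0))
print("times each action was tried:", pulls)
print("estimated success rates:", [round(e, 2) for e in estimates])
```

The agent ends up preferring the better action, but only after spending some of its attempts on the worse one.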

In general, reinforcement learning follows a risk-neutral learning strategy, where fatal decisions are central to the learning objective. Such errors cannot be tolerated in mission-critical systems, which require safeguards to prevent harm to people and damage to real-world equipment. As a result, there is a need to develop new training algorithms that preserve safety during learning.

Lastly, cutting-edge research uses computationally demanding games, such as StarCraft II. This requires expensive computer systems that are not widely accessible to all research institutions. Other alternatives exist, but they lack the flexibility to adjust task difficulty and computational complexity.

There are substantial challenges to address in this thesis. In summary, reinforcement learning has low sample efficiency, focuses mostly on risk-neutral training, and has limited access to variable contexts for experimentation. This leaves various gaps where there is considerable room for improvement. Moving to address these gaps and toward better decision making in industry-like environments, we divide the research into three separate topics:

Topic 1: Game environments for reinforcement learning research with flexible tasks and computation difficulty

Topic 2: Model-Based Reinforcement Learning for efficient reinforcement learning in real-time strategy games

Topic 3: Safe reinforcement learning for industry-like systems

Topic 1: Game Environments

This research work addresses the environment gap by proposing six new game environments to evaluate reinforcement learning algorithms in several tasks.

Deep Line Wars and Deep RTS are two novel real-time strategy games for testing algorithms in planning under imperfect information.

Deep Maze is a flexible labyrinth game for teaching algorithms to navigate mazes from memory.

Deep Warehouse is a specially crafted game for evaluating the safety of reinforcement learning algorithms in Automated Storage and Retrieval Systems (ASRS), which is the exclusive focus of this research work.

An ASRS has autonomous vehicles that seek to maximize item throughput in a three-dimensional grid. All games provide parameters that adjust the problem complexity and a flexible scenario engine that can challenge algorithms in various problems, such as memory and control.

We empirically show that these games are significantly more computationally efficient than games of similar complexity. The diversity of proposed games can help fill the complexity gap in the scientific literature. We finally introduce the Center for Artificial Intelligence and Reinforcement Learning (CaiRL) toolkit for high-performance reinforcement learning research, which collects all proposed environments into a single runtime.

Topic 2: Model-Based Reinforcement Learning

This research work proposes model-based reinforcement learning techniques that focus on decision sample efficiency and safety.

The dissertation presents the Dreaming Variational Autoencoder (DVAE), which learns to mimic the dynamics of the game engine. It learns from demonstrations.

After the learning phase finishes, traditional, inefficient reinforcement learning algorithms can safely train using the game approximation at accelerated speeds.
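The general idea of learning a model from demonstrations and then training inside it can be sketched as follows. This is an illustrative toy under stated assumptions, not the DVAE itself: a lookup table stands in for the learned neural model, tabular Q-learning stands in for the reinforcement learning algorithm, and the corridor game and all names (real_step, fit_model, train_in_model) are hypothetical.

```python
import random
from collections import Counter, defaultdict

# Hypothetical toy "game": a corridor of cells. The agent starts in cell 0
# and receives reward 1 when it reaches the last cell.
N_STATES = 5
ACTIONS = (-1, +1)  # move left, move right


def real_step(state, action):
    """The real environment, where every step may be costly or risky."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward


def collect_demonstrations(episodes, rng):
    """Record (state, action, next_state, reward) tuples from a demonstrator.

    The demonstrator acts randomly here, purely for illustration.
    """
    data = []
    for _ in range(episodes):
        state = 0
        for _ in range(20):
            action = rng.choice(ACTIONS)
            next_state, reward = real_step(state, action)
            data.append((state, action, next_state, reward))
            state = next_state
            if reward > 0:
                break
    return data


def fit_model(data):
    """Approximate the game: most frequent outcome per (state, action)."""
    counts = defaultdict(Counter)
    for state, action, next_state, reward in data:
        counts[(state, action)][(next_state, reward)] += 1
    return {key: c.most_common(1)[0][0] for key, c in counts.items()}


def train_in_model(model, rng, episodes=500, alpha=0.5, gamma=0.9):
    """Q-learning that only queries the learned model, never real_step.

    Exploration costs nothing inside the model, so the agent acts randomly.
    """
    q = defaultdict(float)
    for _ in range(episodes):
        state = 0
        for _ in range(20):
            known = [a for a in ACTIONS if (state, a) in model]
            if not known:
                break  # no data for this state; end the imagined episode
            action = rng.choice(known)
            next_state, reward = model[(state, action)]
            done = next_state == N_STATES - 1
            best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            target = reward + gamma * best_next
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = next_state
            if done:
                break
    return q


rng = random.Random(0)
model = fit_model(collect_demonstrations(episodes=200, rng=rng))
q_values = train_in_model(model, rng)
policy = [max(ACTIONS, key=lambda a: q_values[(s, a)]) for s in range(N_STATES - 1)]
print(policy)
```

All exploration in train_in_model happens inside the learned approximation, so poor decisions made while learning never reach the real environment. The quality of the resulting policy depends on how well the demonstrations cover the game.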

Further, this dissertation presents the Observation Reward Action Cost Learning Ensemble (ORACLE), which similarly learns how the game engine functions but can learn more complex game dynamics. ORACLE is therefore more suitable for games with advanced graphics like StarCraft II, but it needs to balance training time and expressiveness.

Topic 3: Safe Reinforcement Learning

Several methods exist to improve safety in reinforcement learning algorithms for safety-critical systems. This experimental work shows that it is possible to decrease the failure rate during training without putting unrealistic restrictions or assumptions on the learning goals.

Specifically, the work presents a framework for learning a behavior model of a system. This model is then used to perform reinforcement learning exploration in a fully isolated learned environment.

Main outcome

The dissertation contributes four open-source games to enrich the diversity of available games for reinforcement learning research. Consequently, it is now more accessible for educational institutions to adjust problem complexity based on available funding and computational resources.

All of the contributions are collected into the CaiRL research toolkit that focuses on lowering the overhead of experiments, moving towards more efficient games for research.

The dissertation presents four algorithms for model-based reinforcement learning, where two of the algorithms focus on improving safety during learning in mission-critical systems.

The dissertation moves towards solving some of the core challenges of reinforcement learning, namely safety and sample efficiency.

In a nutshell, we believe that the game environments, reinforcement learning methods, and studies presented in this dissertation help advance the state-of-the-art research within the studied topics and contribute positively toward solutions for enabling the general use of reinforcement learning in mission-critical industrial applications.

Disputation facts:

The trial lecture and the public defence will take place on campus in Auditorium C2 040, Campus Grimstad, and online via the Zoom conferencing app (registration link below).

Professor Christian Omlin, Faculty of Engineering and Science, University of Agder, will chair the disputation.

The trial lecture Friday 29 April at 10:15 hours

Public defence Friday 29 April at 12:15 hours

Given topic for trial lecture: «Multi-Agent Learning Meets Industry Applications»

Thesis Title: «Advances in Safe Deep Reinforcement Learning for Real-Time Strategy Games and Industry Applications»

Search for the thesis in AURA - Agder University Research Archive, a digital archive of scientific papers, theses and dissertations from the academic staff and students at the University of Agder.

The thesis is available here:

PhD Thesis Per-Arne-Andersen-Dissertation

The Candidate: Per-Arne Andersen (1992, Karmøy) holds a Bachelor's degree (2015) and a Master's degree in ICT (2017), both from the University of Agder (UiA). Master's thesis: “Deep Reinforcement Learning using Capsules in Advanced Game Environments”. Present position: Assistant Professor at the Department of ICT, UiA.

Opponents:

First opponent: Professor Ailo Bongo, UiT The Arctic University of Norway

Second opponent: Lecturer Yali Du, PhD, King’s College London, UK

Professor Frank Reichert, University of Agder, is appointed as the administrator for the assessment committee.

Supervisors in the doctoral work were Professor Morten Goodwin, University of Agder (main supervisor), and Professor Ole-Christoffer Granmo, University of Agder (co-supervisor).

What to do as an online audience member:

The disputation is open to the public. To follow the trial lecture and the public defence online via the Zoom conferencing app, you have to register as an audience member using the link below. If you attend the disputation in the auditorium, you do not need to register.

https://uiano.zoom.us/meeting/register/u5Mvde2srzMuH9bdy4J6YHLLVJtk5wEvWmeM

A Zoom link will be returned to you. (If you cannot join by clicking on the link, instructions for using Zoom are available at support.zoom.us.)

We ask online audience members to join the virtual trial lecture at 10:05 at the earliest and the public defence at 12:05 at the earliest. After these times, you can leave and rejoin the meeting at any time. Further, we ask audience members to turn off their microphone and camera and keep them turned off throughout the event. You do this at the bottom left of the Zoom window. We recommend you use ‘Speaker view’, which you select at the top right corner of the video window.

Opponent ex auditorio:

In the introduction to the public defence, the chair invites members of the public to pose questions ex auditorio and announces the deadlines for doing so. It is a prerequisite that the opponent has read the thesis. Questions can be submitted to the chair, Christian Omlin, by e-mail at christian.omlin@uia.no