
Georgia Tech Reinforcement Learning: Preparing for Success

Introduction

Reinforcement Learning is optional for the Machine Learning Specialization at Georgia Tech.

This blog series is a personal account of my experiences in the Reinforcement Learning class at OMSCS, and I hope it provides valuable insights for you.

Whether you should follow this series is up to you, but I have put a lot of thought into it to showcase the work that goes into this class.

This is simply documentation of my journey through the OMSCS CS 7642 Reinforcement Learning class and is by no means the only way to approach it.

Lastly, my goal with this series is to provide information to others so that they may succeed where I have failed.

OMSCS Reinforcement Learning Series:

  1. Georgia Tech Reinforcement Learning: Preparing for Success
  2. Single Agent Reinforcement Learning: Basic Concepts and Terminologies
  3. Turbocharging Advantage Actor-Critic with Proximal Policy Optimization
  4. Advantage Actor-Critic with Proximal Policy Optimization: A Journey Through Code
  5. Multi-Agent Reinforcement Learning Soft Introduction: Cooperation

Prerequisites

I took Reinforcement Learning after taking both Machine Learning for Trading and Machine Learning.

They helped me tremendously on my journey, so I treat them as informal prerequisites; even so, I don't consider them hard or soft prerequisites. You don't need to take them before this class, but they help.

I also have a guide for the OMSCS CS 7641 Machine Learning course, which can be used to quickly get up to speed on supervised learning concepts through self-study.

Useful Resources

I list the resources I consumed before taking this class. I don't know how much or how little each one helped me on my journey, but they were all a part of it, so I share them here, ordered from the resource with the most code to the one with the least.

They are not ordered by importance or relevance.

My favorite is the Introduction to Deep Reinforcement Learning book, which mixes mathematical formulas, pseudo-code, and Python code.

This means you can drop the actual coding portions of the book during the class and stick to the more theoretical side of the deep reinforcement learning topics. The book also consolidates concepts that are otherwise spread across multiple papers.

Tools for the class

I list some tools that may help you on your course journey.

Better Organization

I am a disorganized person, and this hurt me during Project 3. In reality, the quick, easy, and most effective solution is to organize your experiment notebooks and track what changed in each experiment.

Find a tool or method of keeping track of your projects. You'll be running a lot of experiments, especially for Project 3.

However, I just discovered this SaaS called Neptune.ai, a third-party experiment-tracking platform that is free for academic use. You send it your run information, and it generates graphs and keeps track of hyperparameters. I have my eye on it and will learn to use it for future classes if the need arises.

Seriously, the hyperparameter tuning and ablation studies for Project 3 add up quickly. There is so much hyperparameter tuning.

Keep an organized approach to Project 3 when running multiple experiments, or suffer as I did. Please don't suffer as I did. You'll learn a lot either way, but keep an organized approach.

Google Colab

I used Google Colab, and I highly recommend it. Sadly, I had to pay for the Pro+ features because they allowed me to run experiments for up to 24 hours. Even with vectorized Python code, single experiments lasted 3-6 hours; sometimes I ran into the full 24-hour limit for Project 3.

I did not know at the time that the Pro+ plan lets you run at least five notebooks simultaneously; super useful.

One underrated feature of Google Colab is that it keeps track of all code changes, no matter how small. This is especially useful when you are running multiple experiments and want to see what changed between them in your notebook.

GaTech Library

Projects 2 and 3 rely heavily on the work of other researchers to help you better understand the algorithms you are using. For that reason, the GaTech Library gets my MVP award among tools, as it gives you access to everything you'll ever need in terms of academic research.

Scispace

Scispace is an AI tool for academic papers. It's similar to Zotero, except that it uses AI and has access to the papers in the Scispace database.

The best feature it has is the tool for breaking down mathematical formulas, which is especially helpful when you are still learning what the different math symbols mean. Trust me when I say you'll learn more about the symbols over time through sheer exposure.

An excellent companion book called "Mathematical Notation: A Guide for Engineers and Scientists" is helpful for the class as well.

ChatGPT

ChatGPT has made significant strides in its capabilities over the past year. From its initial introduction to the mainstream, where it lacked the ability to recommend books or articles, it has now evolved to excel in this area, surpassing my previous expectations.

Despite its advancements, ChatGPT is not infallible. It may occasionally provide inaccurate recommendations, especially if the paper you are seeking is not widely known or is from a niche field. However, if the paper is seminal in its field, ChatGPT will likely be aware of it. In other cases, it will still offer suggestions based on existing papers, which can be a valuable starting point for further research.

Getting the most out of the class

To get the most out of the reinforcement learning course, I recommend using two different types of algorithms for Project 2 and Project 3. Project 2 works excellently with Deep Q-Networks, while Advantage Actor-Critic with PPO works great for Project 3. This way, you gain a full understanding of two completely different types of algorithms.

However, many people may not have the time to learn two separate algorithms thoroughly. In such cases, it's beneficial to consider learning a single algorithm for both projects, such as an advantage actor-critic with proximal policy optimization. This approach can be more efficient and can help you understand the potential trade-offs in using a single algorithm for different projects.

CS 7642 Class Structure

Georgia Tech's OMSCS CS 7642 Reinforcement Learning course has six homework assignments and three projects. The beautiful thing is that they build on each other: the homework gives you the knowledge to tackle the projects, and each project builds on the last to help you with the next.

The course's learning structure is the cleanest I have seen so far. Someone put in the time to allow for the exploration and exploitation of materials throughout all the assignments.

If Reinforcement Learning is something you are passionate about, this class will allow you to unleash it.

What are the Reinforcement Learning Assignments?

As mentioned above, the course consists of six homework assignments and three projects.

I recommend tackling the homework assignments as quickly as possible; start working on them when they become available.

This is because the homework helps with the projects, and the projects require the most time to complete.

Each project will teach you topics that will then be useful to know and understand for the next project.

Homework 1: Planning in MDPs

Homework 1 is called "Planning in MDPs" and requires you to implement either value iteration or policy iteration on the problem they provide.

Out of all the homework, this is the only one that requires you to create the environment from scratch.

Although simple in concept, executing it is pretty challenging, as everyone is still learning the concepts from the assigned reading and lectures.
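
To make the idea concrete, here is a minimal value iteration sketch in NumPy. The homework defines its own environment, so the transition tensor `P`, reward matrix `R`, and discount factor here are assumptions for illustration, not the assignment's actual interface.

```python
import numpy as np

def value_iteration(P, R, gamma=0.95, tol=1e-8):
    """Minimal value iteration for a finite MDP.

    P[s, a, s'] is the transition probability and R[s, a] the expected
    reward; both shapes are assumptions for illustration only.
    """
    n_states, n_actions = R.shape
    V = np.zeros(n_states)
    while True:
        Q = R + gamma * (P @ V)       # Q[s, a] = R[s, a] + gamma * sum_s' P[s, a, s'] * V[s']
        V_new = Q.max(axis=1)         # greedy backup over actions
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=1)   # optimal values and a greedy policy
        V = V_new
```

Policy iteration follows the same backup, just split into separate evaluation and improvement steps.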

Homework 2: Lambda Return

The "Lambda Return" assignment is where things start to get interesting.

All they ask of you is to implement the λ-return from Chapter 12.1 of the Reinforcement Learning book by Sutton and Barto. This requires a little more work, as the calculations depend on understanding the n-step return found in Chapter 7.1 of the same book.

Regardless, understanding the n-step return is required, as the λ-return is an evolution of it.

If you plan on doing REINFORCE or Actor-Critic with Lambda returns, this homework will give you the knowledge to tackle them.
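
As a rough sketch of what Chapters 7.1 and 12.1 ask for: the λ-return is a weighted average of every n-step return. Something along these lines, where the `rewards`/`values` arrays and episode layout are my own assumptions rather than the homework's exact specification:

```python
def n_step_return(rewards, values, t, n, gamma):
    """G_{t:t+n}: n-step return from time t, bootstrapping from a value estimate (Ch. 7.1)."""
    T = len(rewards)
    end = min(t + n, T)
    g = sum(gamma ** (k - t) * rewards[k] for k in range(t, end))
    if t + n < T:
        g += gamma ** n * values[t + n]   # bootstrap only if the episode hasn't ended yet
    return g

def lambda_return(rewards, values, t, gamma, lam):
    """G_t^lambda: (1 - lambda)-weighted mix of all n-step returns (Ch. 12.1)."""
    T = len(rewards)
    g = 0.0
    for n in range(1, T - t):
        g += (1 - lam) * lam ** (n - 1) * n_step_return(rewards, values, t, n, gamma)
    g += lam ** (T - t - 1) * n_step_return(rewards, values, t, T - t, gamma)  # remaining weight on the full return
    return g
```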

Project 1: Replicating Sutton's 1988 Paper

Project 1 has you replicating the paper Learning to Predict by the Methods of Temporal Differences w/ Errata (Sutton 1988). This is a seminal work in the Reinforcement Learning community, as it formalized temporal-difference learning, one of the core ideas behind what we now know as Reinforcement Learning.

Like Homework 1, this project requires you to build the environment from scratch. On top of that, you have to develop TD(λ) from scratch as well. Luckily, the project is small in scope, and experiments should take no more than a minute to run.

Lastly, I like that this project prepares you for something like Proximal Policy Optimization, since the λ-return idea reappears in its advantage estimates.
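
For reference, the core TD(λ) update with accumulating eligibility traces looks roughly like the sketch below. Sutton's 1988 experiments use a batched, repeated-presentation scheme, so treat this purely as a sketch of the update rule; the episode format is my own assumption.

```python
import numpy as np

def td_lambda_episode(V, episode, alpha, lam, gamma=1.0):
    """One pass of TD(lambda) with accumulating eligibility traces.

    `episode` is assumed to be a list of (state, reward, next_state) steps,
    with next_state = None at termination.
    """
    e = np.zeros_like(V)                                # eligibility trace per state
    for s, r, s_next in episode:
        v_next = 0.0 if s_next is None else V[s_next]   # terminal states have value 0
        delta = r + gamma * v_next - V[s]               # TD error
        e[s] += 1.0                                     # bump the trace for the visited state
        V = V + alpha * delta * e                       # update every state by its trace
        e *= gamma * lam                                # decay all traces
    return V
```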

Homework 3: SARSA

From here on out, you'll no longer be required to build your environment from scratch, as the "SARSA" assignment's goal is to gently introduce the Gym environment. On top of that, the homework tells you which hyperparameters to use and what the action selection policy is; that way, you only have to focus on implementing the algorithm.

This homework assignment also introduces the SARSA algorithm, but you'll be fine as long as you read the Reinforcement Learning book by Sutton and Barto.
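
The update itself is a single line from Sutton and Barto; here is a minimal sketch, where the tabular Q array and variable names are my own assumptions rather than the assignment's scaffolding:

```python
def sarsa_update(Q, s, a, r, s_next, a_next, alpha, gamma):
    """On-policy SARSA: bootstrap from the action the policy actually takes next."""
    td_target = r + gamma * Q[s_next, a_next]
    Q[s, a] += alpha * (td_target - Q[s, a])
    return Q
```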

Homework 4: Q-Learning

Things get a little harder now as the training wheels begin to come off.

For the "Q-Learning" assignment, you'll be working on solving the taxi gym environment.

What makes this a little more complicated is that you'll have to decide on both the action selection policy your agent uses and the hyperparameter values.

This assignment has you thinking about how to get an agent to fully explore the environment while correctly implementing the Q-Learning algorithm.

What I like about this homework assignment is that it helps you prepare for Project 2.
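
A minimal tabular Q-learning loop on Taxi might look like the sketch below. I'm assuming the Gymnasium API here, and the epsilon-greedy policy and hyperparameter values are placeholders you would choose and tune yourself:

```python
import gymnasium as gym
import numpy as np

env = gym.make("Taxi-v3")
Q = np.zeros((env.observation_space.n, env.action_space.n))
alpha, gamma, epsilon = 0.1, 0.99, 0.1       # placeholder values; picking them is part of the assignment

for episode in range(5000):
    s, _ = env.reset()
    done = False
    while not done:
        # epsilon-greedy action selection: explore with probability epsilon
        a = env.action_space.sample() if np.random.rand() < epsilon else int(np.argmax(Q[s]))
        s_next, r, terminated, truncated, _ = env.step(a)
        done = terminated or truncated
        # off-policy Q-learning target bootstraps from the greedy next action
        target = r + gamma * (0.0 if terminated else np.max(Q[s_next]))
        Q[s, a] += alpha * (target - Q[s, a])
        s = s_next
```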

Project 2: Reinforcement Learning in Continuous State Spaces

The training wheels have come off; it's now ride or die.

Project 2 has you working on the Lunar Lander Gym environment. That said, you have free rein to use any Reinforcement Learning algorithm to solve the Lunar Lander problem. The challenge is that all your code must be built from scratch, as you can essentially only use NumPy when implementing your reinforcement learning algorithm.

This is the first step in truly understanding reinforcement learning in continuous state spaces, where you'll use neural networks. Keep in mind that everything up to this point has used tabular methods, where all values were stored in a table.

This project forces you to learn how to use a function approximator, since the most exciting problems in reinforcement learning involve continuous state spaces. It also aims to have you understand how hyperparameters affect algorithm performance.

Lastly, I have recommendations based on your goal.

If your goal is to absorb as much knowledge as possible, I recommend learning the Mnih et al. (2015) Deep Q-Network algorithm.

This algorithm was the first to put reinforcement learning back on the map. It not only tamed the deadly triad (function approximation, bootstrapping, and off-policy learning) but also showed that reinforcement learning algorithms can exceed human-level play in video games.

While the 2015 paper presents the Deep Q-Network algorithm with its strongest results, I suggest also reading the Mnih et al. (2013) paper, as its pseudocode is easier to read and implement. The main difference is that the 2015 paper introduces a second, periodically updated target network alongside the online network.
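
To illustrate the two-network idea, the bootstrap targets come from a frozen copy of the network. A rough sketch, where `q_target` is a hypothetical callable standing in for whatever NumPy network you end up building:

```python
import numpy as np

def dqn_targets(rewards, next_states, dones, q_target, gamma):
    """Regression targets for a DQN minibatch, in the spirit of Mnih et al. (2015).

    q_target is a hypothetical function mapping a batch of states to Q-values
    for every action; it is a periodically synced copy of the online network,
    which keeps the bootstrap target from chasing the network's own updates.
    """
    q_next = q_target(next_states)                               # shape: (batch, n_actions)
    return rewards + gamma * (1.0 - dones) * q_next.max(axis=1)  # fit the online net's Q(s, a) toward these
```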

If you want to learn a single algorithm thoroughly, I recommend the Actor-Critic framework, specifically the Advantage Actor-Critic (A2C) algorithm. A2C is a somewhat complicated algorithm to implement. Still, since this project requires that you understand how hyperparameters affect your algorithm, you can spend your time on this project getting to know its hyperparameters. Then you can upgrade the algorithm to Proximal Policy Optimization (PPO) for Project 3.

Homework 5: Bar Brawl

The "Bar Brawl" homework is an underrated assignment. The homework has you focus on the "Knows What It Knows" concept, or the KWIK framework, from Li, Littman, and Walsh. The idea is simple: the learner must either make an accurate prediction or admit "I don't know," and it may only say "I don't know" a bounded number of times.

This homework is useful because it frames learning as maintaining a hypothesis space that you narrow down with each new observation.

It is a conceptually fascinating assignment, and I recommend reading more about it, as it helps you understand what you are conceptually trying to do in Project 3.
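
As a toy illustration of the KWIK idea (not the Bar Brawl setup itself), an enumeration-style learner keeps every hypothesis consistent with the data and only commits to a prediction when all survivors agree:

```python
class KWIKEnumerationLearner:
    """Toy KWIK enumeration learner: predict only when every consistent
    hypothesis agrees; otherwise answer None ("I don't know")."""

    def __init__(self, hypotheses):
        self.hypotheses = list(hypotheses)   # each hypothesis: observation -> label

    def predict(self, x):
        answers = {h(x) for h in self.hypotheses}
        return answers.pop() if len(answers) == 1 else None   # None means "I don't know"

    def update(self, x, y):
        # Drop every hypothesis contradicted by the observed label.
        self.hypotheses = [h for h in self.hypotheses if h(x) == y]
```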

Homework 6: Rock Paper Scissors

The "Rock, Paper, Scissors" assignment has you compute the Nash equilibrium for the zero-sum game of "Rock, Paper, Scissors."
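
One standard way to do this, though not necessarily the method the assignment prescribes, is to solve the zero-sum game as a linear program. A sketch with SciPy:

```python
import numpy as np
from scipy.optimize import linprog

# Row player's payoff matrix for rock-paper-scissors (zero-sum).
A = np.array([[ 0, -1,  1],    # rock
              [ 1,  0, -1],    # paper
              [-1,  1,  0]])   # scissors

# Variables: (p_rock, p_paper, p_scissors, v). Maximize the game value v
# subject to p^T A >= v for every opponent action and sum(p) = 1.
c = np.array([0, 0, 0, -1])                  # linprog minimizes, so minimize -v
A_ub = np.hstack([-A.T, np.ones((3, 1))])    # -sum_i p_i * A[i, j] + v <= 0 for each column j
b_ub = np.zeros(3)
A_eq = np.array([[1, 1, 1, 0]])
b_eq = np.array([1.0])
bounds = [(0, 1), (0, 1), (0, 1), (None, None)]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
print(res.x[:3], res.x[3])   # mixed strategy ~(1/3, 1/3, 1/3), game value ~0
```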

They say this homework was meant to help with Project 3, but it is in something of a limbo state as of now.

Project 3: Multi-Agent Reinforcement Learning

The last project in the course was, by far, the most challenging and time-consuming. I recommend starting this the day it is released.

Project 3 used to use the Google Research Football (soccer) environment; however, students now use the Overcooked environment to perform experiments.

Regardless, Project 3 has you focusing on Multi-Agent Reinforcement Learning (MARL) algorithms.

My interpretation is that this project is about understanding how algorithm choice, structure, and exploration methods affect multi-agent learning.

This project allows for the most creativity, and I recommend reading lots of papers or finding a specialized book on MARL topics.

Conclusion

This class allows you the most control over your learning. You are free to learn as much or as little as you want, and that is what appeals most to me. Basically, you are free to choose your own adventure within the confines of reinforcement learning.

The class is lots of fun, and I guarantee you'll have the knowledge or the confidence to start doing experiments outside of the class.

Before taking this class, I had nothing beyond a superficial knowledge of reinforcement learning. Now I feel like I can understand academic papers in the field.

I am sending love and warm wishes to those who step inside the class known as Georgia Tech's OMSCS CS 7642 Reinforcement Learning.

Good luck, stay thirsty!

OMSCS Reinforcement Learning Series:

  1. Georgia Tech Reinforcement Learning: Preparing for Success
  2. Single Agent Reinforcement Learning: Basic Concepts and Terminologies
  3. Turbocharging Advantage Actor-Critic with Proximal Policy Optimization
  4. Advantage Actor-Critic with Proximal Policy Optimization: A Journey Through Code
  5. Multi-Agent Reinforcement Learning Soft Introduction: Cooperation
