Passing Machine Learning in OMSCS: Unlock the Secrets

Machine Learning is a required course for the Machine Learning Specialization at Georgia Tech. It gets a lot of love, a lot of hate, and everything in between.

This blog aims to provide my strategy for writing the assignments for this class with the best possible chance of achieving that coveted 100% score.

Remember that this class is going through changes, and my personal advice may not hold for future assignments as assignments are subject to change.

Whether you should follow this guide is up to you, and I cannot guarantee that you'll score above the mean or median, let alone 100%.

I can tell you that I followed this guide closely, and it earned me 100% or near-100% on every assignment after assignment 1. When points were deducted, it was for not following assignment instructions (missing specific graphs, not justifying algorithm usage, etc.).

Consider Taking Machine Learning for Trading Before This Class

Before diving into the individual assignments and guidance, I wanted to give a quick shout-out to the Machine Learning for Trading (ML4T) course.

I once had the opportunity to talk to Dr. Tucker Balch, the creator of the Machine Learning for Trading (ML4T) course at OMSCS. He personally recommended that if you want to pursue machine learning (personally or professionally) and know nothing about the subject yet, you should take ML4T before taking Machine Learning.

I cannot recommend this more as understanding the writing format in ML4T will help with the Machine Learning assignments. Plus, ML4T goes over a few topics, such as Decision Trees, Domain Knowledge, and Reinforcement Learning. The reinforcement learning portion will help tremendously with the last assignment of Machine Learning.

That said, ML4T has gaps: it covers Normalization and Time Series data but does not teach one-hot encoding, ordinal encoding, the curse of dimensionality, etc., which is why I still could not work through the Kaggle Titanic Challenge after taking it. You will have to learn those topics on your own unless they are added in future class videos.

Shape of Data

Although I'm not recommending you study it for this class, I wanted to inform you that data has shape. This topic is called Topological Data Analysis (TDA), sometimes informally called Data Topology.

Although outside of the scope of this class, I did try to teach myself the basics about the shape of data as someone who hires at Georgia Tech VIP mentioned that they were looking for candidates who knew both machine learning and data topology.

I will say that understanding Data Topology has allowed me to look at Machine Learning and the Datasets differently than I did for the Machine Learning for Trading class.

I also see the benefits of knowing this for the Unsupervised Learning assignment.

I believe watching both videos will help change your perspective on Machine Learning and Data:

If you get anything from this section, it's that there is more than meets the eye when it comes to data.

Tools for the Class

This section goes over some tools you may find helpful for this class. I have used everything in this section when taking this class and cannot recommend these tools enough.

Google Colab

Google Colab is a tool that helped me a lot.

First, it keeps a version history of all your changes so you can focus on writing code. It also comes with AI to help generate code for you so you can work faster, such as developing code for Matplotlib.

Be careful, though: it sometimes writes code that does not do what you need, so double-check the plots it generates for you.

It also allows for easily sharing your code, which you will need to do with this class while keeping the code private from anyone who does not have the link.

I want to state that Google Colab will not write all the code for you, or even always write correct code, but it is still great when it autocompletes code for generating charts, for example, as it gets it right about 80% of the time (with slight edits on your part).

I recommend the Pro version of Google Colab, as it lets code run for up to 24 hours without your computer needing to stay connected to the instance. This helped me a lot when generating lots of experiments.

Obsidian

Obsidian is an excellent tool for writing down notes. Plus, with the birth of ChatGPT, Obsidian allows you to connect the ChatGPT API to synergize ChatGPT with your notes.

SciSpace

SciSpace is a platform that allows AI/ChatGPT capabilities to help read research papers. I like that you can ask questions about any research paper uploaded to its database. It helps with understanding math formulas if that is something you need help grasping at first.

Overleaf

Overleaf is a platform that helps you write LaTeX. I recommend it with the IEEE template, which uses two columns per page.

At first, I tried to use the Joyner Format used in ML4T, but I needed more space to fit everything that the assignment FAQ required you to have.

Plus, this is a great way to learn how to write reports in LaTeX and typeset math formulas. It was a lot of work at the beginning when I came in knowing nothing, but learning LaTeX has paid off.

Georgia Tech offers the premium version for free to all students.

Springer

Springer is a book publishing company, and Georgia Tech offers free downloads from its website.

Springer is an excellent website to get free books on the topics for each assignment. You can also access these books from the online library at Georgia Tech.

Also consider purchasing supplemental reading material from Amazon, or other places where eBooks are freely available 🤫.

ChatGPT

From the lips of the TA staff and current professor, you can use ChatGPT to help you understand the course materials.

It is excellent when you need to get quick math formulas or understand the topic at hand better.

I recommend pairing this with reading books and research papers.

What Are the Machine Learning Assignments?

The Machine Learning course has four assignments and two exams that make up your grade.

The four assignments cover the following topics: Supervised Learning, Random Optimization, Unsupervised Learning, and Reinforcement Learning.

Each assignment requires you to write a report based on requirements gathered from two sources: the assignment instruction page and the Ed Discussion FAQ page. Additionally, there are office hours with further requirements, which trickle down to the Ed Discussion FAQ page after 1-2 weeks.

I believe that the goal of all the assignments is to essentially "prove" that you have mastered the fundamentals of each topic. The assignments allow for "exploration" through experiments, but ultimately, in your written report, you must show that you have learned what is taught in the videos.

This means that the teaching staff doesn't care whether your algorithms are producing the best scores possible; instead, they want to see "why" you are getting the results you are getting.

Learning Objectives

Before a video is recorded, a slideshow is created, or an assignment is drafted, it is common to develop a learning-objective document. This document highlights what a teacher/professor wants the student to know after completing a task (video, slide, assignment).

When approaching these assignments, I recommend coming in from the professor's viewpoint. Try to understand what precisely the assignments want you to learn after completing them. This is important because your report should show you have set out to discover what the assignment is trying to teach you.

For some classes, such as AI, the learning objectives are transparent because your Gradescope submission gives immediate feedback. You either learned how to implement the algorithms from the books correctly or you did not, which forces you to understand the algorithms intuitively enough to write code that produces the expected results.

This is not the case with Machine Learning, in which all assignments are graded based on what you add to your report.

For this class, I recommend watching the lecture videos, reading the required/optional books and papers, and purchasing/reading additional books on class topics (Supervised, Random Heuristics, Unsupervised, Reinforcement Learning). From this, you will see that there are common themes that many authors want you to walk away knowing, and from those themes, you can write a deep and thorough analysis of all the assignments.

Grading Criteria

If I had to hazard a guess, your report grade is based on four or five main criteria drawn from the assignment descriptions.

  • Graphs & Tables
  • Algorithms
  • Comparisons
  • Themes
  • Extra Things

Graphs & Tables

The FAQ and the assignment descriptions will outline what graphs/tables should be included in your report. If you don't have any of them, you will most likely get a grade of 0%, even if your report was beautifully drafted and revised.

The graphs/table requirements may change from semester to semester, so I will leave these out of this guide.

But to warn you: I did not include most of the required graphs/tables on assignment one and received an F, with the feedback being to check the table/graph requirements. This is similar to ML4T's grading criteria, where missing graphs were also heavily penalized.

Algorithms

Each assignment will ask you to dive into algorithms related to its topic; in supervised learning, for example, that means decision trees, neural networks, etc.

The assignment description and the Ed Discussion FAQ post will include the algorithms you must experiment on.

As with the graphs/tables, you must include them to maintain a big chunk of your grade.

I'll briefly go over these requirements, but remember that these may change from semester to semester.

Comparisons

One common thing throughout the assignments is that they would like you to compare all the algorithms against each other. Prepare to leave room in your report to do just that.

All algorithms in all assignments should fall into at least two types of algorithm classes, depending on the topic.

For example, in supervised learning, the algorithms can fall into the parametric and non-parametric algorithm classes.

Consider doing two types of comparisons: high and low-level comparisons.

An example of a high-level comparison would be comparing parametric against non-parametric algorithms based on results from your experiments.

An example of low-level comparison would be comparing the individual algorithms against each other under the non-parametric umbrella: KNNs, SVMs, etc. Again, comparisons should be based on your experiments.

By doing your comparisons this way, you should satisfy the assignment requirement of comparing all algorithms against each other.

The comparison topic should be in the Ed Discussion FAQ section and Office Hour videos.

Themes

Each assignment has specific themes that it would like you to discuss. Not all of these themes will be shown in the Assignment page or the Ed Discussion FAQ posts. These themes can be found through lecture videos or required/optional readings. They may be discussed during office hours, but it is not guaranteed.

This is where things start to get exciting and is more on the "exploratory" part of the assignment writing. Understanding what the themes for the assignments are may help in determining the experiments you run.

Remember that the experiments you run should help formulate a discussion of the topics found through lecture videos.

I'll go over these in slightly more detail later. If anything, you can ignore everything else but the themes, as the themes should always stay the same, even when the algorithms, comparisons, and chart requirements change.

Extra Things

Some assignments may have a disclaimer hiding: "TAs may add additional points if you go beyond the assignment's requirements."

If you see that, a complete 100% may not be possible unless you add "extra things" to your report. I could be wrong; it may just be extra credit points, but I took it as an additional requirement to add to my reports.

They'll never say what this "extra thing" is, but for me, I just added additional Algorithm(s) or additional Hyper-Parameter(s) analysis. I only noticed this with assignment four, Reinforcement Learning, but your semester may differ.

Datasets

This class will have you choosing two datasets to run your algorithms with, and they will be used throughout assignments 1-3.

Pick datasets with different feature types. Feature types include discrete, continuous, boolean, and categorical values. As long as the dataset's feature types are vastly different from one another, this will allow you to have an extensive range of possible analyses for your reports. An example would be to pick a dataset with all of these feature types and a second dataset with only one feature type for all of the features.

Pick simple datasets that are small/smallish, as this will allow you to run multiple experiments quickly. Even with small datasets, I ran experiments for many hours over many days to generate those "interesting" results and discussions that the assignments wanted from you.

Also consider different output types, such as regression and classification outputs.

Report Format

Since this class is more on the written and research side than coding, I recommend approaching each assignment as an experiment that you then elaborate on in your papers.

For the format, they'll go over this in an office hour video to clarify what it will typically look like.

  • Introduction
    • Algorithms
      • Justify why you "picked" these required algorithms
    • Problems
      • If the assignment asks you to pick problems for the algorithms to explore, this would be the place to explain said problems and justify your choices
    • Experiments
      • Justify why you picked these experiments and tie it with the required themes/comparisons
      • Explain how to reproduce the experiments. Your seed values, libraries, etc
  • Experiment (1/2/3/etc)
    • Introduction
      • Algorithms and experiments to run and justify why you "picked" these algorithms and experiments
      • Explain additional requirements. If you used a hyper-parameter that is different than default, explain that here
    • Hypothesis
      • What do you expect the experiment results to be based on literature, experiences, intuition
This is where you hypothesize on the comparisons and themes required for the assignment. Two or three should be fine; be quick and don't elaborate. Hypotheses based on intuition are fine.
    • Results
      • Charts/Tables that are required
    • Discussion
      • Explain "why" you got the results that you got. It does not matter if it's right or wrong as long as the results are not due to a mistake in your code. Justify Results.
      • This is where you compare your algorithms results with one another. Try to go over high/low level comparisons. Justify Results.
      • Try to interweave the themes into the discussion. Justify Results.
  • Conclusion
    • Include this section only if you have room. If you only have room for a few sentences, then just state that you went over specific themes and comparisons in your experiment.
      • If you are able to, try to summarize the comparisons of all the algorithms here. Justify Results.
      • If you are able to, summarize what you think are the main themes of the assignment. Justify Results.

You'll notice that I use the word "Justify Results" a lot, and it's because the assignments require that you justify all actions you take and all results you see.

Expect TAs to deduct points if you don't justify why you "picked" the required algorithms, experiments, or results that you observed.

Assignment 1: Supervised Learning

When discussing your experiments, you should reason from the performance metrics. If you use Accuracy for Classification problems, explain why this metric was chosen in the Introduction section. If you use Root Mean Square Error for Regression problems, explain why this metric was selected in the Introduction section.

If you decide to use a different performance metric for comparison amongst algorithms, explain why they were chosen (Justify).

You are not tied to one performance metric and I recommend looking at multiple metrics for better analysis coverage in your reports.
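For instance, both metrics mentioned above are one-liners in scikit-learn (an assumed toolkit here; use whatever your cohort allows):

```python
# Sketch: computing the two metrics above with scikit-learn (assumed library).
import numpy as np
from sklearn.metrics import accuracy_score, mean_squared_error

# Classification: accuracy = fraction of correct predictions
y_true_cls = [0, 1, 1, 0, 1]
y_pred_cls = [0, 1, 0, 0, 1]
acc = accuracy_score(y_true_cls, y_pred_cls)  # 4 of 5 correct -> 0.8

# Regression: RMSE = square root of the mean squared error
y_true_reg = np.array([3.0, 5.0, 2.5])
y_pred_reg = np.array([2.5, 5.0, 3.0])
rmse = np.sqrt(mean_squared_error(y_true_reg, y_pred_reg))
print(acc, rmse)
```

Whatever you compute, the justification for choosing it still belongs in your Introduction.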

Algorithms

This assignment requires that you show proof of understanding of these five algorithms: Decision Trees, Neural Networks, Boosting, Support Vector Machines (SVM), and K-Nearest Neighbors (KNN).

Comparisons

For my cohort, we had to compare the algorithms on their speed/time to learn and run, and compare the validation and learning curves of all the algorithms.

There may be other or different comparisons you'll have to do based on your cohort/semester. This is where you should double-check what is required of you.

Parametric vs Non-Parametric Algorithms

The algorithms you work on can be categorized into two categories: Parametric and Non-Parametric.

Basically, some algorithms make weak/strong assumptions about the relationships between features of the data points. I recommend learning those assumptions and briefly touching on them in your paper.

Quick example: Linear regression assumes that the data is linear. Linear regression works great if the data is linear, not so much for anything else.

Themes

Keep in mind that this assignment allows for 12 pages, most likely due to the amount of graphs they want you to add to the report and the many themes that can come up naturally from supervised learning.

Curse of Dimensionality

The lecture videos, office hours, and books talk about the Curse of Dimensionality, which intuitively states that the more features your dataset has, the more data points/observations are needed.

You should compare how the size of datasets affects your algorithms and compare them with one another.

Linear Separability

For classification/regression problems, each algorithm has a particular hyper-parameter setting that will allow you to gain near perfect or perfect performance scores if and only if a single hyperplane can separate the data.

Run experiments that show this and explain why you got your results.
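A quick way to sanity-check this theme is to build synthetic data that a single hyperplane does separate and confirm a linear model scores (near) perfectly. A sketch, assuming scikit-learn and made-up data:

```python
# Sketch (assumed setup): two Gaussian clouds on either side of the line
# x0 = 0 are linearly separable, so a linear-kernel SVM should do
# (near) perfectly on them.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
X[:100, 0] += 4.0   # class 1 shifted right of the separating line
X[100:, 0] -= 4.0   # class 0 shifted left of it
y = np.array([1] * 100 + [0] * 100)

clf = SVC(kernel="linear").fit(X, y)
print(clf.score(X, y))  # expect (near) 1.0 on cleanly separated data
```

Shrinking the shift until the clouds overlap is one cheap way to generate the "why did the score drop" discussion the report wants.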

Noise in Datasets

Some algorithms can perform better than others if noise/errors in the datasets are low. For example, decision trees can handle noise better than K-Nearest Neighbor.

The gist is to explain how noise could be introduced into your datasets and, if so, how they may or may not affect experiment results.

Generalization

Supervised learning aims to create a model that can (hopefully) be used in production. You should discuss your trials and tribulations and how you arrived at a generalized model.

Even if you don't get a model to generalize correctly, explaining why that may be the case should show proof that you understand what is happening with the algorithm.

Now, if your dataset does not generalize with any of the algorithms, immediately abandon it and quickly find another, as you won't have anything "interesting" to report.

Overfitting vs Underfitting

A common theme in the class is the idea of underfitting and overfitting. It's the bane of supervised learning.

Assuming binary classification, based on the algorithm used and the hyper-parameters chosen, it is possible to have a model that does worse than just randomly guessing.

I recommend discussing this concept across all algorithms because, depending on the datasets and hyper-parameters chosen, you should have plenty to say about how the data points sit in the feature space.

Learning/Training Curve

When using k-fold cross-validation, you have two knobs: the size of the training and test sets, and the value of K (how many splits of the training set you want).

The training and testing set's size impacts how the specific model generalizes data. It is typical to have an error/loss metric on the y-axis and the size of training data on the x-axis.
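As a sketch of what that curve looks like in code (scikit-learn, plus an illustrative dataset and estimator that are my assumptions, not the assignment's):

```python
# Sketch of a learning curve: error vs. training-set size, averaged
# over k folds (dataset/estimator are placeholders; swap in your own).
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import learning_curve
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
sizes, train_scores, val_scores = learning_curve(
    DecisionTreeClassifier(random_state=0), X, y,
    train_sizes=np.linspace(0.2, 1.0, 5),  # x-axis: training-set size
    cv=5, shuffle=True, random_state=0)

# y-axis: error = 1 - accuracy, averaged across the folds
train_err = 1 - train_scores.mean(axis=1)
val_err = 1 - val_scores.mean(axis=1)
print(list(zip(sizes, val_err.round(3))))
```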

Validation Curve

A validation curve is typically used to evaluate a model's generalization based on changes in the hyper-parameters. It's typical to have your performance metric on the y-axis while your changes in hyper-parameter values are on the x-axis.
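A hedged sketch of such a curve with scikit-learn's `validation_curve`, using a decision tree's `max_depth` as the assumed hyper-parameter:

```python
# Sketch of a validation curve: performance vs. one hyper-parameter.
# Dataset and the choice of max_depth are illustrative assumptions.
from sklearn.datasets import load_iris
from sklearn.model_selection import validation_curve
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
depths = [1, 2, 3, 5, 8]  # x-axis: hyper-parameter values
train_scores, val_scores = validation_curve(
    DecisionTreeClassifier(random_state=0), X, y,
    param_name="max_depth", param_range=depths, cv=5)

# y-axis: mean cross-validated accuracy per hyper-parameter value
print(dict(zip(depths, val_scores.mean(axis=1).round(3))))
```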

Which hyper-parameters you tune is up to you unless stated otherwise on the assignment page.

I recommend delving into this in your report, even if not stated on the assignment page.

Bias & Variance Tradeoff

Another theme in supervised learning is the bias and variance tradeoff, which is discussed in lectures, books, and literature on supervised learning. This should be addressed in your paper.

Domain Knowledge

You should get simple/basic domain knowledge for one dataset to allow for more written analysis in your report. Once you do, it can help you better interpret your results.

Kfold Cross-Validation

For our cohort, Kfolding was optional. The assignment page even stated that kfold was optional and asked you to explain your reasoning for using/not using kfold.

The topic of hold-out vs cross validation is interesting.

Books, lectures, and even the ML4T class suggest there is almost never a situation in which you should skip k-fold, because noise in the data makes single-split results unreliable. Even Time Series data has a specific technique to get around the k-fold issue.

The only situation I can imagine not using k-fold is if your dataset is so large that, to run experiments in a reasonable time, you have to forgo it. The Central Limit Theorem/Law of Large Numbers helps explain why k-fold works well (it estimates the mean performance), but k-fold is expensive to run.

In short, use k-fold unless you have an excellent reason not to.
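To make the noise argument concrete, a minimal sketch (assuming scikit-learn and a placeholder dataset) that reports both the mean and the spread across folds:

```python
# Sketch: k-fold gives a mean score plus a spread across folds --
# a single hold-out split hides that spread entirely.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
scores = cross_val_score(KNeighborsClassifier(), X, y, cv=10)
print(f"{scores.mean():.3f} +/- {scores.std():.3f}")  # mean and spread
```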

Other Themes

Again, look to the assignment page and Ed Discussion FAQ to determine what else may be needed in your report. If you have space, come up with additional observations.

Assignment 2: Random Optimization

I recommend using the alternative term "Random Heuristics" instead of random optimization for searching books and articles. This is because you will be using a function/algorithm (heuristic) that uses some randomization technique somewhere in the algorithm.

For this assignment, you have to pick three problems (e.g., N-Queens, Travelling Salesman, etc.) and justify why you chose them and how they highlight the benefits of one specific algorithm outlined on the assignment page.

For example, for my cohort, we discussed how one problem (e.g., N-Queens) benefits from using one algorithm (e.g., Random Hill Climbing). You have to justify the choice of both N-Queens and Random Hill Climbing.

For this assignment, I ran all algorithms against all problems to understand better how each algorithm performed. I then reported those results in my analysis. We were also told to use the mlrose Python package for this assignment.
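To show the shape of these algorithms without committing to any package's API, here is a minimal random hill climbing sketch on a toy "OneMax" problem (maximize the number of 1-bits); the problem, parameters, and names are my illustrative assumptions, and the real assignment uses mlrose:

```python
# Minimal random hill climbing on OneMax: flip a random bit, keep the
# flip if fitness does not drop, otherwise revert. This is only a sketch
# of the algorithm family, not the assignment's implementation.
import random

def one_max(state):
    return sum(state)  # fitness: count of 1-bits

def random_hill_climb(n_bits=20, max_iters=2000, seed=0):
    rng = random.Random(seed)
    state = [rng.randint(0, 1) for _ in range(n_bits)]
    best = one_max(state)
    for _ in range(max_iters):
        i = rng.randrange(n_bits)  # random neighbor: flip one bit
        state[i] ^= 1
        score = one_max(state)
        if score >= best:
            best = score           # keep the move
        else:
            state[i] ^= 1          # revert: strict hill climbing
    return state, best

state, best = random_hill_climb()
print(best)  # with this budget the optimum (20) is reached
```

Swapping the accept/revert rule for a temperature-based acceptance gives you simulated annealing, which is one way to frame the comparison in your report.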

Algorithms

For my cohort, we reviewed four random optimization algorithms: Random Hill Climbing, Simulated Annealing, Genetic Algorithms, and MIMIC.

MIMIC is an algorithm researched/created by the previous professor of this class.

Again, with all three problems, you'll have to compare/contrast each algorithm, where the caveat is that one algorithm should be highlighted based on the assignment requirement.

Comparisons

The main comparisons between all algorithms were each algorithm's speed, function evaluation, fitness scores, and iterations. This can be done in a table rather than a graph with no penalty to score.

There may be different/additional comparisons, so double-check your assignment page and FAQ.

Local Search vs Evolutionary Algorithms

As you may have noticed, you are asked to experiment with two local search and two evolutionary algorithms.

Although mentioned briefly in the required readings, it would benefit you to discuss these types of algorithms against one another.

Themes

Curse of Dimensionality

So again, we see the curse of dimensionality pop its head. Experiments should contrast how individual algorithms' speed, iterations, function evaluations, and fitness score change as your chosen problem size increases.

Another reason we contrast this is to understand the pros/cons of each algorithm and why we may favor some algorithms over others depending on the problems you have chosen.

No Free Lunch Theorem

The No Free Lunch Theorem intuitively states that, averaged over all possible problems, no optimization algorithm does better than random guessing.

So, if every algorithm is equally bad on average, why do we run them? Because we never face all possible problems, and research in the field shows that some algorithms work better than others, depending on the problem.

Although discussed briefly in the lecture videos, you should discuss how each algorithm approaches the optimal answer as you increase iterations.

It would be best if you also discussed the drawbacks of each algorithm as it approaches the optimal answer.

Monte Carlo Method

Although not discussed in the required readings and lectures, the idea of the Monte Carlo Method is discussed in other literature on the topic.

Any algorithm that relies on repeated random sampling is considered a Monte Carlo Method. Since all algorithms employ some type of randomness, learn where in the algorithms those random processes occur and explain how they may affect results shown from your experiments.

Local Minima/Maxima

A simple heuristic, such as hill climbing, has an issue of getting stuck in a local maxima/minima. Introducing randomness helps your algorithm get "unstuck."

A theme of this assignment is how all algorithms deal with getting "unstuck" from these local minima/maxima. Also, different problems have different landscapes of local minima/maxima, and some problems even have multiple global minima/maxima.

It would be best to write about these in your reports and how each algorithm handles getting "unstuck" while trying to find the global minima/maxima.

Random Optimizations vs Brute Force Algorithms

Another consideration is how random optimization algorithms compare to brute force algorithms. To keep things simple, assume your brute force "algorithm" looks at every permutation/combination of the problem.

It would help if you touched on a brute-force algorithm's performance against your random optimization algorithms as the problem space increases in size.

You don't have to write an algorithm; explain in simple O-Notation the time complexity of said brute force algorithm and how it compares to your random optimization algorithm.

There can be cases where brute force performs faster, depending on the problem you pick, the size of the problem space, and how your "algorithm" is "coded."
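The arithmetic behind that comparison is easy to sketch. Taking Travelling Salesman as the example, brute force checks roughly (n-1)!/2 distinct tours, while a random optimizer spends a fixed evaluation budget (the budget below is an assumed number, not anything from the assignment):

```python
# Brute-force TSP checks every tour, roughly (n-1)!/2 of them; a random
# optimizer does a fixed number of fitness evaluations regardless of n.
import math

budget = 100_000  # assumed evaluation budget for a random optimizer
for n in [5, 10, 15, 20]:
    tours = math.factorial(n - 1) // 2  # distinct tours for n cities
    winner = "brute force" if tours < budget else "random search"
    print(n, tours, winner)
```

For tiny n, brute force is cheaper (and exact); the factorial overtakes any fixed budget within a few more cities, which is the O-notation point the report wants.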

Other Themes

For my cohort, the other additional themes were speed, function evaluation, fitness scores, and iterations of each algorithm.

Double-check what is required of you. If you have space, come up with additional observations.

Assignment 3: Unsupervised Learning

This assignment requires returning to the assignment one datasets and comparing how unsupervised learning algorithms handle them.

This assignment is the point where the class seems designed to let go of your hand and allow you to come up with a "unique" take on the experiments you run. For this reason, I cannot elaborate much on this assignment.

Why is this Assignment Important?

In my cohort, some colleagues mentioned that this assignment felt unimportant.

I learned from a video by Kaisa Taipale that we can build a model directly from domain knowledge without any data. However, if one does not have domain knowledge (or even if one does), we can use Machine Learning (Unsupervised Learning) to discover features. Then, use those features to improve the model. Repeat this cycle until you have something you are satisfied with. We may even find valuable features that we would not have considered otherwise.

This may be one reason for the usefulness of this assignment.

Algorithms

You are told to experiment with two clustering and four dimensionality reduction algorithms for this assignment. I'll discuss what my cohort was assigned to do, but your cohort may be looking at different algorithms.

For my cohort, we were tasked with the following:

  • Clustering Algorithms
    • Expectation Maximization & One Clustering Algorithm of Your Choice
  • Dimensionality Reduction Algorithms
    • PCA, ICA, RCA, & One Manifold Learning Algorithm of Your Choice

Justify why you have "picked" all algorithms and experiments.
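As a hedged sketch of what running both families looks like (scikit-learn and the iris dataset are stand-ins for illustration; your datasets come from assignment one, and your algorithm choices may differ):

```python
# Sketch of both families on one dataset: EM (GaussianMixture) plus a
# clustering algorithm of choice (KMeans), and PCA to see how much
# variance each new component carries.
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture

X, _ = load_iris(return_X_y=True)

km_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
em_labels = GaussianMixture(n_components=3, random_state=0).fit_predict(X)

pca = PCA(n_components=2).fit(X)
print(pca.explained_variance_ratio_)  # variance kept per component
```

Comparing where `km_labels` and `em_labels` disagree, and which original features dominate each PCA component, is the kind of analysis the comparisons section asks for.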

Comparisons

This assignment differs from previous ones in that you should be analyzing how the individual algorithms assign clusters and which features contribute most to each reduction algorithm.

I cannot elaborate more on this as it would depend on the semester of the class, the datasets you have chosen, and what the TAs tell you to focus on in the assignments and FAQ page.

The gist is to compare both clustering and dimensionality algorithms with each other based on your datasets and justify the results you have observed with your experiments. The TAs will have outlined what the comparisons for you will be. In this case, we were asked to generate charts/graphs that display how the datasets were clustered or what the new feature space looks like when running a dimensionality reduction algorithm.

We then compare/contrast them with each other, justifying how each algorithm came to the conclusion it did.

Just remember that even though you are told what to use, you should justify why you "picked" the algorithms in the report. If the assignment asks you to look at one or several algorithms, you should also justify why you have "picked" those.

Dimensionality Reduction vs Clustering

Since this assignment requires you to use two types of unsupervised learning algorithms, it may be beneficial to compare/contrast how they operate on unlabeled data. Both look for patterns in the data; they just do it differently.

Themes

Impossibility Theorem

No clustering scheme can achieve all three of:

  • richness
  • scale invariance
  • consistency

You can learn more by reading Jon Kleinberg's paper, An Impossibility Theorem for Clustering.

Although not the main point of the assignment, you should include this briefly, as this was discussed in the lecture videos.

Noise in Dataset

You should also look at how noise affects clustering and dimensionality reduction algorithms. This can be very brief; it is discussed in some books and educational papers on the topic.

Linear Separability

You are not creating a hyperplane with unsupervised learning. Instead, you are looking to see whether clusters form that can still produce good performance when passed to a simple supervised learning algorithm.

For example, in a 2D space, there is a scenario where a decision tree (tuned to a single linear split) can perform extremely well on both the raw dataset and a dataset whose only feature is the cluster each point belongs to, once you include the output label.

Clustering Example

Basically you want to see if clustering alone can "explain away" the results when passed to a simple supervised learning algorithm.
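One way to sketch this "explain away" check (the dataset and models below are illustrative assumptions): replace the raw features with a single cluster-label feature and see whether a simple supervised learner still scores comparably.

```python
# Sketch: does a single cluster-label feature carry most of the signal
# that the raw features do? Compare cross-validated scores on both.
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

raw = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=5).mean()
only_cluster = cross_val_score(
    DecisionTreeClassifier(random_state=0),
    clusters.reshape(-1, 1), y, cv=5).mean()
print(raw, only_cluster)  # close scores => clustering captured the signal
```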

Discriminatory Learning

You should compare how your algorithms "discriminate" against your datasets. This will require that you look at what features are making up your clustering and dimensionality reduction algorithms.

Unsupervised learning is learning from discrimination. Intuitively, providing an unsupervised learning algorithm with unlabeled data will find some commonalities between the data points and say something like, "Because they share these features, therefore they are similar."

In the human sense, this would be similar to stereotyping. You may find that some stereotypes are true.

For example, "all colorful frogs are poisonous." This may not be correct; surely some frogs are toxic but not colorful. Still, if the data is noise-free and linearly separable, we should see colorful, poisonous frogs clustered together, while non-colorful, poisonous frogs form their own cluster.
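To see which features actually drive the "discrimination," you can inspect the learned structure directly. A hypothetical sketch with PCA (the synthetic data and feature layout are my own assumptions): the absolute loadings of each principal component tell you which raw features dominate it, and cluster centers can be read the same way.

```python
# Sketch: inspect which raw features a DR algorithm (PCA) leans on,
# via the absolute loadings of each principal component.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = np.hstack([rng.normal(0, 5, (300, 1)),     # feature 0: high variance
               rng.normal(0, 0.1, (300, 3))])  # features 1-3: near-constant

pca = PCA(n_components=2).fit(X)
loadings = np.abs(pca.components_)   # shape (2, 4): components x features
top_feature = int(loadings[0].argmax())  # feature dominating the first PC
```

In a report, pairing these loadings with what the features mean in the domain is where the "discrimination" analysis becomes interesting.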

Curse of Dimensionality

Since you will be dealing with assignment one datasets, consider how the size of your datasets contributes to your experiments and results.
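A classic, easy-to-reproduce symptom worth mentioning is distance concentration: as dimensionality grows, the gap between the nearest and farthest neighbor shrinks relative to the distances themselves, which hurts distance-based methods. A small self-contained demo on uniform random data (my own illustrative setup, not assignment data):

```python
# Sketch of distance concentration: relative contrast between the
# nearest and farthest neighbor collapses as dimensionality grows.
import numpy as np

rng = np.random.default_rng(0)

def contrast(d, n=500):
    """(max - min) / min distance from one point to the rest, in d dims."""
    X = rng.random((n, d))
    dists = np.linalg.norm(X[1:] - X[0], axis=1)
    return (dists.max() - dists.min()) / dists.min()

low, high = contrast(2), contrast(1000)
# The relative contrast is typically far smaller in high dimensions.
```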

Domain Knowledge

Try to get some domain knowledge on your datasets. From there, use your "domain knowledge" to guide the interpretation of results from both clustering and dimensionality reduction algorithms.

Can domain knowledge guide adding or removing features based on experiment results?

Other Themes

You'll have to read the assignment page and the Ed Discussion FAQ to gather what should be included in your report. If you have space, come up with additional observations.

Assignment 4: Markov Decision Process

This assignment abandons the datasets used for assignments 1-3. Instead, you are asked to come up with your own unique experiments based on the assignment page requirements.

This assignment allows the most flexibility in experimentation, as long as you focus on comparisons among algorithms along with the topics discussed in the lecture videos and required readings.

For this assignment, you will pick two problems: one grid-world and one non-grid-world experiment (you can also do two non-grid-world problems). You will also choose two different problem sizes: one small (200 or fewer states is considered small) and one large (a state count well above 200 that you justify).

For clarification, a grid world is one where the agent can move only left/right in 1D or left/right/up/down in 2D, and moves may or may not be deterministic.

This assignment also asks that you go above and beyond the assignment requirements for extra points at the discretion of the grader. I chose to look at two model-free algorithms.

The course recommends a Python library called bettermdptools. I decided instead to use the MushroomRL library, which requires additional work to set up the problem from scratch but allows for more control. I recommend MushroomRL only if you have time for the additional setup.

To briefly explain the lineage behind this assignment: dynamic programming gave birth to the iterative algorithms (value and policy iteration), which in turn heavily inspired model-free reinforcement learning algorithms. Everything uses the Bellman equation as its foundation.

Deeper Dive

If you want a deeper dive into the concepts of reinforcement learning, consider taking a look at my Single Agent Reinforcement Learning: Basic Concepts and Terminologies article.

Algorithms

This assignment will require that you focus on Value Iteration, Policy Iteration, and one model-free Reinforcement Algorithm of your choice.

For this one, I recommend Q-Learning as it is taught in ML4T, but other students have used SARSA.

For the extra points of going above and beyond, you should experiment with both Q-Learning and SARSA.

Lastly, it's also crucial that you justify your reward scheme, policy, learning rate, and transition probabilities, as they all shape how each algorithm runs and converges.

Comparisons

Compare and contrast Value and Policy Iteration and all the model-free algorithms you have chosen to experiment on.

How does each algorithm converge? How do they determine optimal policy?

Find as many questions to answer as you can, but make sure you discuss these differences for all the algorithms, or you risk losing points.

Model-Free vs Dynamic Programming

Value iteration and policy iteration belong to the dynamic programming algorithm class. You can use this opportunity to compare/contrast this with model-free algorithms such as Q-Learning/SARSA if those are the algorithms you have chosen.

If the assignment makes you choose a model-based algorithm, then compare it with the dynamic programming methods. (Keep in mind that value iteration and policy iteration are considered model-based because the "model" is known ahead of time, so find different areas to analyze between the two.)

Themes

Convergence Criteria

A typical convergence criterion is the cumulative mean reward: the average reward the agent obtains per episode as learning progresses.

You can also pick different convergence criteria as long as you justify why the convergence criteria were selected.
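One alternative criterion worth justifying is the maximum value change per sweep (max delta), which value iteration commonly uses. A minimal sketch on a made-up two-state toy MDP (the transition table and threshold are my own illustrative assumptions):

```python
# Sketch: value iteration on a tiny toy MDP, stopping when the largest
# per-sweep value change (max delta) drops below a threshold.
import numpy as np

# P[s][a] = list of (probability, next_state, reward); a made-up toy MDP.
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 1.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 1.0)]},
}
gamma, theta = 0.9, 1e-8
V = np.zeros(2)
while True:
    delta = 0.0
    for s in P:
        # Bellman optimality backup: best expected one-step return.
        q = [sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a]) for a in P[s]]
        best = max(q)
        delta = max(delta, abs(best - V[s]))
        V[s] = best
    if delta < theta:   # convergence criterion: values stopped moving
        break
```

Here both states converge to a value of 1/(1 - 0.9) = 10, since repeatedly taking action 1 earns reward 1 forever.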

Exploration vs Exploitation

It would be best to discuss the tradeoff between exploration and exploitation and what you saw in your experiments for the model-free algorithms.

Dynamic programming has only an exploitation component, but it is still worth discussing.
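The simplest concrete form of this tradeoff is epsilon-greedy action selection, which a model-free learner like Q-Learning typically uses. A small sketch (the function name and toy Q-values are my own):

```python
# Sketch: epsilon-greedy selection -- explore with probability epsilon,
# otherwise exploit the currently best-looking action.
import numpy as np

def epsilon_greedy(q_row, epsilon, rng):
    """Pick an action from one row of a Q-table."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_row)))  # explore: random action
    return int(np.argmax(q_row))              # exploit: greedy action

rng = np.random.default_rng(0)
q_row = np.array([0.1, 0.9, 0.3])
# With epsilon = 0, the choice is always greedy (action 1 here).
```

Decaying epsilon over training (explore early, exploit late) is a common design choice worth justifying in your report.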

Curse of Dimensionality

Again, we see that the size of the state space will affect your algorithms. Explain the results you see from the experiments and contrast them with similar experiments of a smaller size. You can also briefly explain how you can overcome the problems of large problem spaces.

Bellman Equation

All the required algorithms you are asked to experiment on will use a unique variation of the Bellman equation. Tie that into your report analysis.
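For instance, Q-Learning replaces value iteration's full expectation over a known transition model with a sample-based update built on the same Bellman optimality backup. A minimal sketch of that update rule (the function and toy values are illustrative):

```python
# Sketch: the Q-learning update, a sample-based variation of the
# Bellman optimality backup used by value iteration.
import numpy as np

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    td_target = r + gamma * np.max(Q[s_next])   # sampled one-step lookahead
    Q[s, a] += alpha * (td_target - Q[s, a])    # move toward the target
    return Q

Q = np.zeros((2, 2))
Q = q_learning_update(Q, s=0, a=1, r=1.0, s_next=1)
```

Lining the update rules up side by side in your analysis makes the shared Bellman structure, and the model-based vs. sample-based difference, easy to argue.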

Hyper-parameter Tuning

Changing hyper-parameter values leads to different agent behaviors.

Consider diving into this in your experiments and discuss how these affect model-free and dynamic programming algorithms (value iteration should have a threshold hyper-parameter).

Other Themes

You should check the assignment page and Ed Discussion FAQs to determine if there are other aspects to consider for your project.

Since Reinforcement Learning is vast and an active area of research, I want to bring up concepts you can use in your reports.

Sample Efficiency

A sample is one interaction with the environment that contributes to learning the optimal policy, and efficiency measures how well those interactions are used. The fewer samples an agent needs to learn a task, the more "sample efficient" the agent is.

In layperson's terms, if you don't know how to eat soup with a spoon, you'll teach yourself how to do so through samples. These samples would include experimenting with everything until you have learned to eat soup (trial and error). If you discovered how to eat soup in three samples (three tries through trial and error), we would say that you are "sample efficient."

Compare that with the Tesla X1000 robot, which takes perhaps five million samples before it can eat soup with a spoon; we then say that the robot is "sample inefficient" compared to you.

Fully vs Partially Observable States

The idea of reinforcement learning is to develop the best possible policy for any state in the environment. Some environments have nothing hidden from the agent, while others have some things hidden from the agent. Think chess vs. poker.

Agent Actions

One concept of Reinforcement Learning is that the actions of an agent are based on the current state and dynamics of the world the agent resides in. These dynamics include but are not limited to obstacles, rewards/penalties, and stochastic/deterministic agent actions.

The learning policy can also affect these actions (Epsilon, Softmax, Boltzmann, etc.)

The agent's world and everything that resides in that world may be deterministic or stochastic and will affect how an agent learns.

The environment affects the agent, while the agent affects how the environment responds, also known as the agent-environment interaction loop.

Global vs Local Optimal Policies

For our cohort, we had the option of picking one grid world problem. Most grid world problems give the agent a fixed starting position and a fixed goal, meaning we don't need the agent to find the best policy for every state in the grid world (a globally optimal policy); we only need the best policy from the agent's starting state (a locally optimal policy).

This is another analysis you could take in your paper. You may instead want a strategy that finds the best policy for reaching the goal no matter the agent's starting position.

Problem Set

Although optional, students have expressed that doing the problem sets has led to higher individual curves. This means that if two students had the same raw score (e.g., 69), one student who does both problem sets to the fullest may get an A, whereas the other may only receive a B.

This also depends on the class average or median, so please take the curves with a bucket of salt.

Doing both problem sets to the fullest has benefits in terms of your grade curve and preparation for the exams.

Exams

Take this section with a grain of salt; I am not the best test taker or test preparer.

All or most exam questions are multiple-choice, with possibly only one correct answer. No testing strategy will help with this. You must build up a high-level intuition of everything that encompasses the class, including the lecture videos, required/optional readings, and office hours.

My Take

If you tackle all assignments with earnest and honest intent while learning the materials in the class, and do well/decent on the exams, you should be able to pull off an A.

Remember that the goal is to "prove" you learned the material through your written report. This will also reflect on your problem sets and exams.

Finally, I would say that achieving an A is a lot of hard work and time. Lots of experimentation to understand algorithms, redoing experimentation to understand assignment themes, and lots of stress/lack of sleep.

Even with one horrible screwup, obtaining a B without a curve is still possible.

Lastly, if you plan on taking the Reinforcement Learning course, CS-7642, consider reading my Reinforcement Learning series, as it is aimed at helping students and curious learners alike.

I hope this guide helps reduce stress while increasing your grade. I am sending love and warm wishes to all who travel through OMSCS Machine Learning CS-7641.
