Differences between Temporal Difference, Monte Carlo, and Dynamic Programming-based approaches to Reinforcement Learning (RL)

This post addresses the differences between Temporal Difference, Monte Carlo, and Dynamic Programming-based approaches to Reinforcement Learning, and the challenges of applying RL in the real world; in particular, the engineering problems faced when applying it to environments with large or infinite state spaces.

Machine learning and Reinforcement Learning

Machine Learning is the study of algorithms through which a computer develops, on its own, rules for correctly solving problems. Reinforcement Learning is a type of Machine Learning and a branch of Artificial Intelligence. It allows machines and software agents to automatically determine the ideal behavior within a specific context so as to maximize their performance, without explicit supervision: simple reward feedback is all the agent requires to learn its behavior.

A mobile robot decides whether it should enter a new room in search of more trash to collect or start trying to find its way back to its battery recharging station. It makes its decision based on how quickly and easily it has been able to find the recharger in the past. This example involves interaction between an active decision-making agent and its environment, within which the agent seeks to achieve a goal despite uncertainty about its environment. The agent's actions affect the future state of the environment and thus may require foresight or planning.
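
Below is a minimal sketch of the agent-environment loop this example describes. The states, actions, transition rules, and reward numbers are all hypothetical, chosen only to make the loop concrete.

```python
import random

# Hypothetical recycling-robot states and actions (illustrative only).
states = ["searching", "low_battery", "at_charger"]
actions = ["search", "go_to_charger"]

def step(state, action):
    """Toy environment: returns (next_state, reward)."""
    if state == "searching" and action == "search":
        # Searching finds trash (+1) but may drain the battery.
        next_state = "low_battery" if random.random() < 0.3 else "searching"
        return next_state, 1.0
    if action == "go_to_charger":
        return "at_charger", 0.0
    if state == "low_battery" and action == "search":
        return "at_charger", -10.0  # ran out of power: heavy penalty
    return state, 0.0

state = "searching"
for t in range(5):
    action = random.choice(actions)           # placeholder policy
    next_state, reward = step(state, action)  # environment responds
    print(t, state, action, reward, next_state)
    state = next_state
```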

Components of Reinforcement Learning

Reinforcement learning has four main sub-elements: a policy, a reward function, a value function, and, optionally, a model of the environment. A policy defines the learning agent's way of behaving at a given time. A reward function defines the goal: it maps each state (or state-action pair) to an immediate reward. Where the reward function indicates what is good in an immediate sense, a value function predicts the cumulative reward the agent can expect to accumulate in the future, starting from the current state. A model mimics the environment: it is the set of probability functions that determine how the environment will respond, in terms of next state and reward, to any given state and action.
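
As a schematic illustration, the four sub-elements can be written down directly for a toy two-state problem. Every name and number below is made up; the point is only to show what kind of object each sub-element is.

```python
# Policy: maps each state to an action.
policy = {"s0": "right", "s1": "left"}

# Reward function: maps a (state, action) pair to an immediate reward.
reward = {("s0", "right"): 0.0, ("s1", "left"): 1.0}

# Value function: predicted cumulative future reward from each state.
value = {"s0": 0.9, "s1": 1.0}

# Model: transition probabilities P(next_state | state, action).
model = {
    ("s0", "right"): {"s1": 1.0},
    ("s1", "left"): {"s0": 1.0},
}
```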

Different methods to solve Reinforcement Learning

There are three main classes of methods for solving reinforcement learning problems: dynamic programming, Monte Carlo methods, and temporal-difference learning.

Dynamic programming requires a complete and accurate model of the environment: the transition probabilities and expected rewards for every state and action must be known in advance. Its updates also sweep over the entire state space, so a value must be computed and stored for every state.
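
The value-iteration sketch below illustrates both requirements on a hypothetical four-state chain (the chain, its rewards, and the discount factor are invented for this example): the full transition and reward model is known up front, and every sweep touches every state.

```python
n_states = 4            # states 0..3; state 3 is terminal
actions = ["left", "right"]
gamma = 0.9             # discount factor

def model(s, a):
    """Known model: returns (next_state, reward). Reaching state 3 pays +1."""
    s2 = min(s + 1, 3) if a == "right" else max(s - 1, 0)
    return s2, (1.0 if s2 == 3 else 0.0)

V = [0.0] * n_states
for sweep in range(100):
    delta = 0.0
    for s in range(3):  # full sweep over every non-terminal state
        candidates = []
        for a in actions:
            s2, r = model(s, a)
            candidates.append(r + gamma * V[s2])
        best = max(candidates)
        delta = max(delta, abs(best - V[s]))
        V[s] = best
    if delta < 1e-6:    # stop once a sweep barely changes anything
        break

print([round(v, 3) for v in V])  # values grow toward the goal state
```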

Monte Carlo methods require only experience: sample sequences of states, actions, and rewards from actual or simulated interaction with an environment. Unlike dynamic programming, they need no prior knowledge of the environment's dynamics. Because the return is only known once an episode finishes, however, value estimates are updated only at the end of each episode.
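
Here is a first-visit Monte Carlo prediction sketch on the same kind of hypothetical four-state chain. Note that the learning code only consumes complete sampled episodes; the transition probabilities are hidden inside run_episode and never used directly.

```python
import random
from collections import defaultdict

gamma = 0.9

def run_episode():
    """Follow a random policy from state 0 until terminal state 3."""
    s, steps = 0, []
    while s != 3:
        s2 = min(s + 1, 3) if random.random() < 0.5 else max(s - 1, 0)
        steps.append((s, 1.0 if s2 == 3 else 0.0))  # (state, reward)
        s = s2
    return steps

returns = defaultdict(list)
for _ in range(5000):
    episode = run_episode()
    # Discounted return from every time step, computed backwards.
    G, G_at = 0.0, {}
    for i in range(len(episode) - 1, -1, -1):
        G = episode[i][1] + gamma * G
        G_at[i] = G
    # First-visit: record the return from each state's first occurrence.
    first_visit = {}
    for i, (s, _) in enumerate(episode):
        first_visit.setdefault(s, i)
    for s, i in first_visit.items():
        returns[s].append(G_at[i])

V = {s: sum(gs) / len(gs) for s, gs in returns.items()}
print({s: round(v, 2) for s, v in sorted(V.items())})
```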

Temporal-difference methods likewise require no model. Like Monte Carlo methods they learn directly from experience, but like dynamic programming they update each state's estimate from other learned estimates (bootstrapping), so the agent can learn to predict the expected value occurring at the end of a sequence of states without waiting for the sequence to finish. The learned state values can in turn guide actions that subsequently change the environment's state.
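
A tabular TD(0) prediction sketch on the same hypothetical chain makes the contrast visible: the estimate for a state is nudged toward the one-step target r + gamma * V(next state) after every single step, using the current estimate of the next state instead of the episode's final outcome.

```python
import random

gamma, alpha = 0.9, 0.1
V = [0.0] * 4  # states 0..3; state 3 is terminal

for _ in range(5000):
    s = 0
    while s != 3:
        s2 = min(s + 1, 3) if random.random() < 0.5 else max(s - 1, 0)
        r = 1.0 if s2 == 3 else 0.0
        # TD(0): move V[s] toward the bootstrapped target r + gamma * V[s2].
        V[s] += alpha * (r + gamma * V[s2] - V[s])
        s = s2

print([round(v, 2) for v in V])
```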

Reinforcement Learning in environments with large or infinite state spaces

In reinforcement learning, the number of states often grows exponentially with the number of state variables, and the tabular methods above require computation and memory for every individual state. For many problems, that much memory and computation is impractical; this is why tabular methods are unsuitable for tasks such as image recognition, where the state is a raw image. Replacing the value table with a function approximator such as a deep neural network allows reinforcement learning to be applied to much larger problems, like automated character recognition, by generalizing value estimates across similar states.
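
The sketch below shows the simplest version of this idea: semi-gradient TD(0) with a linear function approximator on a hypothetical 100-state chain. The three hand-made features are invented for illustration; a deep RL method would replace them (and the linear weights) with a neural network. The key property is that memory is now three weights rather than one table entry per state.

```python
import random
import numpy as np

gamma, alpha, n_states = 0.99, 0.05, 100

def features(s):
    """Tiny hand-made feature vector: bias, position, position squared."""
    x = s / (n_states - 1)
    return np.array([1.0, x, x * x])

w = np.zeros(3)  # the whole value function is just these three weights
for _ in range(500):
    s = 0
    while s != n_states - 1:
        s2 = min(s + 1, n_states - 1) if random.random() < 0.7 else max(s - 1, 0)
        r = 1.0 if s2 == n_states - 1 else 0.0
        v_next = 0.0 if s2 == n_states - 1 else w @ features(s2)
        # Semi-gradient TD(0): the bootstrapped target is held constant.
        td_error = r + gamma * v_next - w @ features(s)
        w += alpha * td_error * features(s)
        s = s2

print("learned weights:", np.round(w, 3))
print("V(10) ~", round(float(w @ features(10)), 3),
      " V(90) ~", round(float(w @ features(90)), 3))
```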
