Differences between Temporal Difference, Monte Carlo, and Dynamic Programming-based approaches to Reinforcement Learning (RL)
This post addresses the differences between Temporal Difference, Monte Carlo, and Dynamic Programming-based approaches to Reinforcement Learning, and the challenges of applying them in the real world, in particular the engineering problems faced when applying RL to environments with large or infinite state spaces.
Machine Learning and Reinforcement Learning
Machine Learning is the practice of letting a computer develop, from data, its own rules for solving problems correctly. Reinforcement Learning is a type of Machine Learning and a branch of Artificial Intelligence. It allows machines and software agents to automatically determine the ideal behavior within a specific context in order to maximize their performance, without explicit supervision. Only a simple reward signal is required for the agent to learn its behavior.
A mobile robot decides whether it should enter a new room in search of more trash to collect or start trying to find its way back to its battery recharging station. It makes its decision based on how quickly and easily it has been able to find the recharger in the past. This example involves interaction between an active decision-making agent and its environment, within which the agent seeks to achieve a goal despite uncertainty about its environment. The agent's actions affect the future state of the environment and thus may require foresight or planning.
Components of Reinforcement Learning
Reinforcement Learning has four main sub-elements: a policy, a reward function, a value function, and, optionally, a model of the environment. A policy defines the learning agent's way of behaving at a given time. A reward function maps each state (or state-action pair) to a single number, the reward, and thereby defines the goal of the problem. The reward function indicates what is good in an immediate sense, whereas a value function predicts the total reward the agent can expect to accumulate in the future, starting from the current state. A model is a description of the environment's behavior: the transition probabilities and expected rewards that determine how the environment responds to the agent's actions in any given state.
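To make these pieces concrete, below is a minimal sketch of the four components for a hypothetical three-state problem; the state names, rewards, and transition probabilities are illustrative only, not taken from any particular task.

# Policy, reward function, value function, and model for a toy 3-state MDP.
states = ["s0", "s1", "s2"]
actions = ["left", "right"]

# Policy: which action the agent takes in each state.
policy = {"s0": "right", "s1": "right", "s2": "left"}

# Reward function: immediate reward for taking an action in a state.
reward = {("s0", "left"): 0.0, ("s0", "right"): 0.0,
          ("s1", "left"): 0.0, ("s1", "right"): 1.0,
          ("s2", "left"): 0.0, ("s2", "right"): 0.0}

# Value function: the agent's current estimate of long-term return per state.
value = {s: 0.0 for s in states}

# Model (optional): transition probabilities P(next state | state, action).
model = {("s0", "left"): {"s0": 1.0}, ("s0", "right"): {"s1": 1.0},
         ("s1", "left"): {"s0": 1.0}, ("s1", "right"): {"s2": 1.0},
         ("s2", "left"): {"s1": 1.0}, ("s2", "right"): {"s2": 1.0}}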
Different methods to solve Reinforcement Learning
There are different methods to solve reinforcement learning problems: dynamic programming, Monte Carlo methods, and temporal-difference learning.
Dynamic programming requires a complete and accurate model of the environment, that is, the transition probabilities and expected rewards for every state and action. Each iteration also sweeps over the entire state space.
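As a minimal illustration of a dynamic programming method, here is a value-iteration sketch that reuses the hypothetical toy MDP defined above; the discount factor and convergence threshold are illustrative choices. Note that every sweep touches every state and relies on the model's transition probabilities being known exactly.

gamma = 0.9   # discount factor (illustrative choice)
theta = 1e-6  # convergence threshold

# Value iteration: repeatedly back up every state using the full model.
V = {s: 0.0 for s in states}
while True:
    delta = 0.0
    for s in states:
        # Expected return of the best action, computed from known probabilities.
        best = max(
            sum(p * (reward[(s, a)] + gamma * V[s2])
                for s2, p in model[(s, a)].items())
            for a in actions
        )
        delta = max(delta, abs(best - V[s]))
        V[s] = best
    if delta < theta:
        break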
Monte Carlo methods require only experience: sample sequences of states, actions, and rewards from actual or simulated interaction with an environment. Unlike dynamic programming, they require no prior knowledge of the environment's dynamics, but value estimates are updated only at the end of each episode, once the full return is known.
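The sketch below shows first-visit Monte Carlo prediction for estimating state values under a fixed policy; generate_episode is a hypothetical helper that runs one episode and returns a list of (state, reward) pairs, since Monte Carlo only needs sampled experience, not a model.

from collections import defaultdict

def mc_prediction(generate_episode, num_episodes, gamma=0.9):
    """Estimate V(s) by averaging the returns observed after first visits to s."""
    values = defaultdict(float)
    counts = defaultdict(int)
    for _ in range(num_episodes):
        episode = generate_episode()  # list of (state, reward) pairs
        G = 0.0
        # Walk the episode backwards, accumulating the discounted return.
        for t in range(len(episode) - 1, -1, -1):
            state, r = episode[t]
            G = gamma * G + r
            # First-visit check: update only on the earliest visit to this state.
            if state not in [s for s, _ in episode[:t]]:
                counts[state] += 1
                values[state] += (G - values[state]) / counts[state]
    return dict(values)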
Temporal-difference methods also require no model. The agent learns to predict the expected return from a state by updating its estimate after every step, bootstrapping from its own estimate of the next state rather than waiting for the end of an episode. The learned state values can then guide the actions that subsequently change the environment's state.
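A tabular TD(0) prediction sketch is shown below: after every step, the estimate for the previous state is nudged toward reward + gamma * V(next state), bootstrapping from the current estimate instead of waiting for the episode to end. env_reset, env_step, and select_action are hypothetical stand-ins for the environment and the policy.

from collections import defaultdict

def td0_prediction(env_reset, env_step, select_action,
                   num_episodes, alpha=0.1, gamma=0.9):
    V = defaultdict(float)
    for _ in range(num_episodes):
        state = env_reset()
        done = False
        while not done:
            action = select_action(state)
            next_state, r, done = env_step(state, action)
            # TD(0) update: bootstrap from the estimate of the next state.
            target = r + (0.0 if done else gamma * V[next_state])
            V[state] += alpha * (target - V[state])
            state = next_state
    return dict(V)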
Applying Reinforcement Learning to environments with large or infinite state spaces
In reinforcement learning the number of states often grows exponentially with the number of state variables, and tabular methods need memory and computation for every single state. For many problems this is impractical: a task such as learning directly from images has far too many distinct states to enumerate. Function approximation with deep neural networks lets the value function generalize across states, which is what allows reinforcement learning to scale to larger problems such as automated character recognition.
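One common way around the table-per-state problem is function approximation. The sketch below uses semi-gradient TD(0) with a linear approximator: features is a hypothetical function that maps a raw state (for example, an image) to a fixed-size feature vector, and in practice a deep neural network would take its place.

import numpy as np

def linear_td0(env_reset, env_step, select_action, features,
               num_features, num_episodes, alpha=0.01, gamma=0.99):
    w = np.zeros(num_features)  # one weight per feature, not one value per state
    for _ in range(num_episodes):
        state = env_reset()
        done = False
        while not done:
            action = select_action(state)
            next_state, r, done = env_step(state, action)
            x = features(state)
            v = np.dot(w, x)
            v_next = 0.0 if done else np.dot(w, features(next_state))
            # Semi-gradient update: the gradient of v with respect to w is just x.
            w += alpha * (r + gamma * v_next - v) * x
            state = next_state
    return w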