ACM A. M. Turing Award
Canada - 2024
For developing the conceptual and algorithmic foundations of reinforcement learning
Andrew Barto and Richard Sutton are the recipients of the 2024 ACM A.M. Turing Award. In a series of papers beginning in the 1980s, Barto and Sutton introduced the main ideas, constructed the mathematical foundations, and developed important algorithms for reinforcement learning, one of the most important approaches for creating intelligent systems. Barto is Professor Emeritus of Computer Science at the University of Massachusetts, Amherst; Sutton is Professor of Computing Science at the University of Alberta and Research Scientist at Keen Technologies.
The field of artificial intelligence (AI) is generally concerned with constructing agents, that is, entities that perceive and act. More intelligent agents are those that choose better courses of action. Thus, the notion that some courses of action are better than others is central to AI.
Reward, a term borrowed from psychology and neuroscience, denotes a signal provided to an agent that is related to the quality of its behavior. Reinforcement learning (RL) is the process of learning to behave more successfully given this signal.
The idea of learning from reward has been familiar to animal trainers for thousands of years. Alan Turing himself, in his 1950 paper "Computing Machinery and Intelligence," proposed an approach to machine learning based on "rewards and punishments" and reported having conducted some initial experiments with this approach. Arthur Samuel's self-learning checkers-playing program, demonstrated on television in 1956, was perhaps the first successful example of reinforcement learning, although it lacked any justification of why, or whether, it would work.
Within the field of AI, little further progress occurred in this vein until the early 1980s, when Barto and his Ph.D. student Sutton, motivated by observations from psychology, began to formulate reinforcement learning as a general problem framework. They drew on the mathematical foundation provided by Markov decision processes (MDPs), in which an agent makes decisions in a stochastic environment, receiving a reward signal after each transition and aiming to maximize its long-term cumulative reward. Whereas standard MDP theory assumes that everything about the MDP is known to the agent, the RL framework allows the environment and the rewards to be unknown. The minimal information requirements of RL, combined with the generality of the MDP framework, allow RL algorithms to be applied to a vast range of problems, as explained further below.
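The minimal Python sketch below illustrates the point that an RL agent need not be told how its MDP works. It uses Q-learning, a model-free method introduced by Chris Watkins and not one of the algorithms credited to Barto and Sutton above, on a hypothetical five-state chain. The environment, its reward, and the parameter values (alpha, gamma, epsilon) are illustrative assumptions. The learning loop only calls step() to sample transitions and never inspects the transition or reward rules directly.

```python
import random

# Hypothetical toy environment: states 0..4 on a line, state 4 is terminal.
# Actions: 0 = move left, 1 = move right. Entering state 4 yields reward +1.
N_STATES = 5
TERMINAL = 4
ACTIONS = (0, 1)


def step(state, action):
    """Sample one transition; the learner below only ever calls this function."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    reward = 1.0 if next_state == TERMINAL else 0.0
    return next_state, reward, next_state == TERMINAL


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Explore with probability epsilon, or when both actions look equally good.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.choice(ACTIONS)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, action)
            # Bootstrapped target; a terminal state contributes no future value.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q = q_learning()
greedy_policy = [0 if q[s][0] > q[s][1] else 1 for s in range(TERMINAL)]
print(greedy_policy)  # [1, 1, 1, 1]: move right in every non-terminal state
```

After training, the greedy policy prefers moving right everywhere, because that is the only way to collect the reward; the discount factor gamma makes shorter paths to the reward worth more, so the learned values shrink by a factor of 0.9 per step away from the goal.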
Barto and Sutton, jointly and with other authors, developed many of the basic algorithmic approaches for RL, including temporal difference learning, policy-gradient methods, and the use of neural networks as a tool to represent learned functions. They also proposed agent designs that combined learning and planning, demonstrating the value of acquiring knowledge of the environment as a basis for planning. Perhaps equally important was the textbook, Reinforcement Learning: An Introduction (1998), which is the standard reference in the field and has been cited over 75,000 times. It allowed thousands of researchers to understand and contribute to this emerging field. As a result, RL is among the most active research areas in computer science today.
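Temporal difference learning can be sketched on the five-state random walk used as an example in Sutton and Barto's textbook: states A to E, each episode starting at C, moving left or right with equal probability, and a reward of +1 only when exiting on the right. The function name td0_random_walk, the step size, and the episode count are illustrative assumptions. The defining feature of TD(0) shown here is that each update uses the current estimate of the next state's value rather than waiting for the episode's final outcome.

```python
import random


def td0_random_walk(episodes=5000, alpha=0.01, seed=1):
    """Estimate state values of the five-state random walk with tabular TD(0)."""
    rng = random.Random(seed)
    # Indices 0 and 6 are terminal (value fixed at 0); indices 1..5 are states A..E.
    v = [0.0] + [0.5] * 5 + [0.0]
    for _ in range(episodes):
        s = 3  # every episode starts in the centre state C
        while s not in (0, 6):
            s_next = s + rng.choice((-1, 1))
            r = 1.0 if s_next == 6 else 0.0
            # TD(0) with gamma = 1: nudge V(s) toward the bootstrapped target r + V(s')
            v[s] += alpha * (r + v[s_next] - v[s])
            s = s_next
    return v[1:6]


estimates = td0_random_walk()
print([round(x, 2) for x in estimates])
```

With an undiscounted return, the true values of states A to E are 1/6, 2/6, 3/6, 4/6, and 5/6 (the probability of eventually exiting on the right), so the printed estimates should land near those numbers. With a constant step size they keep fluctuating slightly around the true values rather than converging exactly.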
The most prominent recent success of RL came with AlphaGo's victories over the best human Go players in 2016 and 2017, but RL has also succeeded in many other areas, including robot motor skill learning, network congestion control, chip design, internet advertising, optimization, global supply chain optimization, improving the behavior and reasoning capabilities of chatbots, and even improving algorithms for one of the oldest problems in computer science, matrix multiplication. Finally, a technology that was partly inspired by neuroscience has returned the favor: recent research, including work by Barto, has shown that specific RL algorithms developed in AI provide the best explanations for a wide range of findings concerning the dopamine system in the brain.
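The dopamine connection is usually framed in terms of the temporal-difference (TD) error, a reward prediction error. In standard textbook notation, assumed here rather than taken from the citation, with reward R_{t+1}, discount factor γ, step size α, and value estimate V, the error and the TD(0) update it drives are:

```latex
\delta_t = R_{t+1} + \gamma V(S_{t+1}) - V(S_t), \qquad
V(S_t) \leftarrow V(S_t) + \alpha \, \delta_t
```

A positive δ_t signals that things went better than predicted and a negative δ_t that they went worse; the reward prediction error account of dopamine holds that phasic dopamine activity resembles this signal.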
