David Silver, Satinder Singh, Doina Precup, and Richard S. Sutton, Reward is enough, Artificial Intelligence, Volume 299, October 2021, 103535.
In this article we hypothesise that intelligence, and its associated abilities, can be understood as subserving the maximisation of reward. Accordingly, reward is enough to drive behaviour that exhibits abilities studied in natural and artificial intelligence, including knowledge, learning, perception, social intelligence, language, generalisation and imitation. This is in contrast to the view that specialised problem formulations are needed for each ability, based on other signals or objectives. Furthermore, we suggest that agents that learn through trial and error experience to maximise reward could learn behaviour that exhibits most if not all of these abilities, and therefore that powerful reinforcement learning agents could constitute a solution to artificial general intelligence.
Structure:
In Section 2 we formalise the objective of reward maximisation as the problem of reinforcement learning. In Section 3 we present our main hypothesis. We consider several important abilities associated with intelligence, and discuss how reward maximisation may yield those abilities. In Section 4 we turn to the use of reward maximisation as a solution strategy. We present related work in Section 5 and finally, in Section 6 we discuss possible weaknesses of the hypothesis and consider several alternatives.
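For context, the reward-maximisation objective that Section 2 formalises is the standard reinforcement-learning one: an agent interacts with an environment, observing states and scalar rewards and emitting actions, and seeks behaviour that maximises cumulative reward. A rough sketch in textbook notation (the symbols below follow the usual convention, not necessarily the paper's exact definitions):

\[
G_t = \sum_{k=0}^{\infty} \gamma^k R_{t+k+1},
\qquad
\pi^{*} = \arg\max_{\pi} \, \mathbb{E}_{\pi}\!\left[ G_t \right],
\]

where \(R_t\) is the scalar reward received at time \(t\), \(\gamma \in [0,1)\) is a discount factor, and \(\pi\) is the agent's policy mapping observations to actions.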
Ben Dickson, DeepMind says reinforcement learning is ‘enough’ to reach general AI, June 9, 2021.
Patricia Churchland, neuroscientist, philosopher, and professor emerita at the University of California, San Diego, described the ideas in the paper as “very carefully and insightfully worked out.”
However, Churchland pointed out possible flaws in the paper’s discussion of social decision-making. The DeepMind researchers focus on personal gains in social interactions. Churchland, who has recently written a book on the biological origins of moral intuitions, argues that attachment and bonding are a powerful factor in the social decision-making of mammals and birds, which is why animals put themselves in great danger to protect their children.
“I have tended to see bonding, and hence other-care, as an extension of the ambit of what counts as oneself—‘me-and-mine,’” Churchland said. “In that case, a small modification to the [paper’s] hypothesis to allow for reward maximization to me-and-mine would work quite nicely, I think. Of course, we social animals have degrees of attachment—super strong to offspring, very strong to mates and kin, strong to friends and acquaintances etc., and the strength of types of attachments can vary depending on environment, and also on developmental stage.”
This is not a major criticism, Churchland said, and could likely be worked into the hypothesis quite gracefully.
“I am very impressed with the degree of detail in the paper, and how carefully they consider possible weaknesses,” Churchland said. “I may be wrong, but I tend to see this as a milestone.”
Data scientist Herbert Roitblat challenged the paper’s position that simple learning mechanisms and trial-and-error experience are enough to develop the abilities associated with intelligence. Roitblat argued that the theories presented in the paper face several challenges when it comes to implementing them in real life.
“If there are no time constraints, then trial and error learning might be enough, but otherwise we have the problem of an infinite number of monkeys typing for an infinite amount of time,” Roitblat said. The infinite monkey theorem states that a monkey hitting random keys on a typewriter for an infinite amount of time will almost surely type out any given text.
Roitblat is the author of Algorithms Are Not Enough, in which he explains why all current AI algorithms, including reinforcement learning, require careful formulation of the problem and representations created by humans.
“Once the model and its intrinsic representation are set up, optimization or reinforcement could guide its evolution, but that does not mean that reinforcement is enough,” Roitblat said.
In the same vein, Roitblat added that the paper does not make any suggestions on how the reward, actions, and other elements of reinforcement learning are defined.
“Reinforcement learning assumes that the agent has a finite set of potential actions. A reward signal and value function have been specified. In other words, the problem of general intelligence is precisely to contribute those things that reinforcement learning requires as a pre-requisite,” Roitblat said. “So, if machine learning can all be reduced to some form of optimization to maximize some evaluative measure, then it must be true that reinforcement learning is relevant, but it is not very explanatory.”
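To make Roitblat's point concrete, here is a minimal toy sketch (the corridor environment, action set, and reward function below are hypothetical examples of mine, not anything from the paper): every ingredient that reinforcement learning optimises over has to be specified by a designer before the learning rule ever runs.

import random

# --- Supplied by the designer, not learned: states, actions, reward signal ---
STATES = list(range(5))        # a tiny corridor: positions 0..4
ACTIONS = [-1, +1]             # finite action set: step left or right
GOAL = 4

def reward(next_state):
    # hand-specified reward signal: +1 for reaching the goal, 0 otherwise
    return 1.0 if next_state == GOAL else 0.0

def step(state, action):
    next_state = min(max(state + action, 0), GOAL)
    return next_state, reward(next_state), next_state == GOAL

# --- Only this part is learning: tabular Q-learning within that formulation ---
q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
alpha, gamma, epsilon = 0.1, 0.9, 0.1

for episode in range(500):
    s, done = 0, False
    while not done:
        if random.random() < epsilon:
            a = random.choice(ACTIONS)                     # explore
        else:
            a = max(ACTIONS, key=lambda act: q[(s, act)])  # exploit
        s_next, r, done = step(s, a)
        target = r if done else r + gamma * max(q[(s_next, b)] for b in ACTIONS)
        q[(s, a)] += alpha * (target - q[(s, a)])
        s = s_next

# greedy action learned for each position
print({s: max(ACTIONS, key=lambda act: q[(s, act)]) for s in STATES})

The Q-learning update itself is generic, but it can only search within the state space, action set, and reward structure handed to it, which is the gap Roitblat is pointing at.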