Rackham Fellowship for enabling autonomous agents to learn continuously

Zeyu Zheng

CSE PhD student Zeyu Zheng has been awarded a 2018-19 Rackham International Student Fellowship to support his research in reinforcement learning with his advisor, Toyota Professor of Artificial Intelligence Satinder Singh Baveja. Zheng is working to give autonomous agents the ability to maintain the skills they’ve already learned and use them to develop new ones.

Most current learning algorithms focus on teaching agents one specific skill. To acquire an additional one, the agent needs to start learning it from scratch, because the algorithms lack the ability to build on existing knowledge and skills. In fact, an agent may even forget the skills it has already learned if it is trained on a completely new task, a phenomenon called catastrophic forgetting. A field called “lifelong learning” seeks to address this.
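To make catastrophic forgetting concrete, here is a deliberately tiny Python sketch. It is an illustration only, not Zheng's method or any published algorithm: the one-weight model, the two "tasks" (fitting a line with slope 2.0, then slope -1.0), the input points, and the function names are all assumptions invented for this example. Because both tasks share the same single weight, training on the second one overwrites what was learned for the first.

```python
# Toy illustration of catastrophic forgetting (all names and numbers are hypothetical).
INPUTS = [-1.0, -0.5, 0.5, 1.0]


def train(weight, slope, epochs=200, lr=0.1):
    """Fit y = weight * x to the task y = slope * x with plain gradient steps."""
    for _ in range(epochs):
        for x in INPUTS:
            error = weight * x - slope * x
            weight -= lr * error * x  # gradient of 0.5 * error**2 with respect to weight
    return weight


def task_error(weight, slope):
    """Mean squared error of the model on the task with the given slope."""
    return sum((weight * x - slope * x) ** 2 for x in INPUTS) / len(INPUTS)


def forgetting_demo():
    """Train on task A, then on task B, and measure task A's error before and after."""
    slope_a, slope_b = 2.0, -1.0
    weight = train(0.0, slope_a)
    error_a_before = task_error(weight, slope_a)
    weight = train(weight, slope_b)  # no rehearsal of task A during this phase
    error_a_after = task_error(weight, slope_a)
    return error_a_before, error_a_after


if __name__ == "__main__":
    before, after = forgetting_demo()
    print("Task A error right after learning A:", before)
    print("Task A error after then learning B:", after)
```

In this sketch the error on the first task is near zero immediately after learning it and becomes large again once the second task has been learned. Real agents have far more parameters, but the same overwriting of shared parameters is what lifelong learning methods try to prevent.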

As an example, Zheng described Atari 2600 video games, popular testbeds for autonomous agents. An agent that plays the game Space Invaders well cannot play another similar shooting game, Demon Attack. In order to play the latter well, the agent needs to learn it from scratch. Even more frustrating, the agent could easily forget how to play Space Invaders after it learns how to play Demon Attack.

Zheng says this remains a big challenge for AI researchers. Even when autonomous agents achieve superhuman performance on a single task, they are still not as versatile as a human.

“That’s not what we want,” he says. “What I’m doing is trying to come up with ideas to let the agent continue learning different skills across its life.”

In particular, Zheng plans to study this field with applications to reinforcement learning (RL). RL, unlike other popular learning approaches, depends on an agent finding its own way in an environment in search of a reward stimulus – set the parameters, give it means to perceive the world around it, and let it make mistakes until it figures out a solution.
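The trial-and-error loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the setup used in Zheng's research: the five-cell corridor environment, the reward of 1.0 for reaching the last cell, the function names, and the hyperparameter values are all hypothetical, and tabular Q-learning is simply one standard RL algorithm chosen for brevity.

```python
import random


def train_corridor_agent(n_states=5, episodes=500, alpha=0.5, gamma=0.9,
                         epsilon=0.1, max_steps=100, seed=0):
    """Tabular Q-learning on a hypothetical corridor; the goal is the last cell."""
    rng = random.Random(seed)
    # q[state][action] estimates long-term reward; action 0 = left, 1 = right.
    q = [[0.0, 0.0] for _ in range(n_states)]
    goal = n_states - 1
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            # Explore at random sometimes (and on ties); otherwise act greedily.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state = max(0, state - 1) if action == 0 else state + 1
            reached_goal = next_state == goal
            reward = 1.0 if reached_goal else 0.0
            # Move the estimate toward reward plus discounted future value.
            target = reward if reached_goal else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if reached_goal:
                break
    return q


def greedy_policy(q):
    """Preferred action in every non-goal cell according to the learned values."""
    return ["right" if right > left else "left" for left, right in q[:-1]]


if __name__ == "__main__":
    print(greedy_policy(train_corridor_agent()))
```

Nothing tells the agent that moving right is correct. It is given only a way to observe its position and a reward at the goal, and the preference for moving right emerges from its own mistakes.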

“I like to say that reinforcement learning is the full problem of life,” says Baveja. The agents learn by trial and error how to solve a given problem with the resources available to them in their environment. RL is a big step from older techniques in the category called supervised learning, which relies on human workers and large labeled datasets to train agents by repetition.

Now in his second year, Zheng spent his first year at Michigan working on a different RL problem called intrinsic reward. His goal was to let the agent learn its own reward signal to train itself, allowing it to perform better when it later seeks the human-designed reward.

“In that sense, we want the agent to somehow drive the learning process by itself,” he says.

The field of RL drew sudden interest in the AI community when its techniques were used to train the game-playing agent AlphaGo Zero. This spike in attention caught Zheng’s eye as well; he made the switch to RL from his undergraduate research work after an internship at Carnegie Mellon.

“To me, it’s really exciting to see whether we can design a machine that can even learn things better than humans,” says Zheng. “The beauty of AlphaGo Zero, in comparison to AlphaGo, is that in AlphaGo they still used expert knowledge – game replays by top players. But in AlphaGo Zero, there’s nothing. We have the learning mechanism, the algorithms and the rules of Go – we don’t have domain knowledge of the gameplay. And the result is that AlphaGo Zero is much better than AlphaGo.”

Posted February 11, 2019