
Planning and Learning to Act in Multi-Agent Systems - plasma


The holy grail of Artificial Intelligence (AI), creating an agent (e.g., software or machine) that comes close to mimicking and possibly exceeding human intelligence, remains far off. But recent years have seen breakthroughs in agents that gain abilities from experience with their environment, providing significant advances in society and industry, including health care, autonomous driving, and recommender systems, and ultimately influencing many if not all aspects of everyday life. These advances are partly due to single-agent Deep Learning (DL) combined with Reinforcement Learning (RL) and Monte-Carlo Tree Search (MCTS), i.e., AI research subfields in which the agent can describe its world as a Markov decision process. Some stand-alone planning and RL algorithms are guaranteed to converge to the optimal behavior as long as the environment the agent is experiencing is Markovian and stationary, but scalability remains a significant issue. DL combined with RL and MCTS has emerged as a powerful way to break the curse of dimensionality in very large-scale domains, at the expense of astronomical data and computational resources, but so far its applicability is mainly restricted to either single-agent domains or sequential games.
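
For reference, the standard single-agent formulation underlying these convergence guarantees can be sketched as follows; the notation is illustrative, not taken from the proposal:

```latex
% A Markov decision process (MDP) is a tuple (S, A, T, R, gamma):
%   S: states, A: actions,
%   T(s' | s, a): stationary transition kernel (Markov property),
%   R(s, a): reward function, gamma in [0, 1): discount factor.
\[
  M = \langle S, A, T, R, \gamma \rangle,
  \qquad
  T(s' \mid s, a) = \Pr\big(s_{t+1} = s' \mid s_t = s,\; a_t = a\big).
\]
% Optimal behavior is the fixed point of the Bellman optimality
% equation, which planning and RL algorithms converge to:
\[
  V^*(s) = \max_{a \in A} \Big[\, R(s, a)
    + \gamma \sum_{s' \in S} T(s' \mid s, a)\, V^*(s') \,\Big].
\]
```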

Today, real-life applications widely use multi-agent systems (MASs), that is, groups of autonomous, interacting agents sharing a common environment, which they perceive through sensors and upon which they act with actuators. At home, in cities, and almost everywhere, a growing number of sensing and acting machines surround us, sometimes visibly (e.g., robots, drones, cars, power generators) but often imperceptibly (e.g., smartphones, televisions, vacuum cleaners, washing machines). Before long, through the emergence of a new generation of communication networks, most of these machines will be interacting with one another through the Internet of Things (IoT). Constantly evolving MASs will thus break new ground in the coming years, pervading all areas of society and industry, including security, medicine, transport, and manufacturing. Although Markov decision processes provide a solid mathematical framework for single-agent planning and RL, they do not offer the same theoretical grounding in MASs. In contrast to single-agent systems, when multiple agents interact with one another, how the environment evolves depends not only on the action of one agent but also on the actions taken by the other agents, which invalidates the Markov property from each agent's perspective and renders the environment non-stationary. Moreover, a centralized (single-agent) control authority is often inadequate, because agents cannot (e.g., due to communication cost, latency, or noise) or do not want (e.g., in competitive or strategic settings) to share all their information all the time.
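
To make the joint-action dependence concrete, the multi-agent setting can be sketched as a Markov (stochastic) game; again the notation is illustrative, not taken from the proposal:

```latex
% A Markov game generalizes the MDP to n agents:
% joint actions (a^1, ..., a^n) drive the dynamics.
\[
  G = \langle n, S, \{A^i\}_{i=1}^{n}, T, \{R^i\}_{i=1}^{n}, \gamma \rangle,
  \qquad
  T(s' \mid s, a^1, \dots, a^n).
\]
% From agent i's perspective, the "environment" marginalizes over
% the other agents' policies \pi^{-i}_t:
\[
  T^i_t(s' \mid s, a^i)
    = \sum_{a^{-i}} \pi^{-i}_t(a^{-i} \mid s)\;
      T(s' \mid s, a^i, a^{-i}),
\]
% which varies with t as the other agents learn, breaking the
% stationarity assumption behind single-agent guarantees.
```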

As a consequence, the increasing penetration of MASs into society will require a paradigm shift from single-agent to multi-agent planning and reinforcement learning algorithms, leveraging recent breakthroughs. That leads us to the fundamental challenge this proposal addresses: the design of generic algorithms with provable guarantees that can efficiently compute rational strategies for a group of cooperating or competing agents in spite of stochasticity and sensing uncertainty, yet using the same algorithmic scheme. Such algorithms should adapt to changes in the environment, apply to different tasks, and eventually converge to a rational solution for the task at hand. But they need not exhibit the fastest convergence rates, since there is no free lunch. Using the same algorithmic scheme for different problems eases knowledge transfer and dissemination in expert and practitioner communities alike.