Publications

The following list of publications is not exhaustive.

Optimally Solving Two-Agent Decentralized POMDPs Under One-Sided Information Sharing

Y. Xie, J. Dibangoye, O. Buffet - 2020

Optimally solving decentralized partially observable Markov decision processes under either full or no information sharing has received significant attention in recent years. However, little is known about how partial information sharing affects existing theory and algorithms. This paper addresses this question for a team of two agents with one-sided information sharing, i.e., both agents have imperfect information about the state of the world, but only one has access to what the other sees and does. From the perspective of a central planner, we show that the original problem can be reformulated into an equivalent information-state Markov decision process and solved as such. In addition, we prove that the optimal value function exhibits a specific form of uniform continuity. We also present a heuristic search algorithm that exploits this property, providing the first results for this family of problems.
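Once a problem is recast as an information-state Markov decision process, standard dynamic programming applies. A minimal sketch of value iteration over a generic finite information-state MDP (the states, actions, transitions, and rewards below are hypothetical placeholders, not the paper's construction):

```python
def value_iteration(states, actions, P, R, gamma=0.95, tol=1e-8):
    """Value iteration over a finite (information-state) MDP.

    P[s][a] -> list of (next_state, probability) pairs
    R[s][a] -> immediate reward
    Returns the (approximately) optimal value function as a dict.
    """
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            # Bellman backup: best one-step lookahead value
            best = max(
                R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a])
                for a in actions
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            return V


# Toy two-state example: staying in state "a" yields reward 1 forever,
# so V("a") should converge to 1 / (1 - gamma) = 20.
states = ["a", "b"]
actions = ["stay"]
P = {"a": {"stay": [("a", 1.0)]}, "b": {"stay": [("b", 1.0)]}}
R = {"a": {"stay": 1.0}, "b": {"stay": 0.0}}
V = value_iteration(states, actions, P, R)
```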

On continuous-state MDPs in extensive-form

J. Arjonilla, D. Albert, J. Dibangoye - Delayed

This paper presents a novel, more scalable alternative, namely serial central planning for simultaneous decentralised execution. This methodology extends the applicability of Bellman's principle of optimality, yielding three new properties. First, it allows a central planner to reason upon sufficient serial statistics instead of prior simultaneous ones. Next, it proves that ε-optimal value functions are piecewise linear and convex in sufficient serial statistics. Finally, it reduces the time complexity of the backup operators from doubly exponential to linear.
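A piecewise-linear and convex (PWLC) value function can be represented as the maximum over a set of linear functions ("alpha-vectors") of the sufficient statistic. A minimal illustrative sketch, with made-up vectors rather than ones derived from the paper's construction:

```python
def pwlc_value(stat, alpha_vectors):
    """Evaluate a PWLC value function at a statistic (a probability vector).

    The value is the max over linear functions, each given by an alpha-vector:
    V(stat) = max_alpha <alpha, stat>.
    """
    return max(
        sum(a_i * s_i for a_i, s_i in zip(alpha, stat))
        for alpha in alpha_vectors
    )


# Two illustrative alpha-vectors over a 2-point statistic: the maximum of
# linear functions is convex and piecewise linear in the statistic.
alphas = [[1.0, 0.0], [0.0, 1.0]]
v = pwlc_value([0.3, 0.7], alphas)  # max(0.3, 0.7) = 0.7
```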

HSVI for zs-POSGs exploiting convexity, concavity, and Lipschitz-continuity properties

A. Delage, O. Buffet and J. S. Dibangoye - 2020

Many non-trivial sequential decision-making problems are efficiently solved by relying on Bellman's optimality principle, i.e., exploiting the fact that sub-problems are nested recursively within the original problem. Here we show how it can apply to (infinite-horizon) 2-player zero-sum partially observable stochastic games (zs-POSGs) by (i) taking a central planner's viewpoint, which can only reason on a sufficient statistic called the occupancy state, and (ii) turning such problems into zero-sum occupancy Markov games (zs-OMGs). Then, exploiting the Lipschitz-continuity of the value function in occupancy space, one can derive a version of the HSVI algorithm (Heuristic Search Value Iteration) that provably finds an ε-Nash equilibrium in finite time.
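HSVI-style algorithms that exploit Lipschitz-continuity can build an upper bound on the value function from finitely many known point values: if V is L-Lipschitz, then V(x) ≤ min_i (v_i + L · d(x, x_i)). A minimal sketch with a 1-norm distance and illustrative point values (not the paper's bounds):

```python
def lipschitz_upper_bound(x, points, L):
    """Upper-bound an L-Lipschitz function at x from known values.

    points: list of (x_i, v_i) pairs where v_i = V(x_i);
    uses the 1-norm distance between statistics.
    """
    return min(
        v + L * sum(abs(a - b) for a, b in zip(x, p))
        for p, v in points
    )


# Illustrative: values known at the two corners of a 2-point simplex.
points = [([0.0, 1.0], 2.0), ([1.0, 0.0], 0.0)]
ub = lipschitz_upper_bound([0.5, 0.5], points, L=1.0)  # min(3.0, 1.0) = 1.0
```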