Responsible Emergent Multi-Agent Behavior
Responsible AI has risen to the forefront of the AI research community. As neural network-based learning algorithms continue to permeate real-world applications, the field of Responsible AI has played a large role in ensuring that such systems maintain a high level of human compatibility. Despite this progress, the state of the art in Responsible AI has ignored one crucial point: human problems are multi-agent problems. Predominant approaches largely consider the performance of a single AI system in isolation, yet human problem-solving, from driving in traffic to negotiating economic policy, inherently involves interaction and the interplay of the actions and motives of multiple individuals. This dissertation develops the study of responsible emergent multi-agent behavior, illustrating how researchers and practitioners can better understand and shape multi-agent learning with respect to three pillars of Responsible AI: interpretability, fairness, and robustness.

First, I investigate multi-agent interpretability, presenting novel techniques for understanding emergent multi-agent behavior at multiple levels of granularity. With respect to low-level interpretability, I examine the extent to which implicit communication emerges as an aid to coordination in multi-agent populations. I introduce a novel curriculum-driven method for learning high-performing policies in difficult, sparse-reward environments and show, through a measure of position-based social influence, that multi-agent teams that learn sophisticated coordination strategies exchange significantly more information through implicit signals than less-coordinated teams. Then, at a high level, I study concept-based interpretability in the context of multi-agent learning.
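The abstract does not spell out how position-based social influence is computed. As a hedged sketch of the general idea only, influence can be quantified as the divergence between an agent's action distribution given a teammate's true position and its distribution under counterfactual positions; every name below (`toy_policy`, the positions, the action set) is an illustrative assumption, not the dissertation's actual formulation.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def positional_influence(policy, state, teammate_positions):
    """Average KL between the agent's action distribution under the true
    teammate position and under counterfactual positions.

    `policy(state, teammate_pos)` returns a probability vector over actions;
    the function and its arguments are illustrative stand-ins.
    """
    true_pos, *counterfactuals = teammate_positions
    p_true = policy(state, true_pos)
    kls = [kl_divergence(p_true, policy(state, pos)) for pos in counterfactuals]
    return sum(kls) / len(kls)

# Toy policy: the agent tends to move toward the teammate's side,
# so the teammate's position carries information about its actions.
def toy_policy(state, teammate_pos):
    if teammate_pos == "left":
        return [0.7, 0.2, 0.1]   # P(left), P(stay), P(right)
    return [0.1, 0.2, 0.7]

influence = positional_influence(toy_policy, state=None,
                                 teammate_positions=["left", "right"])
print(influence)  # large value => position acts as an implicit signal
```

Under this reading, a team whose members' action distributions shift sharply with teammates' positions is exchanging more information through implicit positional signals than one whose distributions are position-invariant.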
I propose a novel method for learning intrinsically interpretable, concept-based policies and show that it enables novel behavioral analysis tools, via concept intervention, that can reliably detect emergent coordination, coordination failures (lazy agents), the emergence of strategy and role assignment, and other dependencies among agents' behaviors.

In the second part of the thesis, I study fairness through the lens of cooperative multi-agent learning. There I show that mutual reward, though necessary for learning sophisticated coordination, does not by itself incentivize fair multi-agent behavior. I introduce novel group-based measures of fairness for multi-agent learning and develop two novel algorithms that achieve provably fair outcomes via equivariant policy learning.

The third part of this thesis addresses robustness. I present a systematic analysis of search-based multi-agent learning systems such as AlphaZero and identify concrete failure modes that are present in its policy and value networks but are disguised by search. I use these empirical findings to derive a novel extension of AlphaZero that combines uncertainty-informed value estimation and improved exploration to align AlphaZero's policy and value predictions, thereby improving its robustness.

Altogether, this body of work develops a framework within which researchers and practitioners can begin to understand and shape multi-agent learning systems, representing an initial step toward connecting Responsible AI and multi-agent learning.
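The dissertation's specific group-based fairness measures are not reproduced in this abstract. As a rough, hedged illustration of what a group-level measure can look like, a Gini coefficient over per-agent returns flags the "lazy agent" failure mode mentioned above: a shared team reward can be maximized even when one agent does all the work, and an inequality measure over individual contributions makes that visible. The metric choice here is an assumption for illustration, not the thesis's actual definition.

```python
def gini(returns):
    """Gini coefficient of per-agent returns: 0.0 means a perfectly equal
    split; values approaching 1.0 mean one agent captures nearly everything."""
    n = len(returns)
    total = sum(returns)
    if total == 0:
        return 0.0
    # Mean absolute pairwise difference, normalized by twice the total.
    diff_sum = sum(abs(a - b) for a in returns for b in returns)
    return diff_sum / (2 * n * total)

# Two teams with the SAME mutual (summed) reward of 30:
equal_team = [10.0, 10.0, 10.0]  # balanced contributions
lazy_team = [28.0, 1.0, 1.0]     # one agent does all the work

print(gini(equal_team))  # 0.0
print(gini(lazy_team))
```

Because both teams earn identical mutual reward, a shared-reward objective alone cannot distinguish them, which is one way to see why fairness requires measures beyond mutual reward.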