Inverse Reinforcement Learning: A Microeconomics-Based Approach
The general theme of this thesis is inverse reinforcement learning (IRL) for cognitive systems. The central question is: by observing the decisions generated by a cognitive system in multiple environments, how can an analyst identify whether the system's strategy is optimal and, if so, estimate the system's utility function? Both traditional IRL in machine learning and revealed preference in microeconomics tackle the same inverse optimization problem: given expert behavior, reconstruct the expert's utility. This thesis advances and amalgamates the state-of-the-art in machine learning and microeconomic theory, in both theoretical and applied directions.

First, we consider Bayesian stopping time problems and formulate necessary and sufficient conditions for an agent's behavior to be consistent with optimal stopping. These results advance the state-of-the-art in IRL in that an analyst can estimate the system's utility without requiring knowledge of the agent's private observation likelihood. For practical applications, we also develop concentration inequalities for identifying strategy optimality when the analyst has empirical datasets. This class of IRL results may be used, for example, to formulate effective teaching strategies that cater to a particular student's attention span without intrusive probing.

Second, we exploit revealed preference-based IRL for adversarial identification and mitigation schemes for cognitive radars. The IRL techniques developed herein can be viewed as electronic countermeasures (ECM) for cognitive radars and facilitate non-parametric system identification of adversarial entities. For instance, by observing the emissions of an adversarial target, a cognitive radar can estimate the target's utility function and tune its sensing strategy to minimize the asymptotic covariance of the estimate of the target's coordinates.

Third, we unify two areas in microeconomics, namely, revealed preference and costly information acquisition. Tests for costly information acquisition identify whether a decision maker expends attention optimally, treating attention, an abstraction of the decision maker's private subjective signals, as a cognitive cost. We show that the test for costly information acquisition is identical to a Bayesian (Blackwell order) analog of the test for quasi-linear utility maximization under an appropriate parameter map. This unification has several consequences: (i) we exploit the well-known equivalence between the generalized axiom of revealed preference (GARP) and the Afriat inequalities to reduce the computational complexity of testing costly information acquisition from combinatorial to quadratic; (ii) we formulate robustness tests for costly information acquisition, translated from revealed preference, that handle noisy datasets. As an illustration, we perform a revealed preference-style analysis of user engagement metadata from a real-world YouTube dataset comprising 190k videos and show that YouTube user engagement is consistent with costly information acquisition.

Finally, we take a step beyond the general philosophy of IRL and propose inverse-inverse reinforcement learning (I-IRL). The key idea is to presume the presence of an adversary performing IRL. If a cognitive system is aware of such an adversary, how can the system tweak its strategy to mask its utility from adversarial IRL? We specify how a cognitive system can deliberately choose 'optimal' sub-optimal responses that trade off maximizing the system's utility against minimizing the probability of accurate utility reconstruction by the adversary. From a cognitive radar's perspective, I-IRL can be viewed as an electronic counter-countermeasures (ECCM) mechanism that minimizes strategy leakage subject to a bound on the radar's deviation from its optimal sensing strategy. From a privacy-preserving perspective, I-IRL specifies the minimum magnitude of noise that must be added to an offline dataset to minimize the recoverability of private attributes from the dataset.
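To make the class of models in the first contribution concrete, the following is a minimal sketch of a forward two-state Bayesian stopping problem solved by value iteration over the belief. All parameters (observation likelihoods `q1`, `q0`, continuation cost `c`, misclassification cost `d`) and function names are illustrative assumptions, not notation from the thesis; the thesis addresses the inverse problem of testing observed stopping behavior for optimality, which is not shown here.

```python
def interp(V, x, n):
    """Linear interpolation of values V on a uniform grid over [0, 1]."""
    t = x * (n - 1)
    i = min(int(t), n - 2)
    return V[i] * (1 + i - t) + V[i + 1] * (t - i)

def solve_stopping(q1=0.8, q0=0.3, c=0.05, d=1.0, n=201, iters=300):
    """Value iteration for a two-state Bayesian stopping problem over the
    belief p = P(state = 1). Stopping incurs misclassification cost
    d * min(p, 1 - p) (declare the likelier state); continuing costs c,
    yields an observation y with P(y=1|state=1) = q1, P(y=1|state=0) = q0,
    and updates the belief by Bayes rule."""
    grid = [i / (n - 1) for i in range(n)]

    def continue_cost(V, p):
        py1 = p * q1 + (1 - p) * q0      # predictive probability of y = 1
        py0 = 1 - py1
        cost = c
        if py1 > 0:
            cost += py1 * interp(V, p * q1 / py1, n)        # posterior after y = 1
        if py0 > 0:
            cost += py0 * interp(V, p * (1 - q1) / py0, n)  # posterior after y = 0
        return cost

    V = [0.0] * n
    for _ in range(iters):
        V = [min(d * min(p, 1 - p), continue_cost(V, p)) for p in grid]
    # Stop wherever stopping is no more costly than continuing
    stop = [d * min(p, 1 - p) <= continue_cost(V, p) for p in grid]
    return grid, V, stop
```

Under these assumptions the stopping set has the classical threshold structure: the agent stops when the belief is near 0 or 1 and continues sampling on an interval around 0.5.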
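The revealed preference test whose Afriat-inequality equivalence the third contribution exploits can be sketched for classical consumption data, i.e., observed price vectors and chosen bundles; the thesis's Bayesian (Blackwell order) analog for costly information acquisition is not shown here. The function name `satisfies_garp` is illustrative.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def satisfies_garp(prices, bundles):
    """Check the generalized axiom of revealed preference (GARP) on a
    dataset of T price vectors and the bundles chosen at those prices."""
    T = len(prices)
    expenditure = [dot(prices[t], bundles[t]) for t in range(T)]
    # Direct revealed preference: x_t R0 x_s iff p_t . x_t >= p_t . x_s
    R = [[expenditure[t] >= dot(prices[t], bundles[s]) for s in range(T)]
         for t in range(T)]
    # Transitive closure of R0 via Warshall's algorithm
    for k in range(T):
        for i in range(T):
            if R[i][k]:
                for j in range(T):
                    if R[k][j]:
                        R[i][j] = True
    # GARP: if x_t is revealed preferred to x_s, then x_s must not be
    # strictly directly revealed preferred to x_t
    for t in range(T):
        for s in range(T):
            if R[t][s] and expenditure[s] > dot(prices[s], bundles[t]):
                return False
    return True
```

By Afriat's theorem, passing this test is equivalent to feasibility of the Afriat inequalities, i.e., the existence of a concave, monotone utility function rationalizing the dataset.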