ONLINE LEARNING OF MARKOV JUMP LINEAR DYNAMICAL SYSTEMS
We study the online learning of Markov Jump Linear Dynamical Systems: linear dynamical systems with several modes, each represented by a pair of state-dynamics matrices, where the active mode switches over time according to an underlying Markov chain. We focus on the LQR control problem for such systems when the matrices governing the state transitions and the Markov chain's transition probabilities are unknown but the number of modes is known. In this setting, we propose an online learning algorithm that we expect to suffer regret that is sublinear in the time horizon when compared against the optimal controller of the system in expectation, i.e., the controller computed with full knowledge of the system parameters and the Markov chain. We numerically simulate the proposed algorithm on a system with a two-dimensional state space and control space, two modes, and a simple Markov chain between the two modes, comparing it against the known optimal solution, where we expect to observe sublinear regret.
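To make the system model concrete, the following sketch simulates a two-mode Markov Jump Linear Dynamical System of the kind described above: the state evolves as x_{t+1} = A[m_t] x_t + B[m_t] u_t, while the mode m_t follows a Markov chain with transition matrix P. All numerical values (the dynamics matrices A and B, the chain P, and the mode-dependent feedback gains K) are hypothetical placeholders chosen for illustration, not parameters from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-mode MJLS with 2-D state and 2-D control (assumed values).
A = [np.array([[1.0, 0.1],
               [0.0, 1.0]]),
     np.array([[0.9, 0.0],
               [0.2, 0.8]])]
B = [np.eye(2), 0.5 * np.eye(2)]

# Assumed Markov chain over the two modes; row m gives P(m_{t+1} | m_t = m).
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])

def simulate(T, K, x0=np.array([1.0, -1.0]), m0=0):
    """Roll out x_{t+1} = A[m_t] x_t + B[m_t] u_t with u_t = -K[m_t] x_t."""
    x, m = x0, m0
    traj = [x]
    for _ in range(T):
        u = -K[m] @ x                  # mode-dependent linear feedback
        x = A[m] @ x + B[m] @ u        # jump-linear state update
        m = rng.choice(2, p=P[m])      # sample the next mode from the chain
        traj.append(x)
    return np.array(traj)

# Illustrative mode-dependent gains (assumed, not computed from an LQR solve).
K = [np.array([[0.5, 0.05], [0.0, 0.5]]),
     np.array([[0.4, 0.0], [0.1, 0.4]])]

traj = simulate(50, K)
print(traj.shape)  # (51, 2): initial state plus 50 steps of the 2-D state
```

An online learner in this setting would estimate A, B, and P from such trajectories and replan its gains, whereas the optimal baseline controller uses the true parameters throughout.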