Optimal Dynamic Regret in LQR Control

06/18/2022
by Dheeraj Baby et al.

We consider the problem of nonstochastic control with a sequence of quadratic losses, i.e., LQR control. We provide an efficient online algorithm that achieves an optimal dynamic (policy) regret of Õ(max{n^{1/3} 𝒯𝒱(M_{1:n})^{2/3}, 1}), where 𝒯𝒱(M_{1:n}) is the total variation of any oracle sequence of Disturbance Action policies parameterized by M_1, ..., M_n, chosen in hindsight to cater to the unknown nonstationarity. This rate improves on the best known rate of Õ(√(n(𝒯𝒱(M_{1:n}) + 1))) for general convex losses, and we prove that it is information-theoretically optimal for LQR. The main technical components include the reduction of LQR to online linear regression with delayed feedback due to Foster and Simchowitz (2020), as well as a new proper learning algorithm with an optimal Õ(n^{1/3}) dynamic regret on a family of "minibatched" quadratic losses, which may be of independent interest.
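To make the two central quantities concrete, the sketch below (our own illustration, not code from the paper) shows a Disturbance Action policy, which plays u_t = Σ_{i=1}^{m} M^{[i]} w_{t-i} against the past m disturbances, and the total-variation measure 𝒯𝒱(M_{1:n}) = Σ_{t=2}^{n} ||M_t − M_{t−1}||, which quantifies how much the oracle policy sequence drifts. The dimensions, memory length m, and the choice of Frobenius norm here are assumptions made for the example.

```python
import numpy as np

def dac_control(M, w_hist):
    """Disturbance Action Controller: u_t = sum_{i=1}^{m} M^[i] w_{t-i}.
    M has shape (m, du, dw); w_hist holds the m most recent disturbances,
    newest first, each of shape (dw,)."""
    return sum(M[i] @ w_hist[i] for i in range(len(M)))

def total_variation(Ms):
    """Path length TV(M_{1:n}) = sum_{t=2}^{n} ||M_t - M_{t-1}||, measuring
    the nonstationarity of an oracle DAC policy sequence. The Frobenius
    norm is an assumption for illustration."""
    return sum(np.linalg.norm(Ms[t] - Ms[t - 1]) for t in range(1, len(Ms)))

# Hypothetical sizes: control dim du=2, disturbance dim dw=3, memory m=4.
rng = np.random.default_rng(0)
m, du, dw = 4, 2, 3
M = rng.standard_normal((m, du, dw))                    # one DAC policy
w_hist = [rng.standard_normal(dw) for _ in range(m)]    # recent disturbances
print(dac_control(M, w_hist))                           # control action u_t

Ms = [rng.standard_normal((m, du, dw)) * 0.1 for _ in range(10)]
print(total_variation(Ms))                              # TV(M_{1:10})
```

Under this reading, the regret bound says the learner is compared against the best drifting sequence M_1, ..., M_n in hindsight, with the allowed drift budget entering through 𝒯𝒱(M_{1:n})^{2/3}.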
