Constructing Stabilized Dynamic Treatment Regimes
We propose a new method, termed stabilized O-learning, for deriving stabilized dynamic treatment regimes: sequential decision rules for individual patients that not only adapt over the course of disease progression but also remain consistent in form over time. The method provides a robust and efficient learning framework for constructing dynamic treatment regimes by directly optimizing a doubly robust estimator of the expected long-term outcome. It accommodates various types of outcomes, including continuous, categorical, and potentially censored survival outcomes. In addition, the method is flexible enough to incorporate clinical preferences into a qualitatively fixed rule, where the parameters indexing the decision rules that are shared across stages can be estimated simultaneously. Extensive simulation studies demonstrate the superior performance of the proposed method. We also analyze data from the prospective Canary Prostate Cancer Active Surveillance Study (PASS) using the proposed method.
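For orientation, a standard single-stage doubly robust (augmented inverse probability weighted) estimator of the value of a candidate rule $d$ takes the form below; this is a generic illustration of the type of criterion being optimized, and the notation ($\hat\pi$ for the estimated propensity, $\hat\mu$ for the estimated outcome model) is not taken from the paper, whose multi-stage estimator differs in detail.
\[
\widehat{V}_{\mathrm{DR}}(d) \;=\; \frac{1}{n}\sum_{i=1}^{n}\left[\frac{\mathbb{1}\{A_i = d(X_i)\}}{\hat\pi(A_i \mid X_i)}\Bigl(Y_i - \hat\mu\bigl(X_i, d(X_i)\bigr)\Bigr) \;+\; \hat\mu\bigl(X_i, d(X_i)\bigr)\right],
\]
where $X_i$, $A_i$, and $Y_i$ denote covariates, treatment, and outcome for patient $i$. The estimator is consistent if either $\hat\pi$ or $\hat\mu$ is correctly specified, which is the double robustness property the abstract refers to; the proposed approach optimizes such a criterion over rules indexed by parameters shared across stages.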