Multi-stage optimal dynamic treatment regimes for survival outcomes with dependent censoring
We propose a reinforcement learning method for estimating an optimal dynamic treatment regime for survival outcomes with dependent censoring. The estimator allows the treatment decision times to be dependent on the failure time and conditionally independent of censoring, supports a flexible number of treatment arms and treatment stages, and can maximize either the mean survival time or the survival probability at a prespecified time point. The estimator is constructed using generalized random survival forests, and its consistency is shown using empirical process theory. Simulations and an analysis of leukemia data suggest that the new estimator yields higher expected outcomes than existing methods in various settings. An R package, dtrSurv, is available on CRAN.
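The two optimization criteria named above, mean survival time and survival probability at a fixed time point, can be illustrated with a small sketch. This is not the proposed estimator and not the dtrSurv API: it uses a plain Kaplan-Meier curve, which assumes independent censoring, and the function names, the truncation time `tau`, and the toy data are all hypothetical choices made for illustration only.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time, in ascending order.

    times  : observed follow-up times
    events : 1 if the failure was observed, 0 if the observation was censored
    Assumes independent censoring (an illustrative simplification).
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all subjects sharing the same observed time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed
    return curve


def survival_probability(curve, t):
    """Criterion 1: estimated S(t), the probability of surviving beyond t."""
    s = 1.0
    for time, surv in curve:
        if time > t:
            break
        s = surv
    return s


def truncated_mean_survival(curve, tau):
    """Criterion 2: area under the step survival curve on [0, tau]."""
    area = 0.0
    prev_t, prev_s = 0.0, 1.0
    for time, surv in curve:
        if time >= tau:
            break
        area += prev_s * (time - prev_t)
        prev_t, prev_s = time, surv
    area += prev_s * (tau - prev_t)
    return area


if __name__ == "__main__":
    # Hypothetical toy data for two treatment arms (time, event indicator).
    arm_a = ([2.0, 3.0, 5.0, 7.0, 8.0], [1, 0, 1, 1, 0])
    arm_b = ([1.0, 4.0, 6.0, 9.0, 10.0], [1, 1, 0, 1, 1])
    for name, (times, events) in (("A", arm_a), ("B", arm_b)):
        curve = kaplan_meier(times, events)
        print(name,
              "S(5) =", round(survival_probability(curve, 5.0), 3),
              "mean up to 8 =", round(truncated_mean_survival(curve, 8.0), 3))
```

The two criteria need not rank treatment arms the same way, which is one reason a method that can target either one is useful. Handling dependent censoring and multiple decision stages, as the abstract describes, requires covariate-conditional survival estimates rather than the marginal curve used in this sketch.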