A key problem in designing just-in-time adaptive interventions (JITAIs) for mobile health is learning decision rules from data that map tailoring variables (user mood, time of day, weather conditions) to intervention options (should we send a message to the user’s phone right now?). Contextual bandit algorithms construct such decision rules with the goal of maximizing a numerical outcome observed after every decision point (say, the number of steps walked in the 30 minutes after sending an activity-encouraging message).
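To make the setup concrete, here is a minimal, purely illustrative sketch (not any algorithm from the surveyed literature) of an epsilon-greedy contextual bandit with a linear reward model, deciding whether to send an activity message. The context features, reward structure, and all parameter values below are hypothetical.

```python
import random

def featurize(context, action):
    # Joint feature map for a binary action (1 = send message, 0 = do nothing).
    return [x * action for x in context] + [action]

class EpsilonGreedyBandit:
    """Illustrative epsilon-greedy contextual bandit with a linear reward model."""

    def __init__(self, dim, epsilon=0.2, lr=0.05):
        self.w = [0.0] * dim     # weights of the linear reward model
        self.epsilon = epsilon   # probability of exploring uniformly at random
        self.lr = lr             # SGD step size for the reward model

    def predict(self, phi):
        # Predicted outcome for a (context, action) feature vector.
        return sum(wi * xi for wi, xi in zip(self.w, phi))

    def choose(self, context):
        if random.random() < self.epsilon:
            return random.randint(0, 1)  # explore
        # Exploit: pick the action with the higher predicted outcome.
        return max((0, 1), key=lambda a: self.predict(featurize(context, a)))

    def update(self, context, action, reward):
        # One stochastic-gradient step on squared prediction error.
        phi = featurize(context, action)
        err = reward - self.predict(phi)
        self.w = [wi + self.lr * err * xi for wi, xi in zip(self.w, phi)]
```

In a simulated loop where sending a message helps only when mood is high, the learned model comes to predict a positive outcome for sending in high-mood contexts and a negative one in low-mood contexts, i.e., the decision rule becomes tailored to the context.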
The first paper on contextual bandits was written by Michael Woodroofe in 1979, but the term “contextual bandits” was coined only in 2008 by Langford and Zhang. Woodroofe’s motivating application was clinical trials, whereas modern interest in the problem has been driven largely by applications on the internet, such as online advertising and online news article placement. We have now come full circle, because contextual bandits provide a natural framework for sequential decision making in mobile health. We survey the contextual bandits literature with a focus on modifications needed to adapt existing approaches to the mobile health setting. We discuss specific challenges in this direction, such as good initialization of the learning algorithm, finding interpretable policies, assessing the usefulness of tailoring variables, computational considerations, robustness to failures of assumptions, and dealing with variables that are costly to acquire or missing.