User feedback is essential for controlling personalized devices in a way that ensures user satisfaction. In a user feedback control solution, the user provides feedback to the controller in the same way a sensor does.

We model and learn users' demands with two approaches. First, we use restless multi-armed bandit experiments to learn the a priori unknown, user-specific state transition probabilities of a Markov model, in which the states correspond to the user's satisfaction level and the transitions are driven by discrete controller decisions. Second, we predict a user-specific cost or reward function in an inverse optimal control framework.
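
To make the first approach more concrete, the following minimal sketch shows one simple way to estimate user-specific transition probabilities from logged feedback. It is a plain count-based estimate with additive smoothing, not the restless multi-armed bandit procedure itself, and everything in it is an assumption made for illustration: the function name `estimate_transitions`, the encoding of satisfaction levels and controller decisions as integers, and the `prior` smoothing parameter.

```python
from collections import defaultdict


def estimate_transitions(observations, n_states, prior=1.0):
    """Estimate P(next_state | state, action) from observed feedback.

    observations: iterable of (state, action, next_state) tuples, where
        states are integer satisfaction levels in range(n_states) and
        actions are discrete controller decisions (hypothetical encoding).
    prior: additive smoothing count, so that unseen transitions keep a
        small non-zero probability (an illustrative assumption).
    """
    # One row of smoothed counts per (state, action) pair that was observed.
    counts = defaultdict(lambda: [prior] * n_states)
    for state, action, next_state in observations:
        counts[(state, action)][next_state] += 1.0

    # Normalize each row so that it sums to one.
    probs = {}
    for key, row in counts.items():
        total = sum(row)
        probs[key] = [c / total for c in row]
    return probs


# Hypothetical log: satisfaction level 0 ("unsatisfied") or 1 ("satisfied"),
# all observed under controller decision 1.
log = [(0, 1, 1), (0, 1, 1), (0, 1, 0)]
model = estimate_transitions(log, n_states=2)
print(model[(0, 1)])
```

With more feedback from the same user, the rows of such a table move toward that user's own transition behaviour, which is the quantity the bandit experiments are meant to learn.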

