Safe Trajectory Sampling in Model-based Reinforcement Learning


Model-based reinforcement learning aims to learn a policy that solves a target task by leveraging a learned dynamics model. Paired with principled handling of uncertainty, e.g. by means of Gaussian processes, this approach allows for data-efficient policy learning in robotics. However, the physical environment often imposes feasibility and safety constraints that must be incorporated into the reinforcement learning formulation for it to be applicable on a real robot. In this work, we study how to enforce such constraints in model-based reinforcement learning with probabilistic dynamics models for robot trajectory generation. In particular, we investigate how trajectories sampled from the learned dynamics model can be used on a real robot while fulfilling the safety requirements. We present a model-based reinforcement learning approach using Gaussian processes in which safety constraints are taken into account without resorting to simplifying Gaussian assumptions on the predictive state distributions. We evaluate the proposed approach on different continuous control tasks of varying complexity and demonstrate how our safe trajectory-sampling approach can be used directly on a real robot without violating safety constraints.

Proceedings of the International Conference on Automation Science and Engineering (CASE)