Deep Reinforcement Learning (DRL), while providing some impressive results (e.g. on Atari and Go), is notoriously data-inefficient. This is partially due to the function approximators used (deep networks), but also due to the weak learning signal (based on observing rewards). In this talk we will discuss the role of transfer learning in making DRL more data-efficient. In particular, I will focus on how different formulations of KL-regularized RL can provide more systematic exploration of the environment and hence a more reliable learning signal. If time allows, we will briefly cover three related lines of recent work: https://arxiv.org/abs/1707.04175, https://arxiv.org/abs/1806.01780, and recent work on information asymmetry in KL-regularized settings.
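
As a rough sketch of the setting the talk refers to (the notation here is illustrative, not taken from the talk itself), KL-regularized RL typically augments the usual expected-return objective with a penalty that keeps the learned policy $\pi$ close to a reference or default policy $\pi_0$:

\[
J(\pi) = \mathbb{E}_{\pi}\!\left[\sum_{t} \gamma^{t}\, r(s_t, a_t)\right]
- \alpha\, \mathbb{E}_{\pi}\!\left[\sum_{t} \gamma^{t}\, \mathrm{KL}\big(\pi(\cdot \mid s_t)\,\|\,\pi_0(\cdot \mid s_t)\big)\right],
\]

where $\alpha > 0$ trades off reward maximization against staying near $\pi_0$. When $\pi_0$ is itself learned from related tasks, this penalty can act as a transferable prior that biases exploration toward generally useful behaviour, which is one way such formulations connect to transfer learning.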