Learning from demonstrations: Applications to Autonomous UAV Landing and Minecraft
A few years ago, an article described how an AI model was trained on more than 11,000 years' worth of Dota 2 gameplay before it could beat professional players. Why do machines need so many samples to learn? If we built AI models for equally complex everyday tasks, a self-driving car for instance, how many samples would they need? As it turns out, the answer is "a lot." As robots and other intelligent agents move from simple environments to complex, unstructured settings, modern deep reinforcement learning (RL) systems require exponentially more samples to learn. For many real-world tasks, where environment samples and compute are expensive, sample efficiency becomes the practical bottleneck.
Conventional AI models learn from conditioning: behavior is encouraged via a reward or discouraged via a punishment. Think of spraying water on your dog when he bites the couch, or giving him a treat for a job well done. Humans are better learners than AI for one major reason: we learn not only from conditioning but also from observation. You may have seen videos of cute babies imitating their parents, or of animals mimicking their owners' actions. Cognitive science and behavioral studies show that humans and animals fundamentally learn behavior through imitation, thanks to the tight coordination between our eyes and brain. And this isn't just "monkey see, monkey do": we internalize factors such as context, which is how intuition develops.
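To make the contrast concrete, here is a minimal sketch of imitation learning in its simplest form, behavioral cloning: instead of shaping behavior with a reward signal, the model is trained with ordinary supervised learning to reproduce the action a human took in each observed state. The network size, the data, and the names (`policy`, `demo_states`, `demo_actions`) are illustrative assumptions, not the implementation from my thesis.

```python
import torch
import torch.nn as nn

# Hypothetical demonstration data: states observed during human flights
# and the discrete actions the pilot took (names and shapes are illustrative).
demo_states = torch.randn(1024, 8)           # e.g. UAV pose + velocity features
demo_actions = torch.randint(0, 4, (1024,))  # e.g. binned control commands

# A small policy network mapping states to action logits.
policy = nn.Sequential(
    nn.Linear(8, 64), nn.ReLU(),
    nn.Linear(64, 4),
)

optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Behavioral cloning is plain supervised learning on (state, action) pairs:
# no reward signal and no environment interaction during training.
for epoch in range(100):
    logits = policy(demo_states)
    loss = loss_fn(logits, demo_actions)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

The appeal is exactly the "learning from observation" idea above: the demonstrations carry the conditioning implicitly, so the model never has to discover good behavior by trial and error.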
For my thesis, I taught ML models to fly drones and to perform tasks in a video game (both in simulation) using human demonstrations of two real-world systems: Unmanned Aerial Vehicles (UAVs) and Minecraft. Learning these behaviors is challenging for two reasons: (i) both are complex systems with hierarchical tasks, and (ii) the goals are sparsely rewarded, so it is hard to incentivize learning. Here, I will talk about the application of imitation learning (IL) to UAVs. Details on the second application of my thesis can be found on the Projects page here.
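To illustrate what "sparsely rewarded" means in practice, here is a hedged sketch of the kind of reward a UAV landing task gives a pure RL agent: zero at every timestep except the very end of a successful episode. The state fields (`on_pad`, `vertical_speed`) and the 0.5 m/s touchdown threshold are assumptions for illustration, not the exact reward used in my thesis.

```python
def landing_reward(state, done):
    """Sparse landing reward: the agent gets no signal until it actually lands.

    `state` is assumed to expose `on_pad` (bool) and `vertical_speed` (m/s);
    both fields and the 0.5 m/s threshold are illustrative.
    """
    if done and state.on_pad and abs(state.vertical_speed) < 0.5:
        return 1.0   # success: touched down gently on the landing pad
    return 0.0       # every other timestep: nothing to learn from
```

With a signal like this, random exploration almost never stumbles onto the reward, which is why demonstrations are such a strong lever: they place the policy in the neighborhood of successful behavior before any reward is ever seen.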
Slides for this application can be accessed here. The codebase for the first half of this work can be found here. A short video of the ML model training is available here. Performance metrics were visualized with Weights & Biases, and the reports are available here. I also made a short three-minute video as part of a university competition, available here. For the more patient folks, my full thesis is publicly available here.