Learning from Demonstrations: Applications to Autonomous UAV Landing and Minecraft
A few years ago, an article described how an AI model was trained on more than 11,000 years’ worth of DOTA gameplay to beat professional players. Why do machines need so many samples to learn? If we built AI models for equally complex everyday tasks, such as driving a car, how many samples would they need? As it turns out, the answer is “a lot.” As robots and other intelligent agents move from simple environments to complex, unstructured settings, modern deep reinforcement learning (RL) systems require an exponentially increasing number of samples to learn. For many real-world tasks where environment samples and compute are expensive, sample efficiency is a practical bottleneck.
Conventional AI models, such as those trained with RL, learn from conditioning, where behavior is encouraged through reward or discouraged through punishment. Think of spraying water on your dog when he bites the couch, or giving him a treat for a job well done. Humans are better learners than AI for one major reason: we learn not only from conditioning but also from observation. You may have seen videos of babies imitating their parents, or of animals mimicking their owners’ actions. Cognitive science and behavioral studies show that humans and animals fundamentally learn behavior through imitation, thanks to the close coordination between our eyes and our brain. This isn’t just “monkey see, monkey do”: we internalize factors such as context, which is how we develop intuition.
Interestingly, many real-world problems already have human demonstration data readily available (self-driving cars, playing chess, robots for everyday tasks, flying helicopters, etc.). In this data-centric era, (i) can we leverage available demonstration data to reduce our sample requirements, and (ii) can conventional AI models learn context and capture intuition from data of a human performing the task? Machine learning (ML) techniques such as imitation learning do exactly this, while using as few demonstrations as possible. This sample-efficient way of leveraging human demonstrations to produce the desired behavior has proven effective on single-task problems, especially in robotic manipulation and autonomous driving.
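In its simplest form, imitation learning is behavior cloning: the demonstrations are treated as a supervised dataset of (state, action) pairs, and a model is fit to map states to the expert's actions. The sketch below illustrates that idea only and is not the method used in the thesis. Every concrete choice in it is a hypothetical assumption: the k-nearest-neighbour regressor standing in for the learned model, the `KNNBehaviorCloning` and `synthetic_pilot_demo` names, the state layout (an x, y, z offset from a landing marker), and the velocity-command actions.

```python
import math
from typing import List, Sequence, Tuple

State = Tuple[float, ...]   # hypothetical: (x, y, z) offset from a landing marker
Action = Tuple[float, ...]  # hypothetical: (vx, vy, vz) velocity command
Demo = List[Tuple[State, Action]]  # one demonstration: a sequence of (state, action) pairs


class KNNBehaviorCloning:
    """Behavior cloning: learn a state -> action mapping from expert demonstrations.

    A k-nearest-neighbour regressor is used as the 'model' purely for illustration.
    """

    def __init__(self, k: int = 3) -> None:
        self.k = k
        self.states: List[State] = []
        self.actions: List[Action] = []

    def fit(self, demos: Sequence[Demo]) -> "KNNBehaviorCloning":
        # Pool every (state, action) pair from all demonstrations into one supervised dataset.
        for demo in demos:
            for state, action in demo:
                self.states.append(state)
                self.actions.append(action)
        return self

    def act(self, state: State) -> Action:
        # Rank demonstrated states by Euclidean distance to the query state.
        ranked = sorted(range(len(self.states)),
                        key=lambda i: math.dist(state, self.states[i]))
        nearest = ranked[: self.k]
        # Predict the mean of the expert's actions in the k closest demonstrated states.
        dim = len(self.actions[0])
        return tuple(sum(self.actions[i][d] for i in nearest) / len(nearest)
                     for d in range(dim))


def synthetic_pilot_demo(start: State, steps: int = 20) -> Demo:
    """Hypothetical pilot: each step, command a velocity that halves the offset to the marker."""
    demo: Demo = []
    state = start
    for _ in range(steps):
        action = tuple(-0.5 * x for x in state)               # velocity command toward the marker
        demo.append((state, action))
        state = tuple(x + a for x, a in zip(state, action))  # apply the command for one step
    return demo


# Ten hypothetical demonstrations, each starting from a different (x, y, z) offset.
demos = [synthetic_pilot_demo((float(i), 2.0, 5.0)) for i in range(1, 11)]
policy = KNNBehaviorCloning(k=1).fit(demos)
print(policy.act((3.0, 2.0, 5.0)))  # (-1.5, -1.0, -2.5): the pilot's first command from that start
```

The nearest-neighbour model needs no training step, which keeps the example short, but it can only reuse actions from states the expert actually visited. In practice a neural network usually fills this role. Behavior cloning is also known to suffer from compounding errors once the learned policy drifts into states the demonstrations never covered.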
For my thesis, I trained ML models (in simulation) to fly drones and to perform tasks in a video game, using human demonstrations from two systems: Unmanned Aerial Vehicles (UAVs) and Minecraft. Learning these behaviors was challenging for two reasons: (i) both are complex systems with hierarchical tasks, and (ii) their goals are sparsely rewarded, which makes learning hard to incentivize. Here, I focus on the application of imitation learning (IL) to UAVs. Details on the second application of my thesis can be found on the Projects page here.
Current control systems do not capture the intuition and decision-making skills behind a pilot's maneuvers, which can be crucial for landing under uncertainty. To address this, I helped design a novel method for autonomous UAV landing that uses only human demonstrations and a visual cue for positioning. Specifically, I applied imitation learning algorithms to a trained pilot's point-to-point drone maneuvers in Microsoft AirSim to learn an ML model that imitates the pilot's behavior. I demonstrated sample-efficient imitation by learning from as few as 10 pilot demonstrations of the task, and I discuss why 'smooth' experts are needed to learn smooth landings. In short, the ML model captures a pilot’s intuition for navigating and landing drones on a simulated ship deck, and its high imitation accuracy demonstrates the sample efficiency of imitation learning methods.
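The paragraph above relies on two ideas: imitation accuracy, and how smooth the expert's demonstrations are. The sketch below shows one way to make both measurable. Both definitions are hypothetical assumptions, not the metrics reported in the thesis. `imitation_accuracy` counts how often a policy's action falls within a tolerance `tol` of the expert's action. `mean_squared_jerk` scores a demonstration by the second differences of its velocity commands.

```python
import math
from typing import Callable, List, Sequence, Tuple

State = Tuple[float, ...]
Action = Tuple[float, ...]
Demo = List[Tuple[State, Action]]  # one demonstration: a sequence of (state, action) pairs


def imitation_accuracy(policy: Callable[[State], Action],
                       demos: Sequence[Demo], tol: float = 0.1) -> float:
    """Fraction of expert (state, action) pairs where the policy's action is
    within `tol` (Euclidean distance) of the expert's action.

    Hypothetical definition for illustration only.
    """
    hits = total = 0
    for demo in demos:
        for state, expert_action in demo:
            total += 1
            if math.dist(policy(state), expert_action) <= tol:
                hits += 1
    return hits / total if total else 0.0


def mean_squared_jerk(actions: Sequence[Action]) -> float:
    """Mean squared second difference of a sequence of velocity commands.

    If actions are velocities sampled at a fixed time step dt, the second
    difference divided by dt**2 approximates jerk. dt is omitted here, so the
    result is an unscaled smoothness proxy (lower means a smoother expert).
    """
    if len(actions) < 3:
        return 0.0
    total = 0.0
    for prev, cur, nxt in zip(actions, actions[1:], actions[2:]):
        total += sum((n - 2 * c + p) ** 2 for p, c, n in zip(prev, cur, nxt))
    return total / (len(actions) - 2)


# Usage on a hypothetical one-dimensional demonstration.
demo = [((1.0,), (-0.5,)), ((2.0,), (-1.0,))]
print(imitation_accuracy(lambda s: tuple(-0.5 * x for x in s), [demo]))  # 1.0
print(mean_squared_jerk([(0.0,), (1.0,), (0.0,), (1.0,)]))  # 4.0
```

Under this proxy, an expert whose commands keep reversing direction scores higher (less smooth) than one who changes velocity steadily. A cloned policy inherits whichever style the demonstrations contain.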
Slides for this application can be accessed here. The codebase for the first half of this work can be found here. A short video of the ML model training is available here. Performance metrics were visualized with Weights & Biases, and the reports are available here. I also made a short 3-minute video for a university competition, available here. For the more patient folks, my thesis is also publicly available here.