Lunar Lander Reinforcement Learning

GOAL:

Learning, with the hope of contributing to general-purpose AI one day

TECHNOLOGY:

Python, PyTorch, OpenAI Gym, Keras, TensorFlow, scikit-learn, Jupyter, Google Colab, Docker

COMPANY:

Personal Project

DESCRIPTION:

Learning from OpenAI contributions and breakthrough research papers, I explored state-of-the-art model-free algorithms such as Proximal Policy Optimization (PPO), Deep Deterministic Policy Gradient (DDPG), Vanilla Policy Gradient (VPG), Trust Region Policy Optimization (TRPO), Twin Delayed DDPG (TD3), Soft Actor-Critic (SAC), Deep Q-Network (DQN), and Advantage Actor-Critic in both its synchronous and asynchronous forms (A2C/A3C), as well as model-based approaches such as the Monte Carlo Tree Search (MCTS) used in AlphaZero. For each algorithm I studied its benefits, use cases, and implementations. The archived GitHub repository is available upon request.
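
As a small illustration of a building block shared by the policy-gradient methods listed above (VPG in particular), here is a minimal sketch of the discounted reward-to-go computation. It is not taken from the archived project; the function name `discounted_returns` and the default `gamma` are assumptions chosen for the example.

```python
def discounted_returns(rewards, gamma=0.99):
    """Return the discounted reward-to-go G_t = r_t + gamma * G_{t+1} for each step.

    `rewards` is the list of per-step rewards from one episode.
    The name and the default gamma are illustrative, not from the project.
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk the episode backwards so each step reuses the return of the next one.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns
```

For example, `discounted_returns([1.0, 1.0, 1.0], gamma=0.5)` gives `[1.75, 1.5, 1.0]`. In a policy-gradient update these values would typically weight the log-probabilities of the actions taken during the episode.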