Lunar Lander Reinforcement Learning



Learning from OpenAI contributions and breakthrough research papers, I explored state-of-the-art model-free algorithms such as Proximal Policy Optimization (PPO), Deep Deterministic Policy Gradient (DDPG), Vanilla Policy Gradient (VPG), Trust Region Policy Optimization (TRPO), Twin Delayed DDPG (TD3), Soft Actor-Critic (SAC), Deep Q-Network (DQN), and Advantage Actor-Critic in its synchronous and asynchronous forms (A2C/A3C), as well as model-based algorithms such as the famous AlphaZero, which is built on Monte Carlo Tree Search (MCTS). For each algorithm, I studied its benefits, use cases, and implementation. The archived GitHub repository is available upon request.