DQN vs REINFORCE
Implemented REINFORCE as an intro to policy optimization methods.
This project was my introduction to policy gradient methods. As the final project for my introductory reinforcement learning class, I chose to compare the performance of deep Q-learning (DQN) with a basic policy gradient method: the REINFORCE algorithm. In my setup (PyTorch, Lunar Lander), DQN consistently converged with fewer samples. In a future project I would like to implement more modern policy gradient methods such as PPO, as well as distributional DQN (QR-DQN).
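For readers new to REINFORCE, here is a minimal sketch of its core computation: the discounted return-to-go for each step of an episode, and the surrogate loss whose gradient gives the policy gradient estimate. This is not the project's code. The function names `discounted_returns` and `reinforce_loss`, the default `gamma`, and the use of plain Python floats are all assumptions made for illustration; in a PyTorch setup the log-probabilities would presumably be tensors so that autograd could differentiate the same expression.

```python
import math


def discounted_returns(rewards, gamma=0.99):
    """Return G_t = r_t + gamma * r_{t+1} + ... for every step t of one episode."""
    returns = []
    g = 0.0
    # Walk the episode backwards so each return reuses the one after it.
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    return returns


def reinforce_loss(log_probs, rewards, gamma=0.99):
    """Surrogate loss: -sum_t log pi(a_t | s_t) * G_t (hypothetical helper)."""
    returns = discounted_returns(rewards, gamma)
    return -sum(lp * g for lp, g in zip(log_probs, returns))


if __name__ == "__main__":
    # Toy episode: three steps, reward 1 each, every action taken with probability 0.5.
    rewards = [1.0, 1.0, 1.0]
    log_probs = [math.log(0.5)] * 3
    print(discounted_returns(rewards, gamma=0.5))
    print(reinforce_loss(log_probs, rewards, gamma=0.5))
```

Because every return is computed from a complete episode, each sampled trajectory is used for a single gradient estimate. DQN, by contrast, typically reuses stored transitions through a replay buffer, which may be one reason it needed fewer samples in this comparison.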
A full, detailed write-up of this project, along with the code, will be available in the future.