In this project, which was part of my master's thesis, I implemented a reinforcement learning (RL) algorithm for model trains that finds the optimal speed for each segment of the track. Once learning is complete, the trains use the learned policy to take the best action based on the track segment they are in. This helps the trains reach their destination in minimum time. We later hard-coded the learned policy into our path planning algorithm.
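To make the idea concrete, here is a minimal sketch of how a per-segment speed policy could be learned. It is not the thesis implementation: it assumes tabular Q-learning, and the track layout (`SEGMENTS`), the speed levels (`SPEEDS`), the derailment penalty, and the reward (negative traversal time) are all hypothetical values chosen for illustration.

```python
import random

# Hypothetical track: (length, max safe speed) for each segment.
SEGMENTS = [(10.0, 3), (6.0, 1), (8.0, 2), (12.0, 3)]
SPEEDS = [1, 2, 3]        # assumed discrete speed levels (the actions)
DERAIL_PENALTY = -100.0   # assumed penalty for exceeding the safe speed


def step(segment, speed):
    """Return (reward, next_segment, done) for driving `segment` at `speed`."""
    length, safe_speed = SEGMENTS[segment]
    if speed > safe_speed:
        # Too fast for this segment: the episode ends with a large penalty.
        return DERAIL_PENALTY, segment, True
    reward = -length / speed  # negative traversal time, so faster is better
    next_segment = segment + 1
    return reward, next_segment, next_segment == len(SEGMENTS)


def train(episodes=2000, alpha=0.5, gamma=1.0, epsilon=0.2, seed=0):
    """Tabular Q-learning over (segment, speed) pairs."""
    rng = random.Random(seed)
    q = [[0.0] * len(SPEEDS) for _ in SEGMENTS]
    for _ in range(episodes):
        segment, done = 0, False
        while not done:
            # Epsilon-greedy action selection.
            if rng.random() < epsilon:
                action = rng.randrange(len(SPEEDS))
            else:
                action = max(range(len(SPEEDS)), key=lambda a: q[segment][a])
            reward, nxt, done = step(segment, SPEEDS[action])
            # Q-learning update: bootstrap from the best next-segment value.
            target = reward if done else reward + gamma * max(q[nxt])
            q[segment][action] += alpha * (target - q[segment][action])
            segment = nxt
    return q


def greedy_policy(q):
    """Pick the highest-valued speed for each segment."""
    return [SPEEDS[max(range(len(SPEEDS)), key=lambda a: row[a])] for row in q]
```

Calling `greedy_policy(train())` on this toy track should return the fastest safe speed for each segment, `[3, 1, 2, 3]`. A list like this is the kind of lookup table that can then be hard-coded into a path planner.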
The second part of the study compares the performance of the RL algorithm to human learning. We studied how long it takes humans to learn the same policies as the RL algorithm and how their best results compare.
Link to my thesis:
Link to the paper published at the AAAI conference: