讀書寫作: Dynamic Programming and Reinforcement Learning (動態規劃和強化學習)

6/27/2026


Course objective: This course introduces dynamic decision-making under uncertainty, with an emphasis on dynamic programming and reinforcement learning. Drawing on applications in business and engineering, students will learn key theories and algorithms for solving multi-stage decision problems, both with and without explicit models of the environment. Assignments and a final project provide practical experience in problem formulation, algorithm evaluation, and Python-based implementation.

Prerequisite subjects (先修科目):

(required) Linear algebra, probability, (Python) programming, any undergraduate optimization course
(helpful) Machine learning
Please make sure you are comfortable with the introductory material in math and Python, which is free once you log in.
Some good (and free) courses on machine learning: Andrew Ng et al.
Marc Peter Deisenroth, A. Aldo Faisal, and Cheng Soon Ong, Mathematics for Machine Learning, Cambridge University Press, 2020.

Tentative evaluation method:

Homework 25%: You are welcome to discuss the problems with your classmates, but please write your own solutions without using LLM tools. For every homework assignment, please upload your own version to CYCU ilearning before the deadline.
Midterm 20%
Final project (1-2 persons per group) 35% (Form for you to fill in)
Class preview questions, attendance, and class participation 20%
David P. Woodruff, 15-451/651: Algorithm Design and Analysis, CMU, Spring 2025.

For all problems, whether SOLO or GROUP, you must not attempt to find the solution online, in a book, in a journal, by asking an AI tool, or searching anywhere else not explicitly permitted.

Final project timeline:

Please refer to my file "DPRL final project.pdf" for some ideas and papers.
May 8 (4 points): Submit the question after your group meets with the instructor in class to discuss project ideas. Novelty: your project should propose something new (a new application, method, or perspective).
May 15 (3 points): Submit the list of papers you are reading to school ilearning, together with a one-page summary in conference-paper format explaining your choice.
May 29 (5 points): Each group will give a short presentation in class about a paper related to their project.
June 12 (3 points): PowerPoint review during class meeting time (around 1 minute per slide). Please upload your files before class (ppt and pdf, named group number-topic, e.g. 1-Federated-learning).
June 26 (10 points): Formal presentations (15-20 min). Please upload your files before the class (ppt and pdf).
July 3 (10 points): Submit reports to school ilearning (including the revised presentation file, a written report of 3-10 pages in conference-paper format, and code).

Main references:

Emma Brunskill, CS234 Reinforcement Learning, Stanford University (2024, videos)
DeepMind x UCL, Deep Learning Lecture Series 2021 (slides, videos)
Amir-massoud Farahmand, Introduction to Reinforcement Learning, Polytechnique Montréal (videos)
Sergey Levine, CS285 Deep Reinforcement Learning, UC Berkeley, 2023 (with videos), 2026
Richard S. Sutton and Andrew G. Barto, Reinforcement Learning: An Introduction, MIT Press, Second Edition, 2018. (ACM Turing Award winners in 2024) (Code, slides)

Flipped classroom: Before class, please watch the videos listed in my note in my Google Drive folder and read the note for further explanation. In class, you will present the lecture, and we will discuss the subtle points and present or work on the homework problems.
Each assignment will include selected exercises from the document "DPRL-Homework.pdf" in my Google Drive folder. Please download the most recent version before starting your homework, as some questions may be revised.
(Tentative) Schedule:

Introduction

Dwarkesh Patel, Richard Sutton – Father of RL thinks LLMs are a dead end, 2025/9/27

Tabular MDP Planning (Homework 1, due 3/18, 11:59pm: Questions under Lecture 1 to 3)
Examples in optimization, inventory, and scheduling
Policy Evaluation
Q-learning and Function Approximation (Homework 2, due 3/18, 11:59pm: Questions under Lecture 1 to 3)
Policy Search 1
Policy Search 2
Policy Search 3
Midterm
Offline RL 1
Offline RL 2
Exploration 1
Exploration 2
Exploration 3
Multi-Agent Game Playing
Final Project
Transfer Learning & Meta-Learning
Multi-Agent Reinforcement Learning