Scheduling decisions in software projects are difficult because designs change frequently and there is strong feedback between development tasks. In my master's thesis, I used reinforcement learning to compute cost-optimal scheduling strategies for sample software projects. The computations were based on a Markov decision model of the software process. Analyzing the optimal strategies may yield practical insights into scheduling real software projects. In my talk, I will sketch the mathematical model and the optimization approach, and outline directions for future research.