We formalize this learning paradigm within the framework of reinforcement learning. To overcome the challenges of policy search in a non-differentiable program space, we introduce a meta-algorithm based on mirror descent, program synthesis, and imitation learning. The approach interleaves two steps: synthesized symbolic programs are used to regularize neural learning, and imitation of the gradient-trained neural policy is used to improve the quality of the synthesized programs. This perspective allows us to prove robust expected regret bounds and finite-sample guarantees for the algorithm.
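A minimal, purely illustrative sketch of this interleaved loop is given below. The toy task, the two-parameter "neural" policy, and the affine "program" class are our own assumptions for illustration, not the talk's actual implementation; the real algorithm operates over richer neural policy classes and program spaces.

```python
import numpy as np

def rollout(policy, seed, n_steps=200):
    """Collect (state, action) pairs and the average reward on a toy 1-D task."""
    rng = np.random.default_rng(seed)
    states, actions, total = [], [], 0.0
    s = rng.normal()
    for _ in range(n_steps):
        a = policy(s)
        states.append(s); actions.append(a)
        total += -abs(s)                      # reward: keep the state near zero
        s = 0.9 * s + a + 0.1 * rng.normal()
    return np.array(states), np.array(actions), total / n_steps

def neural_policy(theta):
    """Stand-in for a neural policy: a two-parameter squashed affine map."""
    return lambda s: np.tanh(theta[0] * s + theta[1])

def synthesize_program(states, actions):
    """'Program synthesis' by imitation: fit the interpretable rule a = w*s + b
    to the neural policy's behavior via least squares."""
    X = np.stack([states, np.ones_like(states)], axis=1)
    w, b = np.linalg.lstsq(X, actions, rcond=None)[0]
    return np.array([w, b])

theta = np.zeros(2)    # neural policy parameters
prog = np.zeros(2)     # current symbolic program (w, b)
lam, lr, eps = 0.5, 0.1, 1e-2

for it in range(20):
    # (1) Neural step: finite-difference policy-gradient estimate (common random
    #     numbers), regularized toward the current symbolic program by a
    #     proximal, mirror-descent-style term.
    grad = np.zeros_like(theta)
    for i in range(len(theta)):
        e = np.zeros_like(theta); e[i] = eps
        _, _, r_plus = rollout(neural_policy(theta + e), seed=it)
        _, _, r_minus = rollout(neural_policy(theta - e), seed=it)
        grad[i] = (r_plus - r_minus) / (2 * eps)
    theta = theta + lr * (grad + lam * (prog - theta))

    # (2) Symbolic step: synthesize a new program by imitating the
    #     gradient-trained neural policy on fresh rollouts.
    states, actions, _ = rollout(neural_policy(theta), seed=1000 + it)
    prog = synthesize_program(states, actions)

print("synthesized program: a = %.3f * s + %.3f" % tuple(prog))
```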
These theoretical guarantees of more reliable learning are accompanied by promising empirical results on complex tasks, such as learning autonomous driving agents and generating interpretable programs for behavior annotation. This research program establishes a synergistic relationship between machine learning and program synthesis.
https://cs-uni-saarland-de.zoom.us/j/99773349186?pwd=L0xjbmIybGNwajk2NTNEZHFJVFc2UT09