The hardware challenges associated with light-field (LF) imaging have made it difficult for consumers to access its benefits, such as novel view rendering, post-capture refocusing, and aperture control. Learning-based techniques that solve the ill-posed problem of LF reconstruction from sparse (1, 2, or 4) views have significantly reduced the need for complex hardware. LF video reconstruction from sparse views poses a special challenge, as acquiring ground truth for training these models is difficult. In this talk, I discuss my work on a self-supervised learning-based algorithm for LF video reconstruction from monocular videos. I use self-supervised geometric, photometric, and temporal consistency constraints to guide the reconstruction. Additionally, I discuss three key techniques tailored to our monocular video input. First, I develop an explicit disocclusion handling technique that encourages the network to inpaint disoccluded regions of the LF frame using information from adjacent input temporal frames. Second, I implement an adaptive low-rank representation that provides a significant boost in performance by tailoring the representation to each input scene. Finally, I train a novel refinement block that exploits the available LF image data through supervised learning to further improve reconstruction quality.
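To make the consistency idea concrete, the following is a minimal NumPy sketch of a photometric consistency term of the kind described above, not the talk's actual implementation. Each predicted sub-aperture view is compared against a copy of the input center view warped by a predicted disparity map; all function names, the nearest-neighbor warp, and the L1 penalty are illustrative assumptions.

```python
import numpy as np

def warp_view(center, disparity, du, dv):
    """Warp the center view to sub-aperture position (du, dv) using a
    per-pixel disparity map (nearest-neighbor sampling for brevity;
    a real implementation would use differentiable bilinear sampling)."""
    H, W = center.shape[:2]
    ys, xs = np.mgrid[0:H, 0:W].astype(np.float64)
    src_x = np.clip(np.round(xs + du * disparity), 0, W - 1).astype(int)
    src_y = np.clip(np.round(ys + dv * disparity), 0, H - 1).astype(int)
    return center[src_y, src_x]

def photometric_loss(pred_views, center, disparity, positions):
    """Mean L1 error between each predicted sub-aperture view and the
    disparity-warped center view; zero when views are self-consistent."""
    losses = [np.abs(pred - warp_view(center, disparity, du, dv)).mean()
              for pred, (du, dv) in zip(pred_views, positions)]
    return float(np.mean(losses))
```

In a self-supervised setup, a term like this lets the network be trained from monocular frames alone, since the supervision signal is consistency with the input view rather than a ground-truth LF video.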