Recent advances in Deep Learning have made it possible to train deep architectures efficiently, yet training can be improved further through intelligent optimization schemes. A major ingredient in training any large-scale machine learning model is Stochastic Gradient Descent (SGD). SGD has several drawbacks, however, such as the need for a decreasing step size and noisy gradient estimates. This has motivated a line of work on improving SGD, which has produced efficient algorithms such as Adagrad, RMSProp, and Adam. In this talk we introduce these schemes and provide insight into the design of such algorithms.
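To make the comparison concrete, below is a minimal sketch of the per-step update rules the talk covers: plain SGD, Adagrad, RMSProp, and Adam, each applied to a one-dimensional quadratic. The hyperparameter values (learning rates, decay factors, epsilon) are illustrative defaults chosen for this toy problem, not values prescribed by the talk.

```python
import math

def sgd_step(w, g, lr=0.1):
    # Vanilla SGD: move against the gradient with a fixed step size.
    return w - lr * g

def adagrad_step(w, g, state, lr=0.5, eps=1e-8):
    # Adagrad: accumulate squared gradients so the effective step size
    # shrinks on coordinates that have seen large gradients.
    state["g2"] = state.get("g2", 0.0) + g * g
    return w - lr * g / (math.sqrt(state["g2"]) + eps)

def rmsprop_step(w, g, state, lr=0.1, rho=0.9, eps=1e-8):
    # RMSProp: an exponential moving average of squared gradients
    # avoids Adagrad's ever-shrinking step size.
    state["v"] = rho * state.get("v", 0.0) + (1 - rho) * g * g
    return w - lr * g / (math.sqrt(state["v"]) + eps)

def adam_step(w, g, state, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    # Adam: moving averages of the gradient (m) and its square (v),
    # bias-corrected to offset their zero initialization.
    t = state["t"] = state.get("t", 0) + 1
    m = state["m"] = b1 * state.get("m", 0.0) + (1 - b1) * g
    v = state["v"] = b2 * state.get("v", 0.0) + (1 - b2) * g * g
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    return w - lr * m_hat / (math.sqrt(v_hat) + eps)

# Minimize f(w) = w^2 (gradient 2w) from w = 5.0 with each scheme.
results = {}
for step in (sgd_step, adagrad_step, rmsprop_step, adam_step):
    w, state = 5.0, {}
    for _ in range(100):
        g = 2.0 * w
        w = step(w, g) if step is sgd_step else step(w, g, state)
    results[step.__name__] = w
    print(step.__name__, round(w, 4))
```

The state dictionaries make explicit what each scheme remembers between steps: nothing for SGD, a running sum for Adagrad, a decaying average for RMSProp, and two bias-corrected averages plus a step counter for Adam.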