Many visual computing tasks involve reasoning over structured domains and discrete objects, which can be modeled as combinatorial optimization (CO) problems. Tremendous speed-ups in general-purpose CO solvers have made it possible to tackle these problems in many cases despite their NP-hardness. Approaching large-scale structured prediction problems from a combinatorial optimization standpoint, however, has been challenging for a variety of reasons. These include near real-time performance requirements and the lack of differentiability of CO approaches. The latter hinders the use of machine learning both for specifying the problem and for developing learnable CO algorithms. We address these shortcomings along multiple avenues. (i) We focus on a specific CO problem for clustering, known as the multicut problem, which has many applications in visual computing tasks. We reduce human effort in multicut model specification by utilizing neural networks and address scalability issues by devising parallel algorithms. (ii) Next, we shift our focus to general-purpose CO solvers and tackle challenges related to their scalability. As a first step, we devise parallel algorithms that harness GPUs, gaining up to an order of magnitude speed-up over their sequential CPU counterparts. As a second step, we exploit machine learning for solving relaxations of CO problems. Given a few problem instances from a task of interest, our general-purpose solver can be adapted by learning instead of task-specific solver development. Empirically, our learned approach achieves significantly better performance than its non-learned version, finds better solutions than task-specific solvers, and exhibits better anytime performance than a commercial solver (Gurobi). In summary, we make CO for visual computing more practical by devising differentiable, massively parallel, and data-driven methods.
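For concreteness, the multicut problem referenced in (i) is commonly stated as the following integer linear program; the notation below ($G = (V, E)$ with edge costs $c_e$ and cut indicators $y_e$) is introduced here for illustration and is the standard formulation from the literature rather than a definition taken from this abstract.

```latex
% Multicut ILP over a graph G = (V, E) with edge costs c_e.
% y_e = 1 means edge e is cut, i.e., its endpoints lie in different clusters.
\begin{align*}
  \min_{y \in \{0,1\}^{E}} \quad & \sum_{e \in E} c_e \, y_e \\
  \text{s.t.} \quad & y_e \le \sum_{e' \in C \setminus \{e\}} y_{e'}
    \qquad \forall \text{ cycles } C \subseteq E,\; \forall e \in C.
\end{align*}
```

The cycle inequalities forbid cutting exactly one edge of any cycle, which guarantees that the cut edges induce a valid decomposition of the node set into clusters.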