We present a method that uses the visual information in cholecystectomy procedure videos to detect the surgical workflow. While related work relies on rich external information, we rely only on the video recorded during surgery. We fine-tune a deep neural network (DNN) and use it to extract mid-level features representing the surgical phases. In addition, we train deformable part model (DPM) object detectors to localize the surgical tools and use their outputs as discriminative high-level features. The mid- and high-level visual features are used to train one-vs-all SVMs, followed by an HMM that infers the surgical workflow. Experiments on a relatively large dataset show that we reach 80% detection accuracy using only visual information.
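The final SVM-to-HMM step can be sketched as follows. This is a minimal illustration, not the paper's implementation: the phase count, transition probabilities, and simulated SVM scores are all assumptions made for the example. Per-frame one-vs-all SVM scores are converted into pseudo emission log-likelihoods, and Viterbi decoding over a transition model with strong self-transitions smooths them into a temporally coherent phase sequence.

```python
import numpy as np

def viterbi(log_emis, log_trans, log_init):
    """Most likely state sequence given (T, K) emission log-likelihoods."""
    T, K = log_emis.shape
    delta = log_init + log_emis[0]
    back = np.zeros((T, K), dtype=int)
    for t in range(1, K and T):
        scores = delta[:, None] + log_trans        # (K, K): from -> to
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_emis[t]
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# Toy setup: 3 hypothetical phases; phases tend to persist across frames,
# so the transition matrix puts most mass on self-transitions.
rng = np.random.default_rng(0)
K, T = 3, 12
trans = np.full((K, K), 0.05)
np.fill_diagonal(trans, 0.9)
log_trans = np.log(trans / trans.sum(axis=1, keepdims=True))
log_init = np.log(np.full(K, 1.0 / K))

# Simulated one-vs-all SVM scores: noisy, but favouring phase 0, then 1, then 2.
true_phases = [0] * 4 + [1] * 4 + [2] * 4
svm_scores = rng.normal(0.0, 0.5, (T, K))
for t, k in enumerate(true_phases):
    svm_scores[t, k] += 2.0

# Softmax turns raw SVM scores into pseudo emission log-likelihoods.
log_emis = svm_scores - np.log(np.exp(svm_scores).sum(axis=1, keepdims=True))

decoded = viterbi(log_emis, log_trans, log_init)
print(decoded)
```

The self-transition prior (0.9 on the diagonal) is what makes the HMM useful here: isolated per-frame SVM errors are overridden by the temporal model, while sustained changes in the scores still trigger a phase switch.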