Date of Award


Degree Name

Doctor of Philosophy


Electrical and Computer Engineering

First Advisor

Janos L. Grantner, Ph.D.

Second Advisor

Ikhlas Abdel-Qader, Ph.D.

Third Advisor

Saad Shebrain, M.D.


Minimally invasive surgeries depend heavily on vision. Despite their advantages, the indirect access and the lack of a 3D field of view of the area of interest can complicate these procedures. Fortunately, the videos recorded during these procedures offer the opportunity for intra-operative and post-operative analyses to improve future performance and safety.

Most bile duct injuries during laparoscopic cholecystectomy (LC) occur due to visual misperception that leads to the misinterpretation of anatomy. Deep learning models for surgical video analysis could therefore support visual tasks such as identifying the critical view of safety (CVS) in LC, potentially helping to reduce the current rate of bile duct injuries.

In this study, a deep neural network comprising a segmentation model is proposed to highlight the hepatocystic anatomy and thereby predict the achievement of CVS criteria during LC. The network was trained and tested on 200 LC videos drawn from the cholec80, m2cai16-tool, and m2cai16-workflow challenge datasets, the World Laparoscopy Hospital videos, and the Society of American Gastrointestinal and Endoscopic Surgeons (SAGES) videos. Expert surgeons preprocessed and reviewed these videos before they were used to train the proposed model. The model uses Auto-Encoder weights as the starting weights of the U-Net encoder to achieve accurate medical image segmentation. Semantic segmentation is more appropriate here than the bounding-box method, which may produce incorrect or overlapping annotations of structures.
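
The idea of starting the U-Net encoder from Auto-Encoder weights can be sketched without any deep learning framework. The example below is only an illustration under stated assumptions: the layer names, the `encoder.` prefix, the nested-list "tensors", and the helper `transfer_encoder_weights` are all hypothetical and are not taken from the dissertation.

```python
import copy


def shape(t):
    """Return the shape of a nested-list 'tensor' as a tuple."""
    dims = []
    while isinstance(t, list):
        dims.append(len(t))
        t = t[0] if t else None
    return tuple(dims)


def transfer_encoder_weights(autoencoder_state, unet_state, prefix="encoder."):
    """Copy pretrained encoder weights into a U-Net weight dictionary.

    An entry is copied only if its name starts with `prefix`, exists in the
    U-Net, and has the same shape there. All other U-Net entries (for example
    the decoder) keep their original initialization.
    """
    new_state = copy.deepcopy(unet_state)
    copied = []
    for name, weights in autoencoder_state.items():
        if not name.startswith(prefix):
            continue
        if name in new_state and shape(new_state[name]) == shape(weights):
            new_state[name] = copy.deepcopy(weights)
            copied.append(name)
    return new_state, copied


# Toy weights (hypothetical layer names and values).
autoencoder_state = {
    "encoder.conv1": [[0.5, -0.2], [0.1, 0.3]],
    "encoder.conv2": [[0.7, 0.7, 0.7]],  # shape differs from the U-Net layer
    "decoder.conv1": [[0.9, 0.9], [0.9, 0.9]],
}
unet_state = {
    "encoder.conv1": [[0.0, 0.0], [0.0, 0.0]],
    "encoder.conv2": [[0.0, 0.0]],
    "decoder.conv1": [[0.0, 0.0], [0.0, 0.0]],
}

init_state, copied = transfer_encoder_weights(autoencoder_state, unet_state)
print(copied)
```

In a real framework the same pattern would operate on the framework's own weight containers; checking shapes before copying keeps mismatched layers at their default initialization rather than failing.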

U-Net was chosen because its architecture is simple and straightforward and it trains quickly. In addition, U-Net produces highly detailed segmentation maps from very few samples, an important property for the medical imaging community, where the number of available labeled images is often low.

Five experiments were conducted to evaluate the proposed model; each was trained with different weights and data preprocessing approaches and then tested on a part of the dataset. Five-fold cross-validation was applied to each experiment to estimate performance on unseen data and to check that the model generalizes. A hybrid loss function was used to calculate the loss of the outputs at different levels, and additional evaluation metrics were calculated to assess the model’s performance as a segmentation model.
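
As an illustration of how cross-validation can estimate performance on unseen data, the sketch below builds five folds in which all frames from one video stay in the same fold, so that no video contributes to both training and testing. This is an assumption made for the example: the abstract does not say how the folds were formed, and the frame names, the helper `video_level_folds`, and the toy data are hypothetical.

```python
import random


def video_level_folds(frame_to_video, k=5, seed=0):
    """Split frames into k (train, test) pairs, grouping frames by video.

    frame_to_video maps a frame identifier to the video it came from.
    Videos are shuffled once and assigned to folds round-robin, so every
    video appears in exactly one test set.
    """
    videos = sorted(set(frame_to_video.values()))
    rng = random.Random(seed)
    rng.shuffle(videos)
    fold_of_video = {video: i % k for i, video in enumerate(videos)}

    folds = []
    for fold in range(k):
        test = sorted(
            f for f, v in frame_to_video.items() if fold_of_video[v] == fold
        )
        train = sorted(
            f for f, v in frame_to_video.items() if fold_of_video[v] != fold
        )
        folds.append((train, test))
    return folds


# Toy data: 10 hypothetical videos with 3 annotated frames each.
frame_to_video = {
    f"video{v:02d}_frame{i}": f"video{v:02d}"
    for v in range(10)
    for i in range(3)
}

folds = video_level_folds(frame_to_video, k=5)
for train, test in folds:
    print(len(train), len(test))
```

Grouping by video is a common precaution with surgical footage, because neighbouring frames from one procedure look alike and would otherwise inflate the test scores.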

The proposed approach outperforms some of the state-of-the-art studies reported in the literature for automatic identification of hepatocystic landmarks, achieving an accuracy of 92%, a precision of 93.9%, and a mean intersection over union (IoU) of 74.7%, despite the limited number of videos available for this study. In addition, the proposed model proved effective at segmenting challenging cases.
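
For readers unfamiliar with these metrics, the sketch below shows one common way to compute pixel accuracy, precision, and mean IoU from a confusion matrix over flattened label maps. It is a minimal example under stated assumptions: precision is macro-averaged over classes here, which the abstract does not specify, and the function names and toy labels are hypothetical.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Count pixels by (true class, predicted class)."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def segmentation_metrics(cm):
    """Return pixel accuracy, macro precision, and mean IoU."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[c][c] for c in range(n))

    precisions, ious = [], []
    for c in range(n):
        tp = cm[c][c]
        fp = sum(cm[r][c] for r in range(n)) - tp  # predicted c, truly other
        fn = sum(cm[c]) - tp                       # truly c, predicted other
        if tp + fp > 0:
            precisions.append(tp / (tp + fp))
        if tp + fp + fn > 0:
            ious.append(tp / (tp + fp + fn))       # intersection over union

    return {
        "accuracy": correct / total,
        "precision": sum(precisions) / len(precisions),
        "mean_iou": sum(ious) / len(ious),
    }


# Toy example: six pixels, three classes (flattened label maps).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]

metrics = segmentation_metrics(confusion_matrix(y_true, y_pred, num_classes=3))
print(metrics)
```

Mean IoU is typically lower than pixel accuracy, as in the reported 74.7% versus 92%, because it penalizes both false positives and false negatives for every class, including small structures.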

Moreover, the model can segment dynamic LC videos and generate two views of the segmented video, on the assumption that this would facilitate CVS identification.

Finally, 1,550 LC image frames from 200 video clips were selected and annotated at the pixel level for five classes, confirmed by expert surgeons, that are used to identify CVS in LC.

Access Setting

Dissertation-Open Access

Movie 3.mp4 (100385 kB)

Included in

Biomedical Commons