Date of Award


Degree Name

Doctor of Philosophy


Department

Geological and Environmental Sciences

First Advisor

Mohamed Sultan, Ph.D.

Second Advisor

Matt Reeves, Ph.D.

Third Advisor

Racha El Kadiri, Ph.D.


Keywords

Data mining, data science, harmful algal bloom, machine learning, prediction


In the last few decades, harmful algal blooms (HABs, also known as “red tides”) have become one of the most detrimental natural phenomena worldwide, particularly in Florida’s coastal areas, driven by local environmental factors and, on a larger scale, by global warming. Karenia brevis, the organism responsible for Florida red tides, produces toxins that harm humans, fisheries, and ecosystems. In this study, I developed state-of-the-art machine learning models (e.g., XGBoost, Random Forest, and Support Vector Machine) and compared their efficiency in predicting the occurrence of HABs. In the proposed models, K. brevis abundance is the target variable, and 10 Level-2 ocean color products extracted from daily archival MODIS satellite data, such as euphotic depth (m), Secchi disk depth, chlorophyll-a (mg/m³), diffuse attenuation coefficient (Kd_490; m⁻¹), sea surface temperature (°C), fluorescence line height, …, are used as controlling factors. The adopted approach addresses two main shortcomings of earlier models: (1) the paucity of satellite data caused by cloudy scenes, and (2) the lag time between when a variable reaches its highest correlation with the target and when the bloom occurs. Eleven spatio-temporal models were generated, each built from satellite data acquired on three consecutive days, with forecasting horizons ranging from 1 to 11 days. Using three consecutive days allows the models to accommodate potential variations in lag time for some of the temporal variables. Depending on the availability of cloud-free consecutive days, one or more of the 11 models can be used to predict HAB occurrence. XGBoost outperformed the other methods, and the models with 5- to 9-day forecasting horizons achieved the best results. The most reliable model forecasts eight days ahead with a balanced overall accuracy of 96%, a Kappa coefficient of 0.93, an F-score of 0.97, and an AUC of 0.98.
Euphotic depth, sea surface temperature, and chlorophyll-a were consistently among the most significant controlling factors. The proposed models could potentially be used to develop an “early warning system” for HABs in southwest Florida.
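For reference, the evaluation metrics quoted in the abstract can be computed as sketched below. This is a hedged illustration using the standard textbook definitions, assuming a binary bloom (1) / no-bloom (0) classification and interpreting "balanced overall accuracy" as the mean of sensitivity and specificity; the dissertation may have computed them differently. The function names `binary_metrics` and `roc_auc` are hypothetical; scikit-learn provides equivalent functions (`balanced_accuracy_score`, `cohen_kappa_score`, `f1_score`, `roc_auc_score`).

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Overall accuracy, balanced accuracy, Cohen's kappa, and F1 for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    n = tp + tn + fp + fn
    if n == 0:
        raise ValueError("no samples")

    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # recall on bloom class
    specificity = tn / (tn + fp) if tn + fp else 0.0
    balanced = (sensitivity + specificity) / 2

    # Chance agreement from the marginals: P(true=1)P(pred=1) + P(true=0)P(pred=0).
    p_e = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / (n * n)
    kappa = (accuracy - p_e) / (1 - p_e) if p_e != 1 else 0.0

    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (
        2 * precision * sensitivity / (precision + sensitivity)
        if precision + sensitivity
        else 0.0
    )
    return {"accuracy": accuracy, "balanced_accuracy": balanced,
            "kappa": kappa, "f1": f1}


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random bloom sample scores higher than a
    random non-bloom sample (ties count one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need both classes")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    y_true = [1, 1, 1, 1, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
    scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.1, 0.6]
    print(binary_metrics(y_true, y_pred))
    print(roc_auc(y_true, scores))  # prints 0.9375
```

Kappa corrects accuracy for agreement expected by chance, and AUC is threshold-free, which is why they complement overall accuracy when bloom and no-bloom samples are imbalanced.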

Access Setting

Dissertation-Open Access