Date of Award


Degree Name

Doctor of Philosophy


Computer Science

First Advisor

Steve Carr, Ph.D.

Second Advisor

Wassnaa Al-Mawee, Ph.D.

Third Advisor

Ikhlas Abdel-Qader, Ph.D.

Fourth Advisor

Shameek Bhattacharjee, Ph.D.


Anti-phishing techniques, counterfeit websites, Kaiser-Meyer-Olkin, machine learning, passive/active multilayer anti-phishing approach, webpage traffic behavior


Phishing is the starting point of most cyberattacks and is mainly categorized by channel: Email, Websites, Social Networks, Phone calls (Vishing), and SMS messaging (Smishing). Phishing refers to an attempt to collect sensitive data, typically usernames, passwords, credit card numbers, bank account information, or other crucial details, with the intent to use or sell the information obtained. Similar to how a fisherman uses bait to catch a fish, an attacker poses as a trustworthy source to attract and deceive the victim.

This study explores the efficacy of host-side Anti-Phishing Techniques (APT) based on Website features, such as Lexical, Host-Based, or Content-Based features, in identifying whether a cloned Website is Malicious or Benign, in order to combat Website Phishing attacks. Unfortunately, host-side APT based on signature statistical methods are passive and have limitations; as a result, a well-crafted Website can bypass them. These passive APT rely on loose ends, such as misspellings or strange fonts, to distinguish a malicious Website from a benign one. This is ineffective because attackers can use advanced tools, such as Website cloning software, to create counterfeit Websites that are virtually indistinguishable from legitimate sites. Furthermore, relying exclusively on aggregators’ information can be problematic, as it may quickly become outdated. Finally, using too many attack indicators or features can lead to more false positives and be resource-intensive, as managing numerous feeds can be challenging.

Despite recent advancements in Host-Based APT based on Website features to combat counterfeit Websites, significant challenges arising from the passive nature of these techniques still need to be addressed for the technology to mature. To address these challenges, we propose a Host-Based multi-layer Anti-Phishing-Website solution.

The first layer in our proposed multi-layer solution is a passive APT, the L1 model, which mainly focuses on detecting (D) and classifying (C) Webpages (W) as Benign (B) or Malicious (M) based on their Lexical (L), Host-Based (H), or Content (C) features, or a combination of them (LHC) - (DC model based on W – LHC).
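As an illustration of the kind of Lexical features a passive L1-style model could consume, the following is a minimal sketch using only the Python standard library. The function name `lexical_features` and the specific features chosen are assumptions made for this example; the abstract does not list the dissertation's actual feature set.

```python
import ipaddress
from urllib.parse import urlparse


def lexical_features(url: str) -> dict:
    """Extract a few illustrative lexical features from a URL.

    The feature set is hypothetical and is not taken from the dissertation.
    """
    parsed = urlparse(url)
    host = parsed.hostname or ""

    # A raw IP address in place of a domain name is a common lexical signal.
    try:
        ipaddress.ip_address(host)
        host_is_ip = True
    except ValueError:
        host_is_ip = False

    return {
        "url_length": len(url),
        "host_length": len(host),
        "num_dots_in_host": host.count("."),
        "num_hyphens_in_host": host.count("-"),
        "num_digits": sum(ch.isdigit() for ch in url),
        "has_at_symbol": "@" in url,
        "host_is_ip": host_is_ip,
        "uses_https": parsed.scheme == "https",
    }


if __name__ == "__main__":
    print(lexical_features("http://192.168.0.1/login"))
```

A feature dictionary of this kind would typically be combined with Host-Based and Content features before being passed to a classifier.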

The second layer in our proposed multi-layer solution, which addresses the limitations of the passive L1 model, is an active APT, the L2 model: a Detection (D), Prevention (P), and Classification (C) (DPC) model based on Benign (B) Webpage (W) Traffic Behavior (TB) – (DPC model based on B – WTB). The L2 model classifies Webpages that the L1 model labeled Benign as Benign-Benign (BB), Benign-Suspicious (BS), or Benign-Malicious (BM), based on their traffic behavior when attempting to access restricted resources.
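The three-way labeling can be pictured with a deliberately simple threshold rule. This is only a sketch: the thresholds, the resource names, and the function `classify_by_violations` are hypothetical, and the dissertation's L2 model uses machine learning rather than fixed cut-offs.

```python
from enum import Enum


class L2Label(Enum):
    BB = "Benign-Benign"
    BS = "Benign-Suspicious"
    BM = "Benign-Malicious"


def classify_by_violations(
    violations: dict,
    suspicious_at: int = 1,   # assumed threshold, for illustration only
    malicious_at: int = 5,    # assumed threshold, for illustration only
) -> L2Label:
    """Label a Webpage from its access violation counts per restricted resource.

    `violations` maps a restricted resource name to the number of
    violation attempts observed for that resource.
    """
    worst = max(violations.values(), default=0)
    if worst >= malicious_at:
        return L2Label.BM
    if worst >= suspicious_at:
        return L2Label.BS
    return L2Label.BB


if __name__ == "__main__":
    # Resource names are made up for the example.
    print(classify_by_violations({"camera": 2, "filesystem": 0}).value)
```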

The L2 model is an active network detection and response approach deployed to detect suspicious Website activities on networks, using a combination of machine learning (ML), advanced analytics, and rule-based detection. The L2 model provides anomaly and threat detection by continuously analyzing raw traffic and/or flow records to build models that reflect normal Website network behavior.
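One simple way to model "normal" behavior and flag deviations from it is a z-score baseline, sketched below. This is an assumed stand-in for illustration, not the dissertation's method; the function names, the request-rate metric, and the threshold of 3 standard deviations are all hypothetical.

```python
from statistics import mean, pstdev


def fit_baseline(samples: list) -> tuple:
    """Summarize normal traffic as (mean, population standard deviation).

    `samples` must be non-empty, e.g. requests per minute seen during
    a period assumed to be normal.
    """
    return mean(samples), pstdev(samples)


def is_anomalous(value: float, baseline: tuple, z_threshold: float = 3.0) -> bool:
    """Flag a new observation that lies far from the baseline."""
    mu, sigma = baseline
    if sigma == 0:
        # With no observed variation, any different value counts as a deviation.
        return value != mu
    return abs(value - mu) / sigma > z_threshold


if __name__ == "__main__":
    baseline = fit_baseline([10, 12, 11, 9, 10, 11, 12, 10])
    print(is_anomalous(11, baseline), is_anomalous(40, baseline))
```

In a continuously running system the baseline would be refreshed as new flow records arrive, so that the model keeps tracking current normal behavior.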

The L2 model achieves 90.07%, 91.85%, and 92.62% accuracy using the KNN, LR, and SVM machine learning algorithms, respectively. In addition, the implementation of the proposed L2 model yields a significant observation about how classified Webpages attempt to access restricted resources, based on the maximum number of access violation attempts for each restricted resource and the cumulative number of violation attempts on each restricted resource over time.
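The two quantities mentioned here, a maximum number of violation attempts per restricted resource and a cumulative count over time, can be computed from an event log roughly as follows. The event format (time-window index, resource name), the resource names, and the function `summarize_violations` are assumptions made for this sketch.

```python
from collections import defaultdict
from itertools import accumulate


def summarize_violations(events: list, num_windows: int) -> dict:
    """Aggregate violation events per restricted resource.

    `events` is a list of (window_index, resource) pairs, where
    window_index is in range(num_windows). For each resource the result
    holds the maximum count seen in any single window and the running
    (cumulative) count across windows.
    """
    per_window = defaultdict(lambda: [0] * num_windows)
    for window, resource in events:
        per_window[resource][window] += 1

    return {
        resource: {
            "max_per_window": max(counts),
            "cumulative": list(accumulate(counts)),
        }
        for resource, counts in per_window.items()
    }


if __name__ == "__main__":
    log = [(0, "camera"), (0, "camera"), (1, "camera"), (2, "filesystem")]
    print(summarize_violations(log, num_windows=3))
```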

In summary, the proposed Host-Based multi-layer Anti-Phishing-Website solution addresses the limitations of passive APT models by incorporating an active APT approach based on real-time Website interaction and behavior analysis. The combination of both models provides a more comprehensive approach to combating counterfeit Websites.

Access Setting

Dissertation-Open Access