Detection of Phishing Websites Using Machine Learning
Abstract
Phishing attacks are a rapidly expanding threat in the cyber world, costing internet users billions of dollars each year. Phishing is a crime in which attackers use a variety of social engineering tactics to obtain sensitive information from users. Phishing attacks can be delivered through many types of communication, including email, instant messages, pop-up messages, and web pages. This study develops a model that predicts whether a URL is legitimate or phishing.
The dataset used for classification was sourced from an open-source service called ‘PhishTank’, which provides phishing URLs in multiple formats such as CSV and JSON, and from the University of New Brunswick dataset bank, which contains a collection of benign, spam, phishing, malware, and defacement URLs. In total, more than six machine learning and deep neural network models are used to detect phishing URLs.
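A minimal sketch of how the two sources could be merged into one labeled, class-balanced set is given below in Python. It assumes the PhishTank and University of New Brunswick URLs have already been loaded into plain lists; the function name `build_labeled_dataset`, the label convention (1 = phishing, 0 = legitimate), the per-class sample size, and the example URLs are hypothetical illustrations, not the study's actual pipeline.

```python
import random


def build_labeled_dataset(phishing_urls, legitimate_urls, per_class, seed=0):
    """Return a shuffled, class-balanced list of (url, label) pairs.

    Label 1 marks a phishing URL and label 0 a legitimate one (assumed
    convention). Duplicates are removed, and any URL that appears in both
    sources is kept only as phishing.
    """
    phish = sorted(set(phishing_urls))
    legit = sorted(set(legitimate_urls) - set(phish))
    if per_class > min(len(phish), len(legit)):
        raise ValueError("not enough unique URLs for the requested class size")
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    data = [(url, 1) for url in rng.sample(phish, per_class)]
    data += [(url, 0) for url in rng.sample(legit, per_class)]
    rng.shuffle(data)
    return data


# Hypothetical example URLs; real inputs would come from the two datasets.
data = build_labeled_dataset(
    ["http://paypa1-login.example/verify", "http://192.0.2.7/bank"],
    ["https://www.example.org/", "https://www.example.com/about"],
    per_class=2,
)
print(len(data), sum(label for _, label in data))  # 4 2
```

Sampling the same number of URLs from each class keeps the set balanced, matching the equal division between phishing and legitimate URLs described in the abstract.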
This study aims to develop a web application that detects phishing URLs. From a collection of over 5,000 randomly selected URLs, the data are split into 80,000 training samples and 20,000 testing samples, equally divided between phishing and legitimate URLs. The models are trained and tested on features such as address bar-based features, domain-based features, and HTML & JavaScript-based features to distinguish legitimate URLs from phishing URLs.
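The address bar-based features and the balanced 80/20 train/test split could look roughly like the following sketch. The specific checks (raw IP host, an '@' symbol, a URL length of 54 characters or more, a '//' after the scheme, a '-' in the host, more than two dots in the host, and the HTTPS scheme) are common heuristics assumed here for illustration; the study's exact feature definitions and thresholds are not given in this abstract. Domain-based and HTML & JavaScript-based features would need WHOIS lookups and page fetching, so they are omitted. The function names `address_bar_features` and `stratified_split` are hypothetical.

```python
import ipaddress
import random
from urllib.parse import urlparse


def address_bar_features(url):
    """Return 0/1 address-bar features for one URL (heuristics are assumptions)."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    try:
        ipaddress.ip_address(host)  # host is a raw IPv4/IPv6 address
        has_ip = 1
    except ValueError:
        has_ip = 0
    return {
        "has_ip": has_ip,
        "has_at": int("@" in url),  # text before '@' is userinfo, not the host
        "long_url": int(len(url) >= 54),  # assumed length threshold
        "redirect_slashes": int(url.rfind("//") > 7),  # '//' after the scheme
        "dash_in_domain": int("-" in host),
        "many_subdomains": int(host.count(".") > 2),  # assumed: more than two dots
        "uses_https": int(parsed.scheme == "https"),
    }


def stratified_split(samples, test_fraction=0.2, seed=0):
    """Split (x, label) pairs so each label keeps the same train/test ratio."""
    by_label = {}
    for x, y in samples:
        by_label.setdefault(y, []).append((x, y))
    rng = random.Random(seed)
    train, test = [], []
    for label in sorted(by_label):
        group = by_label[label][:]
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test += group[:n_test]
        train += group[n_test:]
    rng.shuffle(train)
    rng.shuffle(test)
    return train, test


print(address_bar_features("http://192.168.0.1/login")["has_ip"])  # 1
```

Splitting each class separately guarantees that the test set keeps the same phishing/legitimate ratio as the training set, which a single random cut over the whole list does not.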
In conclusion, the study provides a model that classifies URLs as phishing or legitimate. Such a model can help individuals and companies identify phishing attacks by verifying any link they receive before trusting it.
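To illustrate how a trained model could be used to check a single link, the sketch below trains a plain logistic regression by batch gradient descent on binary feature vectors and then classifies a new vector. Logistic regression is a stand-in chosen for brevity, not necessarily one of the study's models; the three-feature layout `[has_ip, has_at, uses_https]`, the toy training data, the learning rate, and the 0.5 threshold are all hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log-loss for one sample is (p - y) * x.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def classify(x, w, b, threshold=0.5):
    """Return ('phishing' or 'legitimate', estimated probability of phishing)."""
    p = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
    return ("phishing" if p >= threshold else "legitimate"), p


# Hypothetical toy data: [has_ip, has_at, uses_https]; label 1 = phishing.
X = [[1, 1, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 1], [0, 0, 0]]
y = [1, 1, 1, 0, 0, 0]
w, b = train_logistic_regression(X, y)
print(classify([1, 1, 0], w, b)[0])  # phishing
```

In a real deployment the feature vector would be computed from the submitted URL and passed to whichever trained model performed best on the test set.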