Water Quality Prediction Using Machine Learning
Abstract
The primary objective of this project is to employ
machine learning methods for assessing water
quality using a numerical measure known as
potability. Nine parameters (pH, Hardness, Solids,
Chloramines, Sulfate, Conductivity, Organic Carbon,
Trihalomethanes, and Turbidity) were used as the
feature vector to evaluate overall water quality.
The study used two classification
algorithms, Decision Tree (DT) and K-Nearest
Neighbor (KNN), to predict water quality classes.
Experiments were conducted using both real data
from various locations in Andhra Pradesh and
synthetic datasets generated randomly based on
these parameters. Results indicated that the KNN
classifier performed better than other models in
predicting potability. Data normalization and
feature selection were applied to construct the
dataset used to train the models. Machine learning
algorithms such as linear regression, an MLP
regressor, a support vector regressor, and random
forest were employed to build a water quality
prediction model. Support vector machines (SVM),
naïve Bayes, decision trees, and MLP classifiers
were used to build a classification model for the
water quality index. The findings
underscore the efficacy of machine learning