Benchmark Of Data Preprocessing Methods For Imbalanced Classification
Abstract
Creating a reliable load forecasting model that
minimizes underpredictions is crucial for avoiding
potential power outages caused by insufficient
electricity generation. However, predicting
residential power consumption is challenging due to
its inherent fluctuations and anomalies. In this study,
we propose several Long Short-Term Memory
(LSTM) frameworks incorporating different
asymmetric loss functions to apply greater penalties
for underpredictions. Additionally, we apply
density-based spatial clustering of applications with
noise (DBSCAN) to detect and remove anomalies
before load forecasting. We account for the impacts of weather and
social factors by performing seasonality splitting on
three datasets from France, Germany, and Hungary,
which include hourly power consumption, weather,
and calendar features. Our results, measured by
root-mean-square error (RMSE), demonstrate that
anomaly removal effectively reduces both
underestimation and overestimation errors across all
seasonal datasets. Furthermore, while asymmetric
loss functions and seasonality splitting successfully
reduce underestimation errors, they may slightly
increase the overestimation error. Overall,
addressing underpredictions in electricity
consumption is crucial for preventing power outages
and safeguarding community welfare.
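
To make the idea of an asymmetric loss concrete, the following is a minimal sketch of one possible form: a squared error whose weight is larger when the forecast falls below the actual load. It is not the specific loss used in this study; the function name asymmetric_mse and the parameter under_weight (with its default of 2.0) are hypothetical choices made for illustration only.

```python
from typing import Sequence


def asymmetric_mse(
    y_true: Sequence[float],
    y_pred: Sequence[float],
    under_weight: float = 2.0,  # hypothetical penalty factor, not from the study
) -> float:
    """Mean squared error that penalizes underpredictions more heavily.

    An underprediction is a forecast below the actual value
    (y_true - y_pred > 0). Its squared error is multiplied by
    `under_weight`; overpredictions keep a weight of 1.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("inputs must not be empty")

    total = 0.0
    for actual, predicted in zip(y_true, y_pred):
        error = actual - predicted  # positive means the forecast was too low
        weight = under_weight if error > 0 else 1.0
        total += weight * error ** 2
    return total / len(y_true)


# Same absolute error of 2 in both cases, but the underprediction costs more.
under_loss = asymmetric_mse([10.0], [8.0])   # forecast too low
over_loss = asymmetric_mse([10.0], [12.0])   # forecast too high
print(under_loss, over_loss)
```

With under_weight greater than 1, a model trained on such a loss is pushed toward forecasts that err on the high side, which is consistent with the trade-off noted above: fewer underestimations at the cost of a possibly larger overestimation error.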