Machine Learning based Method for Insurance Fraud Detection on Class Imbalance Datasets with Missing Values
DOI: https://doi.org/10.63665/340bc990
Keywords: Insurance fraud detection, Machine learning, Class imbalance dataset, Missing values, Data preprocessing, Fraud classification, Precision, Recall
Abstract
Machine-learning-based insurance fraud detection helps identify fraudulent insurance claims and reduce financial losses. In this project, machine learning techniques were implemented to classify claims as fraudulent or genuine on class-imbalanced datasets that contain missing values. The dataset was preprocessed with data cleaning, missing-value handling, normalization, and class balancing to improve model performance and reliability. Several machine learning algorithms were then trained and evaluated on their ability to classify claims accurately despite the skewed class distribution.
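As an illustration of two of the preprocessing steps named above, the following is a minimal sketch of missing-value handling and class balancing. It assumes mean imputation and random oversampling, which are common choices but are not confirmed as the methods used in this work; the function names, the two toy features, and the data are hypothetical.

```python
import random
from statistics import mean


def impute_mean(rows):
    """Replace None entries in each column with the mean of that column's observed values."""
    n_cols = len(rows[0])
    col_means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        # Fall back to 0.0 only if a column has no observed values at all.
        col_means.append(mean(observed) if observed else 0.0)
    return [[col_means[j] if r[j] is None else r[j] for j in range(n_cols)] for r in rows]


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until every class matches the largest one."""
    rng = random.Random(seed)
    by_class = {}
    for r, y in zip(rows, labels):
        by_class.setdefault(y, []).append(r)
    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for y, members in by_class.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for r in members + extra:
            out_rows.append(r)
            out_labels.append(y)
    return out_rows, out_labels


# Hypothetical toy claims: [claim_amount, customer_age]; label 1 = fraudulent, 0 = genuine.
claims = [[1000.0, 30.0], [None, 40.0], [3000.0, None], [2000.0, 50.0], [9000.0, 35.0]]
labels = [0, 0, 0, 0, 1]

filled = impute_mean(claims)
balanced_rows, balanced_labels = random_oversample(filled, labels)
```

In a real experiment, the imputation statistics and the oversampling would normally be computed on the training split only, so that the test set stays untouched by both steps.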
The proposed system aims to improve fraud detection accuracy under practical constraints such as incomplete records and highly imbalanced data. Performance was evaluated using accuracy, precision, recall, and F1-score. The experimental results indicate that machine learning models can identify suspicious insurance claims effectively and can help insurance companies reduce fraud-related losses. Future enhancements may include ensemble learning, deep learning approaches, and real-time fraud detection systems for improved performance and scalability.
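The evaluation metrics listed above can be computed directly from confusion-matrix counts. The sketch below is illustrative only: the helper name and the label vectors are hypothetical and are not results from this work. It shows why accuracy alone is a weak signal on imbalanced data, since a classifier that labels every claim as genuine still scores high accuracy while detecting no fraud.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels: 95 genuine claims (0) and 5 fraudulent claims (1).
y_true = [0] * 95 + [1] * 5

# Baseline that predicts "genuine" for every claim.
always_genuine = [0] * 100
baseline = classification_metrics(y_true, always_genuine)

# A detector that flags 6 genuine claims by mistake and catches 4 of the 5 frauds.
detector_pred = [1] * 6 + [0] * 89 + [1] * 4 + [0]
detector = classification_metrics(y_true, detector_pred)
```

Here the baseline has higher accuracy than the detector, but its recall and F1 are zero, which is why precision, recall, and F1 are reported alongside accuracy.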
Copyright (c) 2026 Authors

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
