Analysis Crime Data Using Python

Authors

  • Pathan Mahaboobi, PG Scholar, Department of MCA, CDNR College, Bhimavaram, Andhra Pradesh.
  • A. Naga Raju, Assistant Professor, Master of Computer Applications, DNR College, Bhimavaram, Andhra Pradesh.

Abstract

This project focuses on the analysis and prediction of crime trends across the states and union territories of India using machine learning techniques. The dataset comprises crime statistics categorized by state, district, and year. Preprocessing handles missing values and removes duplicate records to ensure data quality. Exploratory Data Analysis (EDA) is then conducted through various visualizations to highlight crime patterns, identify states with high and low crime rates, and observe temporal trends in Indian Penal Code (IPC) crimes.

A Random Forest Regressor is trained to predict the total number of IPC crimes using state, district, and year as input features. Label encoding converts the categorical variables (state and district) into a numeric format suitable for model training.
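
A minimal sketch of the preprocessing, EDA aggregation, label encoding, and training steps, using pandas and scikit-learn. It is not the authors' code: the column names (`STATE/UT`, `DISTRICT`, `YEAR`, `TOTAL IPC CRIMES`) are assumptions modelled on NCRB's district-wise IPC tables, the helpers `preprocess`, `summarize`, `encode`, and `train` are hypothetical, and the toy rows are invented. The per-state and per-year totals are the kind of aggregates an EDA would plot as bar charts and trend lines.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.preprocessing import LabelEncoder

# Assumed column names (modelled on NCRB district-wise IPC tables); adjust to the real CSV.
CATEGORICAL = ["STATE/UT", "DISTRICT"]
FEATURES = CATEGORICAL + ["YEAR"]
TARGET = "TOTAL IPC CRIMES"


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Drop rows missing any feature or target value, then drop exact duplicate rows."""
    df = df.dropna(subset=FEATURES + [TARGET])
    return df.drop_duplicates().reset_index(drop=True)


def summarize(df: pd.DataFrame) -> tuple[pd.Series, pd.Series]:
    """EDA aggregates: total IPC crimes per state (highest first) and per year."""
    by_state = df.groupby("STATE/UT")[TARGET].sum().sort_values(ascending=False)
    by_year = df.groupby("YEAR")[TARGET].sum().sort_index()
    return by_state, by_year


def encode(df: pd.DataFrame) -> tuple[pd.DataFrame, dict[str, LabelEncoder]]:
    """Label-encode each categorical column; keep the fitted encoders for later lookups."""
    df = df.copy()
    encoders = {}
    for col in CATEGORICAL:
        encoders[col] = LabelEncoder()
        df[col] = encoders[col].fit_transform(df[col])
    return df, encoders


def train(df: pd.DataFrame) -> RandomForestRegressor:
    """Fit a Random Forest Regressor on encoded state, district, and year."""
    # Hyperparameters are illustrative, not the authors' settings.
    model = RandomForestRegressor(n_estimators=100, random_state=42)
    model.fit(df[FEATURES], df[TARGET])
    return model


# Invented toy data: one exact duplicate row and one row with a missing state.
raw = pd.DataFrame({
    "STATE/UT": ["A", "A", "A", "B", "B", "B", None],
    "DISTRICT": ["d1", "d1", "d2", "d3", "d3", "d4", "d5"],
    "YEAR": [2011, 2011, 2012, 2011, 2012, 2012, 2012],
    "TOTAL IPC CRIMES": [100, 100, 150, 40, 55, 60, 10],
})
clean = preprocess(raw)
by_state, by_year = summarize(clean)
encoded, encoders = encode(clean)
model = train(encoded)
print(by_state)
print(by_year)
```

Keeping the fitted encoders allows the same string-to-integer mapping to be reapplied to user input at prediction time. `LabelEncoder.transform` raises a `ValueError` for a state or district that was not seen during training, so such input needs separate handling.
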
The model's performance is evaluated with the R-squared (R²) metric, and predictions are visualized to compare actual and forecasted crime numbers. A user interface component also lets users enter a specific state, district, and year to receive a crime forecast along with a safety classification (e.g., "Safest City", "Medium Safe City", or "Not Safe City"). This application can serve as a decision-support tool for policymakers and law enforcement agencies to address crime trends proactively.
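
The evaluation and forecast-with-label steps could be sketched as below, again under stated assumptions. Synthetic data stands in for the NCRB table; the helpers `classify_safety` and `forecast` are hypothetical; and because the thresholds behind "Safest City", "Medium Safe City", and "Not Safe City" are not stated, the sketch assumes tercile cut-offs of the training target. `r2_score` computes R² = 1 − Σ(y − ŷ)² / Σ(y − ȳ)² on held-out rows, and the model is saved and reloaded with joblib, which the reference list names for this purpose.

```python
import os
import tempfile

import joblib
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

FEATURES = ["STATE/UT", "DISTRICT", "YEAR"]  # assumed column names

# Synthetic stand-in data: 3 states x 4 districts x 10 years, with a state-level
# baseline, a mild upward yearly trend, and Gaussian noise.
rng = np.random.default_rng(0)
rows = [(s, f"{s}-D{d}", y) for s in ["S1", "S2", "S3"] for d in range(4) for y in range(2001, 2011)]
df = pd.DataFrame(rows, columns=FEATURES)
baseline = df["STATE/UT"].map({"S1": 200, "S2": 800, "S3": 1500})
df["TOTAL IPC CRIMES"] = baseline + 20 * (df["YEAR"] - 2001) + rng.normal(0, 30, len(df))

# Label-encode state and district, keeping the encoders for user input later.
encoders = {col: LabelEncoder().fit(df[col]) for col in ["STATE/UT", "DISTRICT"]}
X = df[FEATURES].copy()
for col, enc in encoders.items():
    X[col] = enc.transform(X[col])
y = df["TOTAL IPC CRIMES"]

# Hold out 25% of rows, stratified by state, and report R^2 on them.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=df["STATE/UT"]
)
model = RandomForestRegressor(n_estimators=200, random_state=42).fit(X_train, y_train)
r2 = r2_score(y_test, model.predict(X_test))
print(f"R^2 on held-out rows: {r2:.3f}")

# Hypothetical rule: split the training target into terciles.
low_cut, high_cut = np.quantile(y_train, [1 / 3, 2 / 3])


def classify_safety(predicted: float, low: float, high: float) -> str:
    """Map a predicted crime count to one of the three safety labels."""
    if predicted <= low:
        return "Safest City"
    if predicted <= high:
        return "Medium Safe City"
    return "Not Safe City"


# Persist the model, encoders, and cut-offs together, then reload them.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "crime_model.joblib")
    joblib.dump({"model": model, "encoders": encoders, "cuts": (low_cut, high_cut)}, path)
    bundle = joblib.load(path)


def forecast(bundle: dict, state: str, district: str, year: int) -> tuple[float, str]:
    """Predict total IPC crimes for one state/district/year and attach a safety label."""
    row = pd.DataFrame({
        "STATE/UT": bundle["encoders"]["STATE/UT"].transform([state]),
        "DISTRICT": bundle["encoders"]["DISTRICT"].transform([district]),
        "YEAR": [year],
    })
    predicted = float(bundle["model"].predict(row)[0])
    return predicted, classify_safety(predicted, *bundle["cuts"])


value, label = forecast(bundle, "S1", "S1-D0", 2005)
print(f"{value:.0f} IPC crimes -> {label}")
```

One design caveat: a Random Forest Regressor averages training targets in its leaves, so it cannot extrapolate. Forecasts for years beyond the training range stay close to the values seen for the most recent years rather than continuing the trend.
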

References

National Crime Records Bureau (NCRB), India. Crime in India Reports. Available at: https://ncrb.gov.in/en/crime-india (Used for crime data collection and the analysis framework)

a. Scikit-learn Documentation. Scikit-learn: Machine Learning in Python. Available at: https://scikit-learn.org/stable/ (Used for Random Forest Regressor, LabelEncoder, model evaluation, and data preprocessing)

b. Pandas Documentation. Pandas: Python Data Analysis Library. Available at: https://pandas.pydata.org/ (Used for data manipulation and analysis)

c. NumPy Documentation. NumPy: The fundamental package for scientific computing with Python. Available at: https://numpy.org/doc/ (Used for numerical operations and data handling)

d. Matplotlib & Seaborn
    i. Hunter, J. D. (2007). Matplotlib: A 2D graphics environment. Computing in Science & Engineering.
    ii. Waskom, M. L. (2021). Seaborn: statistical data visualization. Journal of Open Source Software.
    (Used for data visualization and exploratory data analysis)

e. Joblib Library. Joblib: Tools for lightweight pipelining in Python. Available at: https://joblib.readthedocs.io/ (Used for saving and loading the trained machine learning model)

f. Tkinter GUI Documentation. Tkinter: Python's standard GUI package. Available at: https://docs.python.org/3/library/tkinter.html (Used for basic GUI elements in the CLI input system)

g. Kaggle Crime Datasets (if applicable). Example: Crime in India (NCRB), a public dataset on Kaggle. Available at: https://www.kaggle.com/ (Alternative or supplemental dataset used for training or validation)

h. Bishop, C. M. (2006). Pattern Recognition and Machine Learning. Springer. (Reference for machine learning principles and model evaluation)

i. James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An Introduction to Statistical Learning. Springer. (Used to understand regression models and evaluation techniques)

Published

2025-05-01

Section

Articles

How to Cite

Analysis Crime Data Using Python. (2025). International Journal of Multidisciplinary Engineering In Current Research, 10(5), 138-141. https://ijmec.com/index.php/multidisciplinary/article/view/630
