Analysis of Crime Data Using Python
Abstract
This project focuses on the analysis and prediction of
crime trends across various states and union territories in
India using machine learning techniques. The dataset
comprises crime-related statistics categorized by state,
district, and year. Initial data preprocessing steps include
handling missing values and removing duplicates to
ensure data quality. Exploratory Data Analysis (EDA) is
conducted through various visualizations to highlight
crime patterns, identify states with high and low crime
rates, and observe temporal trends in Indian Penal Code
(IPC) crimes. A machine learning model using Random
Forest Regressor is trained to predict the total number of
IPC crimes based on state, district, and year as input
features. Label encoding is used to convert categorical
variables into numeric format suitable for model training.
The model’s performance is evaluated using the R-squared
metric, and predictions are visualized to compare
actual versus forecasted crime numbers. Furthermore, a
user interface component is incorporated, allowing users
to input a specific state, district, and year to receive a
crime forecast along with a safety classification (e.g.,
"Safest City", "Medium Safe City", or "Not Safe City").
This application can serve as a decision-support tool for
policymakers and law enforcement agencies to proactively
address crime trends.
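
The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The column names (state, district, year, total_ipc_crimes), the synthetic data, the 75/25 train/test split, and the safety thresholds are all assumptions made for the example; the abstract does not specify any of them.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder


def make_synthetic_data(seed=0):
    """Build a small table of invented values (illustration only, not NCRB data)."""
    rng = np.random.default_rng(seed)
    districts = {"State A": ["A1", "A2"], "State B": ["B1", "B2"], "State C": ["C1", "C2"]}
    rows = []
    for state, names in districts.items():
        for district in names:
            base = rng.integers(500, 5000)
            for year in range(2001, 2013):
                crimes = int(base + 40 * (year - 2001) + rng.normal(0, 50))
                rows.append({"state": state, "district": district,
                             "year": year, "total_ipc_crimes": crimes})
    return pd.DataFrame(rows)


def preprocess(df):
    """Remove duplicate rows and rows with missing values."""
    return df.drop_duplicates().dropna().reset_index(drop=True)


def train(df):
    """Label-encode the categorical columns, fit the regressor, report R-squared."""
    encoders = {col: LabelEncoder().fit(df[col]) for col in ("state", "district")}
    X = pd.DataFrame({
        "state": encoders["state"].transform(df["state"]),
        "district": encoders["district"].transform(df["district"]),
        "year": df["year"].to_numpy(),
    })
    y = df["total_ipc_crimes"]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=0)
    model = RandomForestRegressor(n_estimators=100, random_state=0)
    model.fit(X_train, y_train)
    r2 = r2_score(y_test, model.predict(X_test))
    return model, encoders, r2


def classify_safety(predicted, low=1500, high=3500):
    """Map a forecast to a label; the cut-offs are hypothetical placeholders."""
    if predicted < low:
        return "Safest City"
    if predicted < high:
        return "Medium Safe City"
    return "Not Safe City"


def forecast(model, encoders, state, district, year):
    """Predict total IPC crimes for one (state, district, year) query."""
    row = pd.DataFrame({
        "state": encoders["state"].transform([state]),
        "district": encoders["district"].transform([district]),
        "year": [year],
    })
    predicted = float(model.predict(row)[0])
    return predicted, classify_safety(predicted)


data = preprocess(make_synthetic_data())
model, encoders, r2 = train(data)
predicted, label = forecast(model, encoders, "State A", "A1", 2012)
print(f"R-squared on held-out rows: {r2:.3f}")
print(f"Forecast: {predicted:.0f} IPC crimes -> {label}")
```

Note that LabelEncoder assigns arbitrary integer codes to states and districts. Tree-based models such as Random Forest tolerate this reasonably well, but the encoders fitted at training time must be reused for every later query, and a state or district not seen during training will raise an error in transform.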