Crime Data Analysis Using Jupyter
Abstract
This project focuses on the analysis and prediction of
crime trends across various states and union territories in
India using machine learning techniques. The dataset
comprises crime-related statistics categorized by state,
district, and year. Initial data preprocessing steps include
handling missing values and removing duplicates to
ensure data quality. Exploratory Data Analysis (EDA) is
conducted through various visualizations to highlight
crime patterns, identify states with high and low crime
rates, and observe temporal trends in Indian Penal Code
(IPC) crimes. A machine learning model using Random
Forest Regressor is trained to predict the total number of
IPC crimes based on state, district, and year as input
features. Label encoding is used to convert categorical
variables into numeric format suitable for model training.
The model’s performance is evaluated using the R-squared
metric, and predictions are visualized to compare
actual versus forecasted crime numbers. Furthermore, a
user interface component is incorporated, allowing users
to input a specific state, district, and year to receive a
crime forecast along with a safety classification (e.g.,
"Safest City", "Medium Safe City", or "Not Safe City").
This application can serve as a decision-support tool for
policymakers and law enforcement agencies to proactively
address crime trends.
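The workflow described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the data is synthetic, and the column names (`STATE/UT`, `DISTRICT`, `YEAR`, `TOTAL IPC CRIMES`), the hyperparameters, the helper functions `classify_safety` and `forecast`, and the safety thresholds are all assumptions introduced for the example.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Synthetic stand-in for the crime table; every name and value here is made up.
rng = np.random.default_rng(0)
districts = [("StateA", "District1"), ("StateA", "District2"),
             ("StateB", "District3"), ("StateB", "District4")]
base = {"District1": 500, "District2": 1500, "District3": 3000, "District4": 6000}
rows = []
for state, district in districts:
    for year in range(2001, 2013):
        total = base[district] + 40 * (year - 2001) + int(rng.integers(-50, 51))
        rows.append({"STATE/UT": state, "DISTRICT": district,
                     "YEAR": year, "TOTAL IPC CRIMES": total})
df = pd.DataFrame(rows)

# Preprocessing: drop rows with missing values and exact duplicate rows.
df = df.dropna().drop_duplicates().reset_index(drop=True)

# Label-encode the categorical columns; keep the encoders for later lookups.
state_enc = LabelEncoder()
district_enc = LabelEncoder()
df["state_code"] = state_enc.fit_transform(df["STATE/UT"])
df["district_code"] = district_enc.fit_transform(df["DISTRICT"])

X = df[["state_code", "district_code", "YEAR"]]
y = df["TOTAL IPC CRIMES"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

# Train the regressor and score it on the held-out rows with R-squared.
model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
r2 = r2_score(y_test, model.predict(X_test))

def classify_safety(predicted_crimes, low=1000, high=4000):
    """Map a forecast to a safety label; the thresholds are hypothetical."""
    if predicted_crimes < low:
        return "Safest City"
    if predicted_crimes < high:
        return "Medium Safe City"
    return "Not Safe City"

def forecast(state, district, year):
    """Predict total IPC crimes for one (state, district, year) query."""
    features = pd.DataFrame([{
        "state_code": state_enc.transform([state])[0],
        "district_code": district_enc.transform([district])[0],
        "YEAR": year,
    }])
    predicted = float(model.predict(features)[0])
    return predicted, classify_safety(predicted)

predicted, label = forecast("StateA", "District1", 2010)
print(f"R-squared: {r2:.3f}; forecast: {predicted:.0f} ({label})")
```

Keeping the fitted encoders next to the model matters because a user query has to be mapped to the same integer codes that were used during training. Label encoding also imposes an arbitrary ordering on states and districts, which tree-based models generally tolerate better than linear ones.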
References
1. National Crime Records Bureau (NCRB), India
Crime in India Reports.
Available at: https://ncrb.gov.in/en/crime-india
(Used for crime data collection and analysis framework)
2. Scikit-learn Documentation
Scikit-learn: Machine Learning in Python.
Available at: https://scikit-learn.org/stable/
(Used for Random Forest Regressor, LabelEncoder, model
evaluation, and data preprocessing)
3. Pandas Documentation
Pandas: Python Data Analysis Library.
Available at: https://pandas.pydata.org/
(Used for data manipulation and analysis)
4. NumPy Documentation
NumPy: The fundamental package for scientific computing with Python.
Available at: https://numpy.org/doc/
(Used for numerical operations and data handling)
5. Matplotlib & Seaborn
Hunter, J.D. (2007). Matplotlib: A 2D graphics environment. Computing in Science & Engineering.
Waskom, M.L. (2021). Seaborn: statistical data visualization. Journal of Open Source Software.
(Used for data visualization and exploratory data analysis)
6. Joblib Library
Joblib: Tools for lightweight pipelining in Python.
Available at: https://joblib.readthedocs.io/
(Used for saving and loading the trained machine learning model)
7. Tkinter GUI Documentation
Tkinter: Python’s standard GUI package.
Available at: https://docs.python.org/3/library/tkinter.html
(Used for basic GUI elements in the CLI input system)
8. Kaggle Crime Datasets (if applicable)
Example: Crime in India (NCRB), public dataset on Kaggle.
Available at: https://www.kaggle.com/
(Alternative or supplemental dataset used for training or validation)
9. Bishop, C. M. (2006). Pattern Recognition and Machine Learning. Springer.
(Reference for machine learning principles and model evaluation)
10. James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An Introduction to Statistical Learning. Springer.
(Used to understand regression models and evaluation techniques)