DeepTextGuard: Detecting Machine-Written Tweets Using FastText and Deep Learning

Mohammed Muneeb Uddin Khan, Md Mudassir Qhurashi, Syed Shoaib; Mrs. M Shilpa

Authors

Mohammed Muneeb Uddin Khan, Md Mudassir Qhurashi, and Syed Shoaib, B.E. Students, Department of IT, Lords Institute of Engineering and Technology
Mrs. M Shilpa, Assistant Professor, Department of IT, Lords Institute of Engineering and Technology, Hyderabad

Keywords:

Deepfake detection, Deep learning, FastText embedding, Machine-generated tweets, Social media analysis, Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM).

Abstract

The proliferation of deepfake technology has raised concerns about the spread of misinformation on social media platforms: even simple text manipulation can shape false beliefs, and the impact of powerful transformer models needs to be addressed. In this paper, we propose a deep learning approach for detecting deepfake tweets, that is, tweets written by machines rather than humans, to help mitigate the impact of misinformation online. We first preprocess the tweet text and then use FastText embeddings to convert it into dense vector representations. These embeddings capture semantic information about the tweet content, which is crucial for distinguishing genuine tweets from machine-generated ones. The embeddings are then fed into a deep learning classifier, such as a Convolutional Neural Network (CNN) or a Long Short-Term Memory (LSTM) network, which labels each tweet as genuine or machine-generated. The model is trained on a labelled dataset containing tweets from human accounts and from bot accounts that generate text with state-of-the-art techniques such as RNNs, LSTMs, Markov chains, and GPT-2. Experimental results on this real-world dataset demonstrate the effectiveness of our approach: it achieves high accuracy and outperforms existing methods for deepfake detection on social media. We also compare the proposed method against other deep learning models, namely an LSTM and a hybrid CNN-LSTM, and the comparison highlights its advantage in accurately addressing the task. Overall, the proposed approach offers a promising way to identify machine-generated tweets and to combat the spread of misinformation on social media platforms.
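The embedding step above can be illustrated with a small, dependency-free sketch of how fastText composes a word vector from hashed character n-grams. This is what lets it assign vectors to misspellings, hashtags and other out-of-vocabulary tokens that are common in tweets. It is a simplified illustration under stated assumptions, not the authors' pipeline: the `preprocess` rules (lowercasing, masking URLs as `<url>` and mentions as `<user>`) are hypothetical, the vectors are seeded random numbers standing in for trained embedding rows, and the real library stores whole in-vocabulary words in their own rows rather than hashing them. The dimension (100), n-gram range (3 to 6) and bucket count (2,000,000) follow fastText's defaults; all function names are illustrative.

```python
import math
import random
import re

# DIM, BUCKETS, MINN and MAXN mirror fastText's defaults, but the vectors
# produced below are random stand-ins, not trained embeddings.
DIM = 100
BUCKETS = 2_000_000
MINN, MAXN = 3, 6


def preprocess(tweet):
    """Hypothetical normalisation: lowercase, mask URLs and @mentions, tokenise."""
    tweet = tweet.lower()
    tweet = re.sub(r"https?://\S+", "<url>", tweet)
    tweet = re.sub(r"@\w+", "<user>", tweet)
    return re.findall(r"<url>|<user>|[a-z0-9']+", tweet)


def char_ngrams(word, minn=MINN, maxn=MAXN):
    """The word wrapped in boundary markers, plus its character n-grams."""
    w = f"<{word}>"
    grams = [w]
    for n in range(minn, maxn + 1):
        for i in range(len(w) - n + 1):
            gram = w[i:i + n]
            if gram != w:  # avoid counting the whole word twice
                grams.append(gram)
    return grams


def fnv1a_32(text):
    """32-bit FNV-1a hash (fastText uses FNV-1a; its non-ASCII byte quirk is ignored here)."""
    h = 2166136261
    for byte in text.encode("utf-8"):
        h ^= byte
        h = (h * 16777619) & 0xFFFFFFFF
    return h


def bucket_vector(bucket):
    """Seeded random stand-in for one row of a trained n-gram embedding table."""
    rng = random.Random(bucket)
    return [rng.uniform(-1.0, 1.0) for _ in range(DIM)]


def word_vector(word):
    """Average of the hashed n-gram vectors, so unseen words still get a vector."""
    grams = char_ngrams(word)
    vec = [0.0] * DIM
    for gram in grams:
        row = bucket_vector(fnv1a_32(gram) % BUCKETS)
        for k in range(DIM):
            vec[k] += row[k]
    return [v / len(grams) for v in vec]


def embed_tweet(tweet):
    """Tweet -> seq_len x DIM matrix, the input a CNN or LSTM would consume."""
    return [word_vector(tok) for tok in preprocess(tweet)]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


print(preprocess("Check this out @alice https://t.co/xyz LOL"))
# ['check', 'this', 'out', '<user>', '<url>', 'lol']
print(cosine(word_vector("awesome"), word_vector("awesomeee")))  # shares most n-grams
print(cosine(word_vector("awesome"), word_vector("zebra")))      # shares none
```

With real pretrained vectors, the random `bucket_vector` lookup would be replaced by a trained model, for example through the `fasttext` Python package (`fasttext.load_model(...)` followed by `get_word_vector(word)`). Because "awesome" and "awesomeee" share most of their n-grams, their vectors should be strongly correlated even if the second spelling never appears in the training data.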
Experimental results indicate that the streamlined design of the CNN architecture, combined with FastText embeddings, enables efficient and effective classification of the tweet data, with a superior accuracy of 93%.
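A common way to realise a CNN text classifier of this kind is a single convolutional layer over the sequence of word vectors, followed by global max pooling and a sigmoid output. The sketch below shows only the forward pass, in plain Python so that every operation is visible. The abstract does not specify the architecture, so the filter count (4), kernel width (3), zero-padding of short tweets, and all names (`TinyTextCNN`, `conv1d_relu`, and so on) are hypothetical, and the weights are random rather than trained. The `accuracy` helper shows how a figure such as the reported 93% is computed: the share of tweets whose thresholded prediction matches the label.

```python
import math
import random


def conv1d_relu(x, kernels, biases):
    """'Valid' 1-D convolution over time followed by ReLU.

    x is seq_len x dim; each kernel is width x dim and spans `width`
    consecutive tokens. Returns a (seq_len - width + 1) x num_filters map.
    """
    width = len(kernels[0])
    feature_map = []
    for t in range(len(x) - width + 1):
        window = x[t:t + width]
        row = []
        for kernel, bias in zip(kernels, biases):
            s = bias + sum(w * v
                           for k_row, x_row in zip(kernel, window)
                           for w, v in zip(k_row, x_row))
            row.append(max(0.0, s))  # ReLU
        feature_map.append(row)
    return feature_map


def global_max_pool(feature_map):
    """Keep each filter's strongest response anywhere in the tweet."""
    return [max(column) for column in zip(*feature_map)]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class TinyTextCNN:
    """Embeddings -> Conv1D + ReLU -> global max pooling -> sigmoid output.

    Weights are random and untrained; in practice they would be learned by
    minimising binary cross-entropy on labelled human/bot tweets.
    """

    def __init__(self, dim, num_filters=4, width=3, seed=0):
        rng = random.Random(seed)
        self.dim, self.width = dim, width
        self.kernels = [[[rng.gauss(0.0, 0.1) for _ in range(dim)]
                         for _ in range(width)]
                        for _ in range(num_filters)]
        self.biases = [0.0] * num_filters
        self.w_out = [rng.gauss(0.0, 0.1) for _ in range(num_filters)]
        self.b_out = 0.0

    def predict_proba(self, x):
        """Probability that the tweet (seq_len x dim matrix) is machine-generated."""
        if len(x) < self.width:  # zero-pad tweets shorter than the kernel
            x = x + [[0.0] * self.dim for _ in range(self.width - len(x))]
        pooled = global_max_pool(conv1d_relu(x, self.kernels, self.biases))
        return sigmoid(self.b_out + sum(w * p for w, p in zip(self.w_out, pooled)))


def accuracy(probs, labels, threshold=0.5):
    """Fraction of tweets whose thresholded prediction matches the 0/1 label."""
    preds = [1 if p >= threshold else 0 for p in probs]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


model = TinyTextCNN(dim=5)
tweet = [[0.1 * (i + j) for j in range(5)] for i in range(7)]  # 7 tokens, 5-d vectors
print(model.predict_proba(tweet))                      # a value strictly between 0 and 1
print(accuracy([0.9, 0.2, 0.7, 0.4], [1, 0, 0, 0]))    # 0.75
```

In a framework such as Keras, the same layout corresponds to `Conv1D`, `GlobalMaxPooling1D` and a `Dense(1, activation="sigmoid")` layer trained with binary cross-entropy. Max pooling over time yields a fixed-size representation regardless of tweet length, which suits short, variable-length posts.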
