Detecting Fake Accounts in Social Networks Using Principal Components Analysis and Kernel Density Estimation Algorithm (A Case Study on the Twitter Social Network)

Document Type : Original Article

Author

Instructor, Department of Computer, Islamic Azad University, Ramhormoz Branch, Ramhormoz, Iran

Abstract

The use of social networks continues to grow, and people spend much of their time on them. Celebrities
and companies use these networks to connect with their fans and customers, and news agencies use them to
publish news. As online social networks have become more popular, security risks and threats have also
increased, and malicious activities and attacks such as phishing, fake account creation, and spam have
risen significantly. In a fake account attack, malicious users create a fake account to impersonate
another person and thereby abuse the reputation of individuals or companies. This paper presents a new
method for detecting fake accounts in social networks based on machine learning algorithms. To train the
model, the proposed method uses various similarity features such as cosine similarity, Jaccard similarity,
friendship network similarity, and centrality measures. All of these features are extracted from the
adjacency matrix of the social network graph. Principal component analysis is then used to reduce the
dimensionality of the data and mitigate overfitting. The data are then classified using a kernel density
estimation classifier and a self-organizing map, and the results of the proposed method are evaluated in
terms of accuracy, sensitivity, and false-positive rate. The results show that the proposed method detects
fake accounts with 99.6% accuracy, which is about 5% better than Cao's method. The misclassification rate
for fake accounts also improved by 3% compared with the same method.
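
As an illustration of two of the steps named in the abstract, the following minimal sketch computes Jaccard and cosine similarity from rows of an adjacency matrix and then classifies a feature vector with a Gaussian kernel density estimate per class. It is not the paper's implementation: the toy graph, the labelled feature vectors, the bandwidth of 0.2, and all function names are assumptions made for the example, and the principal component analysis and self-organizing map steps are omitted for brevity.

```python
import math


def jaccard(adj, u, v):
    """Jaccard similarity of the neighbor sets of nodes u and v."""
    nu = {j for j, a in enumerate(adj[u]) if a}
    nv = {j for j, a in enumerate(adj[v]) if a}
    union = nu | nv
    return len(nu & nv) / len(union) if union else 0.0


def cosine(adj, u, v):
    """Cosine similarity of the adjacency rows of nodes u and v."""
    dot = sum(a * b for a, b in zip(adj[u], adj[v]))
    norm_u = math.sqrt(sum(a * a for a in adj[u]))
    norm_v = math.sqrt(sum(b * b for b in adj[v]))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def kde_log_density(x, samples, bandwidth):
    """Log of a Gaussian kernel density estimate evaluated at point x."""
    d = len(x)
    log_norm = -0.5 * d * math.log(2 * math.pi * bandwidth ** 2)
    log_terms = [
        log_norm - sum((xi - si) ** 2 for xi, si in zip(x, s)) / (2 * bandwidth ** 2)
        for s in samples
    ]
    # Log-sum-exp keeps the average of the kernels numerically stable.
    m = max(log_terms)
    return m + math.log(sum(math.exp(t - m) for t in log_terms)) - math.log(len(samples))


def kde_classify(x, samples_by_class, bandwidth=0.2):
    """Return the class whose prior times estimated density at x is largest."""
    total = sum(len(s) for s in samples_by_class.values())
    scores = {
        label: math.log(len(s) / total) + kde_log_density(x, s, bandwidth)
        for label, s in samples_by_class.items()
    }
    return max(scores, key=scores.get)


# Toy undirected graph (assumed for illustration): adj[i][j] == 1 means i and j are connected.
adj = [
    [0, 1, 1, 1, 0],
    [1, 0, 1, 1, 0],
    [1, 1, 0, 0, 1],
    [1, 1, 0, 0, 0],
    [0, 0, 1, 0, 0],
]

# Hypothetical labelled feature vectors of the form (Jaccard, cosine); the numbers are made up.
training = {
    "real": [(0.50, 0.67), (0.60, 0.70), (0.55, 0.62)],
    "fake": [(0.05, 0.10), (0.00, 0.08), (0.10, 0.15)],
}

features = (jaccard(adj, 0, 1), cosine(adj, 0, 1))
print(features, kde_classify(features, training))
```

In this sketch each class gets its own density estimate and the prediction is the class with the higher prior-weighted density. In practice the bandwidth would be tuned, for example by cross-validation, rather than fixed.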

Volume 9, Issue 3 - Serial Number 35, Autumn Quarterly
December 2021
Pages 109-123
  • Receive Date: 05 December 2020
  • Revise Date: 16 February 2021
  • Accept Date: 17 February 2021
  • Publish Date: 22 November 2021