Document Type : Original Article
Authors
1
Assistant Professor, Department of Computer Engineering, Faculty of Engineering, Kosar University of Bojnord, Bojnord, Iran
2
Assistant Professor, Department of Electrical Engineering, Faculty of Electrical and Computer Engineering, Esfarayen University of Technology, Esfarayen, Iran
Abstract
Cyber security information is rapidly growing on the internet and cyber attacks are increasing daily. Attackers mostly target the military, government, and corporate departments, because these contain sensitive and classified information that requires appropriate defense strategies. Cyber threat information extraction, i.e., extracting entities, relationships between them, and events in cyber texts, is one of the important steps for detecting cyber attacks, harmful events, and mitigating them in real time if they occur. Extracting valuable information from cyber threats can help security professionals to make informed decisions and develop strong defense strategies. It is also a fundamental solution for improving the performance of systems such as text summarization, machine translation, and question-answering. Although information extraction has been an active research topic over the past four decades, its accuracy is still not acceptable and there is no accurate computational model for it. In this paper, first, the entities in the text are extracted with high accuracy using the latest vocabulary embedding method, the Bi-GRU bidirectional recurrent network, the attention mechanism, and the knowledge representation; Then, expressions related to the entities are recognized by calculating the importance and weight of each feature and considering all the necessary criteria in decision-making. The entities relationships were extracted by a graph-based neural network and a heuristic loss function. The KVP deep network based on the attention mechanism has been used for accurate detection and security events prediction which can identify the correlation between two elements that have different positions in the input sequence. Extensive simulations have been carried out to check the performance of the proposed method. According to the simulation results, the proposed method has achieved 89.8% and 93.4% F1 scores on CoNLL-2012 and OSINT datasets, respectively.
Keywords
Main Subjects