An Approach Based on Semantic Similarity in Persian Query-Based Summarization

Authors

1 Master's degree, Malek Ashtar University of Technology

2 PhD student, Malek Ashtar University of Technology

3 Associate Professor, Malek Ashtar University of Technology

Abstract

Automatic text summarization systems are one class of systems for managing large volumes of information.
This paper discusses query-based extractive summarization of Persian text, which is useful for managers
who need to review information on specific topics.
The most important phase in this type of summarization is calculating the similarity between the query
phrase and the components of the original text. For this purpose, after the preprocessing phase, conversion
of the query to a sentence, and word sense disambiguation, the similarity between the query and each
sentence of the text is calculated using FarsNet. The sentences most similar to the query are then
selected for the summary. The results show that the proposed method achieves acceptable performance.
Since natural language processing for Persian is still a young field, this paper and similar work can help
improve results in this area.
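
The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `TOY_SYNSETS` table is a hypothetical stand-in for a FarsNet lookup, the examples are in English for readability, the word-level score (1.0 for identical words or a shared synset, 0.0 otherwise) is an assumed simplification, and word sense disambiguation and query-to-sentence conversion are omitted. All function names are illustrative.

```python
import re

# Hypothetical stand-in for a FarsNet lookup: word -> set of synset ids.
# The real FarsNet interface is not described in the abstract.
TOY_SYNSETS = {
    "car": {"vehicle.1"},
    "automobile": {"vehicle.1"},
    "engine": {"engine.1"},
    "motor": {"engine.1"},
    "price": {"cost.1"},
    "cost": {"cost.1"},
}

# Assumed toy stopword list used during preprocessing.
STOPWORDS = {"the", "a", "of", "is", "and", "in", "to"}


def preprocess(text):
    """Lowercase, tokenize on word characters, and drop stopwords."""
    return [t for t in re.findall(r"\w+", text.lower()) if t not in STOPWORDS]


def word_similarity(w1, w2):
    """Return 1.0 for identical words or words sharing a synset, else 0.0."""
    if w1 == w2:
        return 1.0
    shared = TOY_SYNSETS.get(w1, set()) & TOY_SYNSETS.get(w2, set())
    return 1.0 if shared else 0.0


def sentence_similarity(query_tokens, sentence_tokens):
    """Average, over query words, of the best match found in the sentence."""
    if not query_tokens or not sentence_tokens:
        return 0.0
    total = sum(
        max(word_similarity(q, s) for s in sentence_tokens)
        for q in query_tokens
    )
    return total / len(query_tokens)


def summarize(query, sentences, k=2):
    """Pick the k sentences most similar to the query, in document order."""
    q = preprocess(query)
    scored = [
        (sentence_similarity(q, preprocess(s)), i)
        for i, s in enumerate(sentences)
    ]
    # Highest score first; ties are broken by earlier position in the text.
    top = sorted(scored, key=lambda p: (-p[0], p[1]))[:k]
    return [sentences[i] for _, i in sorted(top, key=lambda p: p[1])]


if __name__ == "__main__":
    text = [
        "The automobile has a new motor.",
        "The weather is sunny.",
        "The cost of the car is high.",
    ]
    for sentence in summarize("car engine price", text, k=2):
        print(sentence)
```

Returning the selected sentences in document order rather than score order is a design choice of this sketch, made so that the extract reads in the same sequence as the source text.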

Keywords


Volume 2, Issue 3 - Serial Number 3
February 2020
Pages 51-63
  • Receive Date: 28 April 2014
  • Revise Date: 04 July 2023
  • Accept Date: 19 September 2018
  • Publish Date: 22 November 2014