A New Method for Summarizing Persian Texts Based on the User's Query

Authors

1 Master's degree, Malik Ashtar University of Technology

2 PhD student, Malik Ashtar University of Technology

3 Associate Professor, Malik Ashtar University of Technology

Abstract

Automatic text summarization systems are one kind of system for managing large volumes of information. This paper addresses a form of extractive summarization for Persian known as user query-based summarization. The most important phase in this kind of summarization is computing the similarity between the query phrase and the components of the source text. To this end, after the preprocessing phase, the query phrase is converted into a sentence and word sense disambiguation is applied; the semantic similarity between the query phrase and the sentences of the text is then computed, and the sentences with the highest semantic similarity to the query phrase are selected for inclusion in the summary. Evaluations of the approach proposed in the thesis indicate that the algorithm performs reasonably well. Given that natural language processing for Persian is still in its infancy, developing what is examined in this paper, and similar work, can contribute substantially to improving the situation.

Keywords


Article title [English]

An Approach Based on Semantic Similarity in Persian Query-Based Summarization

Authors [English]

  • Zahra Sepehrian 1
  • Saeideh Sadat Sadid Pour 2
  • Hassan Shirazi 3
1 Master's degree, Malik Ashtar University of Technology
2 PhD student, Malik Ashtar University of Technology
3 Associate Professor, Malik Ashtar University of Technology
Abstract [English]

Automatic text summarization systems are one type of system for managing large volumes of information.
This paper discusses a form of extractive summarization for Persian text, namely query-based
summarization, which is very useful for leaders who need to review information about specific topics.
The most important phase in this type of summarization is calculating the similarity between the query
phrase and the components of the original text. For this purpose, after the preprocessing phase, the
query is converted to a sentence and word sense disambiguation is applied; the semantic similarity between
the query phrase and the sentences of the text is then calculated using Farsnet. The sentences that are most
similar to the query are selected for the summary. Evaluation of the proposed method shows that it achieves
reasonably acceptable results. Since natural language processing for Persian is still young, this paper and
similar work can contribute substantially to improving the state of the field.
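
The pipeline described above (preprocess, map words to senses, score each sentence against the query, keep the best ones) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `TOY_SYNSETS` dictionary is a hypothetical stand-in for Farsnet lookups combined with word sense disambiguation, Jaccard overlap of concept sets is an assumed similarity measure, and every function name is invented for this example. English toy data is used only for readability.

```python
import re

# Hypothetical stand-in for a Farsnet lookup: maps a word to a concept id.
# A real system would query Farsnet and disambiguate senses in context.
TOY_SYNSETS = {
    "car": "vehicle", "automobile": "vehicle", "vehicle": "vehicle",
    "price": "cost", "cost": "cost",
    "fuel": "fuel", "petrol": "fuel",
}
# Assumed stopword list for the toy data.
STOPWORDS = {"the", "a", "an", "of", "is", "are", "in", "and", "to", "this", "year"}


def concepts(text):
    """Preprocess text and map each content word to a concept id."""
    tokens = re.findall(r"\w+", text.lower())
    return {TOY_SYNSETS.get(t, t) for t in tokens if t not in STOPWORDS}


def similarity(query, sentence):
    """Jaccard overlap between the concept sets of query and sentence."""
    q, s = concepts(query), concepts(sentence)
    if not q or not s:
        return 0.0
    return len(q & s) / len(q | s)


def summarize(query, sentences, k=2):
    """Return the k sentences most similar to the query, in document order."""
    ranked = sorted(range(len(sentences)),
                    key=lambda i: similarity(query, sentences[i]),
                    reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]


if __name__ == "__main__":
    doc = [
        "The cost of an automobile is rising.",
        "Birds migrate in winter.",
        "Petrol is expensive this year.",
        "Vehicle fuel cost is a concern.",
    ]
    for line in summarize("car fuel price", doc, k=2):
        print(line)
```

Returning the selected sentences in their original order, rather than in score order, is a design choice that tends to keep an extractive summary readable; the paper itself does not say how the selected sentences are ordered.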

Keywords [English]

  • Query-Based Summarization
  • Query
  • Semantic Similarity
  • Word Sense Disambiguation
  • Farsnet