
dc.date.accessioned 2016-09-21T03:58:18Z
dc.date.available 2016-09-21T03:58:18Z
dc.date.issued 2016-09-21T03:58:18Z
dc.identifier.uri http://dr.lib.sjp.ac.lk/handle/123456789/2856
dc.description.abstract Attached en_US
dc.description.abstract The advancement of information technology in the modern world has contributed towards enhancing the quality of life of people around the world. To keep pace with this rapid development, it is important to be connected to the enormous network called the Internet, which enables people to access information resources, keep abreast of news, send timely e-mail and hold interactive remote conferences. However, a major task facing information consumers today is to identify relevant news items quickly. Hence, designing an information filter for users of Internet news bulletins is a pressing need. The main thrust of this study is therefore to design an algorithm that identifies news items available on the Internet and categorizes them according to their degree of similarity to each other. The metric for computing the degree of similarity of two news items is based on calculating and comparing the percentage of proper nouns common to both items. To extract proper nouns from a news item, a filtering process eliminates pronouns, articles, prepositions, forms of the verb "be", determiners and other function words. The frequencies with which the extracted words occur in the news item are then calculated and analyzed, and statistical methods are used to confirm these results before they are presented to the user. The proposed algorithm was tested on news items downloaded from the LAcNet Sri Lankan news archives available on the Internet, and favourable results were obtained. The degree-of-similarity values obtained for the test data were compared with human classification. These results demonstrate that the proposed algorithm can successfully categorize news items into two classes, 'similar' and 'different'.
This achievement makes a significant contribution towards the automatic categorization of news items available on the Internet into various topics.
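The record does not include the algorithm itself. As a rough illustration only, the Python sketch below follows the steps the abstract names: filter out function words, treat the remaining capitalized words as proper nouns, count their frequencies, score a pair of news items by the percentage of proper nouns they share, and threshold that score into 'similar' or 'different'. The function-word list, the capitalization heuristic, the shared-over-union percentage, the 30% threshold and the names extract_proper_nouns, similarity_percent and classify are all hypothetical choices for illustration, not the author's method.

```python
import re
from collections import Counter

# Hypothetical, deliberately small list of function words (articles, pronouns,
# prepositions, forms of "be", determiners, conjunctions). A practical filter
# would need a much larger list.
FUNCTION_WORDS = frozenset({
    "a", "an", "the",                                                    # articles
    "i", "you", "he", "she", "it", "we", "they",                         # pronouns
    "his", "her", "its", "their", "our", "my", "your",
    "in", "on", "at", "of", "to", "for", "with", "by", "from", "into",   # prepositions
    "am", "is", "are", "was", "were", "be", "been", "being",             # "be" verbs
    "this", "that", "these", "those", "some", "any", "each", "every",    # determiners
    "and", "or", "but",                                                  # conjunctions
})


def extract_proper_nouns(text: str) -> Counter:
    """Return the frequency of each candidate proper noun in `text`.

    Assumed heuristic: any capitalized word that is not a function word is
    treated as a proper noun, so capitalized sentence-initial common nouns
    are counted as false positives.
    """
    words = re.findall(r"[A-Za-z]+", text)
    return Counter(
        w for w in words
        if w[0].isupper() and w.lower() not in FUNCTION_WORDS
    )


def similarity_percent(text_a: str, text_b: str) -> float:
    """Percentage of distinct proper nouns shared by both news items.

    Assumed definition: 100 * |A & B| / |A | B| over the sets of distinct
    proper nouns; the abstract does not state which denominator is used.
    """
    a = set(extract_proper_nouns(text_a))
    b = set(extract_proper_nouns(text_b))
    union = a | b
    if not union:
        return 0.0  # neither item contains a proper noun: treat as no overlap
    return 100.0 * len(a & b) / len(union)


def classify(text_a: str, text_b: str, threshold: float = 30.0) -> str:
    """Label a pair of items 'similar' or 'different' (threshold is hypothetical)."""
    return "similar" if similarity_percent(text_a, text_b) >= threshold else "different"


if __name__ == "__main__":
    a = "The cabinet met in Colombo with delegates from Kandy and Galle."
    b = "In Galle and Kandy, heavy rain closed roads to Colombo and Jaffna."
    c = "The cricket team from India arrived in Mumbai on a morning flight."
    print(similarity_percent(a, b), classify(a, b))  # 75.0 similar
    print(similarity_percent(a, c), classify(a, c))  # 0.0 different
```

Here items a and b share Colombo, Kandy and Galle out of four distinct proper nouns, giving 75%, while a and c share none. The per-word frequencies are returned by extract_proper_nouns but not used in this set-based score, because the abstract does not say how they enter the metric; a frequency-weighted overlap would be one possible alternative.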
dc.language.iso en en_US
dc.title none en_US
dc.type Article en_US

