dc.contributor.author |
Sarveswaran, Kengatharaiyer |
|
dc.contributor.author |
Mahesan, Sinnathamby |
|
dc.date.accessioned |
2016-10-25T07:03:08Z |
|
dc.date.available |
2016-10-25T07:03:08Z |
|
dc.date.issued |
2016-10-25T07:03:08Z |
|
dc.identifier.citation |
Sarveswaran, K., & Mahesan, S. (2014). Hierarchical Tag-set for Rule-based Processing of Tamil Language. International Journal of Multidisciplinary Studies (IJMS), 1(2), 67-74. |
|
dc.identifier.uri |
http://dr.lib.sjp.ac.lk/handle/123456789/3307 |
|
dc.description.abstract |
Corpora are fundamental tools for Natural Language Processing. Part-of-speech tagging adds
meaning to a corpus by annotating its words. A tag-set used to annotate a corpus should be selected in such a
way that it represents the grammatical structure of the respective language. These tag-sets can be flat or
hierarchical in structure. Several efforts have been made to identify a tag-set for the Tamil language.
However, existing tag-sets have many shortcomings, including the inability to tag all words, the inability to
capture required syntactic information such as divisibility, too many tags in a set, a flat tag
structure, and a lack of extendibility. The scholarly works Tolkāppiyam and Naṉṉūl clearly show the grammatical
classification of words. This paper proposes a new hierarchical tag-set with 10 labels for the Tamil language, with a
view to developing a morphological analyser, by considering the existing limitations and drawing on Tamil grammar.
The morphological analyser can be used to extend the proposed tag-set easily with more grammatical
information. |
en_US |
dc.language.iso |
en |
en_US |
dc.subject |
POS tagging |
en_US |
dc.subject |
Tag-set |
en_US |
dc.subject |
Morphological analyser |
en_US |
dc.subject |
Tamil grammar |
en_US |
dc.title |
Hierarchical Tag-set for Rule-based Processing of Tamil Language |
en_US |
dc.type |
Article |
en_US |
dc.date.published |
2014 |
|