CKMorph: Central Kurdish Morphological Analyzer

Background Study:

CKMorph: A Comprehensive Morphological Analyzer for Central Kurdish

Cite:

@article{naserzade2023CKMorph,
    title={CKMorph: A Comprehensive Morphological Analyzer for Central Kurdish},
    author={Naserzade, Morteza and Mahmudi, Aso and Veisi, Hadi and Hosseini, Hawre and MohammadAmini, Mohammad},
    journal={International Journal of Digital Humanities},
    year={2023},
    publisher={Springer}
}

Kurdish Morphological Analysis Dataset:

Abstract:

A morphological analyzer, a significant component of many natural language processing applications, especially for morphologically rich languages, divides an input word into all its composing morphemes and identifies their morphological roles. This paper introduces a comprehensive morphological analyzer for Central Kurdish (CK), also known as Sorani, a low-resourced language with rich morphology. Building upon the limited existing literature, we first assembled and systematically categorized an extensive collection of the morphological and morphophonological rules of the language. Additionally, we collected and manually labeled a generative lexicon containing nearly 10,000 verb, noun, and adjective stems, named entities, and other types of word stems. We used these rule sets and resources to implement the CKMorph analyzer based on finite-state transducers. In order to provide a benchmark for future research, we collected, manually labeled, and publicly shared test sets for evaluating the accuracy and coverage of the analyzer. CKMorph was able to correctly analyze 95.9% of the first test set, containing 1,000 CK words morphologically analyzed according to the context. Moreover, CKMorph gave at least one analysis for 95.5% of the 4.22M CK tokens in the second test set. The demonstration of the application and the resources, including the CK verb database and test sets, are openly accessible at github.com/CKMorph.
Keywords: Morphological Analyzer, Central Kurdish, Computational Morphology, Finite-State Transducer, Two-Level Morphology
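
Example (illustrative sketch, not the CKMorph API):

The abstract reports two figures: accuracy on a hand-labeled test set and coverage on a large token list. The Python sketch below shows one way such figures could be computed for any analyzer that maps a word to a list of analyses. Everything in it is an assumption made for illustration: the function names, the tag strings, and the English toy lexicon are hypothetical and are not taken from CKMorph or its test sets. In particular, counting a word as correct when the gold analysis appears among the returned analyses is an assumed criterion; the paper may score accuracy differently.

```python
from typing import Callable, Iterable, List, Tuple

# An analyzer is assumed to be any function from a word to a list of
# analysis strings (an empty list means "no analysis found").
Analyzer = Callable[[str], List[str]]


def coverage(analyze: Analyzer, tokens: Iterable[str]) -> float:
    """Fraction of tokens that receive at least one analysis."""
    tokens = list(tokens)
    if not tokens:
        return 0.0
    analyzed = sum(1 for token in tokens if analyze(token))
    return analyzed / len(tokens)


def accuracy(analyze: Analyzer, gold: Iterable[Tuple[str, str]]) -> float:
    """Fraction of (word, gold analysis) pairs whose gold analysis is
    among the analyses returned for the word (assumed criterion)."""
    gold = list(gold)
    if not gold:
        return 0.0
    correct = sum(1 for word, expected in gold if expected in analyze(word))
    return correct / len(gold)


# Hypothetical toy lexicon standing in for a real analyzer.
# The words and tags are invented for this example.
TOY_LEXICON = {
    "books": ["book+N+PL"],
    "walked": ["walk+V+PAST", "walk+V+PTCP"],
}


def toy_analyze(word: str) -> List[str]:
    """Look the word up in the toy lexicon; unknown words get no analysis."""
    return TOY_LEXICON.get(word, [])


if __name__ == "__main__":
    tokens = ["books", "walked", "xyz", "books"]
    gold = [
        ("books", "book+N+PL"),
        ("walked", "walk+V+PAST"),
        ("xyz", "xyz+N"),
    ]
    print("coverage:", coverage(toy_analyze, tokens))
    print("accuracy:", accuracy(toy_analyze, gold))
```

To evaluate a real analyzer with this sketch, replace toy_analyze with a function that wraps the analyzer's own interface and load the token list and gold pairs from the published test sets.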