Journal of Siberian Federal University. Humanities & Social Sciences 2022 15(3): 329-336
DOI: 10.17516/1997-1370-0360 УДК 811.512.157'366
Linguistic Annotation of Grammatical Categories of Sakha: Nouns
Gavril G. Torotoev* and Sandaara G. Torotoeva
M.K. Ammosov North-Eastern Federal University Yakutsk, Russian Federation
Received 07.11.2018, received in revised form 21.11.2018, accepted 05.12.2018
Abstract. This paper reports on the development of tools for the linguistic annotation of the grammatical categories of the Sakha language (the names Sakha and Yakut are used as full synonyms). It describes the basic inflectional characteristics of Sakha nouns (number, personal endings, possessive endings, cases), with tags based on the Leipzig Glossing Rules. The research (2014-2018) resulted in a system of tags that reflects the entire inflectional potential of nouns in the Sakha language and comprises 247 morphological indicators. The standardized system of morphological tagging of Turkic languages developed by Turkologists is still far from perfect: grammatical categories are reflected and interpreted differently in different Turkic languages. Nevertheless, the article draws together the constructive and progressive ideas of our colleagues on this matter.
Keywords: linguistic annotation, grammatical categories, Sakha, nouns, numbers, possessiveness, simple declension, possessive declension, diminutive, tags.
Research area: philology
Citation: Torotoev, G.G., Torotoeva, S.G. (2022). Linguistic annotation of grammatical categories of Sakha: nouns. J. Sib. Fed. Univ. Humanit. Soc. Sci., 15(3), 329-336. DOI: 10.17516/1997-1370-0360.
© Siberian Federal University. All rights reserved
* Corresponding author E-mail address: torgav@mail.ru
Introduction
Linguistic annotation of the grammatical categories of languages is a topical issue in modern computational linguistics. Artificial intelligence makes it possible to obtain innovative results both in theoretical linguistics (acquiring new knowledge about language structure) and in applied linguistics (modernizing linguistic research methods and implementing new technologies for automated language processing).
Today, owing to the intensive development of computer technologies, there is a need for a tagging system for the automatic analysis of electronic corpora of Turkic texts. To make comparative studies more effective and to obtain objective language data, so that the corpora can serve as a representative linguistic instrument, a standardized morphological tagging system has to be applied to corpora of texts in Turkic languages.
A working version of a standardized morphological tagging system for Turkic languages was adopted in 2014 at the UniTurk workshop ("Unification of Grammatical Annotation Systems in the Electronic Corpora of Turkic Languages") in Kazan. The database is built on the morphemic structure of Turkic word forms and is designed to reflect the structural-semantic model of Turkic languages as precisely as possible. A uniform standard for representing linguistic information gives Turkic languages a unique opportunity to join the common information space (Zheltov, 2015: 329).
Problem statement
Consequently, a tag system is needed that adequately reflects all grammatical categories of the Yakut language. Work in this area has been in progress for five years and has shown that some grammatical categories of the Yakut language were not fully reflected in previous publications. Computer processing of text, which requires a complete formalization of knowledge about the language and its grammar, reveals interesting language facts and implicit (hidden) linguistic details that are not covered in the classic works of Yakut scholars.
Researchers in computational linguistics pay special attention to inflectional and derivational morphology. Consequently, it is first necessary to describe and tag all regular inflectional and active derivational indicators of the Yakut language. Second, rules for allomorph selection and sandhi rules for automatic word-form analysis have to be developed (morphonological processes at morpheme boundaries; phonetic processes within one word form).
Methods
The study is descriptive. To find the maximum number of inflectional allomorphs of nouns in the Sakha language, the quantitative method was used. As a result of the empirical analysis, nine tables were compiled, forming the basis for the interpretation and reflection of the grammatical categories of nouns in the Sakha language. The research results may be used for filling lacunas in the existing studies of the Sakha Language.
Discussion
For the morphological annotation of the grammatical categories of the Sakha language, a system of tags based on the Leipzig Glossing Rules is used. Tags indicating parts of speech in Sakha are presented in Table 1.
Table 1
Tags | Full term
N | Noun
POSS | Possessive
PRO | Pronoun
NUM | Numeral
ADJ | Adjective
V | Verb
PCP | Participle
CONV | Converb
ADV | Adverb
MOD | Modal word
INTJ | Interjection
CONJ | Conjunction
PART | Particle
POST | Postposition
IMIT | Imitative word
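For automatic processing, the inventory in Table 1 can be held as a simple lookup table. The following is only a minimal sketch: the names POS_TAGS and describe_tag are hypothetical and do not come from the authors' tools.

```python
# Part-of-speech tags copied from Table 1 (tag -> full term).
# POS_TAGS and describe_tag are illustrative names, not the authors' software.
POS_TAGS = {
    "N": "Noun",
    "POSS": "Possessive",
    "PRO": "Pronoun",
    "NUM": "Numeral",
    "ADJ": "Adjective",
    "V": "Verb",
    "PCP": "Participle",
    "CONV": "Converb",
    "ADV": "Adverb",
    "MOD": "Modal word",
    "INTJ": "Interjection",
    "CONJ": "Conjunction",
    "PART": "Particle",
    "POST": "Postposition",
    "IMIT": "Imitative word",
}


def describe_tag(tag: str) -> str:
    """Return the full term for a tag; unknown tags raise KeyError."""
    return POS_TAGS[tag]
```

Rejecting unknown tags outright, rather than passing them through, is one way to keep an annotated corpus consistent with the agreed inventory.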
With regard to the glossing of Sakha, this article considers the grammatical categories of the noun. The inflectional characteristics of the noun, namely number, case, possessiveness and person (personal endings), are analyzed in detail.
1. Number
In the Yakut language, the plural affix -lar is represented by 16 forms (Korkina, 1982: 125-126). The choice of the appropriate allomorph is governed primarily by the vowel harmony rules of the Yakut language. The phonetic compatibility of morphemes also depends on assimilation rules (progressive, regressive and progressive-regressive assimilation of consonants) and on accommodation. The sandhi rules are therefore formulated in accordance with the rules of vowel harmony, assimilation and accommodation, and describe the sound changes at morpheme boundaries.
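The interaction of vowel harmony and consonant assimilation in choosing a plural allomorph can be illustrated with a short sketch. It is not the authors' rule set: the consonant classes and the vowel mapping below are simplifying assumptions, diphthongs are left out of scope, and the function names are hypothetical. The only forms checked against the article are ubajdar and attar, which occur in its annotated examples.

```python
# Hypothetical sketch: pick one allomorph of the plural affix -LAr.
# All class memberships below are simplifying assumptions.

# Last stem vowel -> vowel of the affix (diphthongs are NOT handled).
AFFIX_VOWEL = {
    "a": "a", "y": "a", "u": "a",
    "o": "o",
    "e": "e", "i": "e", "ü": "e",
    "ö": "ö",
}
VOICELESS = set("ptkcsxh")  # assumed to select the t- variants
NASALS = set("mnŋ")         # assumed to select the n- variants
J_OR_R = set("jr")          # assumed to select the d- variants


def plural_suffix(stem: str) -> str:
    """Return the plural allomorph for a transliterated stem."""
    vowels = [ch for ch in stem if ch in AFFIX_VOWEL]
    if not vowels:
        raise ValueError(f"no vowel found in stem {stem!r}")
    vowel = AFFIX_VOWEL[vowels[-1]]
    last = stem[-1]
    if last in VOICELESS:
        onset = "t"
    elif last in NASALS:
        onset = "n"
    elif last in J_OR_R:
        onset = "d"
    else:  # stem-final vowel or l
        onset = "l"
    return onset + vowel + "r"


def pluralize(stem: str) -> str:
    """Attach the selected plural allomorph to the stem."""
    return stem + plural_suffix(stem)
```

With these assumptions, pluralize("at") yields attar and pluralize("ubaj") yields ubajdar. A production analyzer would need the full sandhi rules, including accommodation, which this sketch omits.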
Table 2
Tags | Description | Allomorphs | Morphemes
SG | singular | - | -
PL | plural | -lar/-lor/-ler/-lör; -nar/-nor/-ner/-nör; -dar/-dor/-der/-dör; -tar/-tor/-ter/-tör | -LAr
2. Possessiveness
In the Sakha language, the initial form of the possessiveness category is represented by 58 morphological indicators. These forms are used frequently because they express various logical relations and connections between objects that often differ from possession proper (Korkina, 1982: 129).
Table 3
Tags | Description | Allomorphs | Morphemes
POSS_1SG | Possessive, 1st person, singular ('my') | -m | -m
POSS_2SG | Possessive, 2nd person, singular ('your') | |
POSS_3SG | Possessive, 3rd person, singular ('his/her/its') | after consonants: -a/-o/-e/-ö; after vowels: -ta/-to/-te/-tö | -A; -tA
POSS_1PL | Possessive, 1st person, plural ('our') | -byt/-bit/-but/-büt; -pyt/-pit/-put/-püt; -myt/-mit/-mut/-müt | -BYT
POSS_2PL | Possessive, 2nd person, plural ('your') | -ɣyt/-ɣit/-ɣut/-ɣüt; -xyt/-xit/-xut/-xüt; -kyt/-kit/-kut/-küt; -gyt/-git/-gut/-güt; -ŋyt/-ŋit/-ŋut/-ŋüt | -ɣYt
POSS_3PL | Possessive, 3rd person, plural ('their') | -lara/-lora/-lere/-lörö; -nara/-noro/-nere/-nörö; -dara/-doro/-dere/-dörö; -tara/-toro/-tere/-törö | -LArA
3. Cases in the Yakut language
In interpreting the grammatical categories and assigning the corresponding tags, we relied on the work of Academician O.N. Boethlingk, "About the language of the Yakuts", published in 1851. He registered ten cases in the Yakut language: Casus Indefinitus, Accusativus Indefinitus, Dativ, Accusativus Definitus, Ablativ, Lokativ, Instrumental, Casus Adverbialis, Comitativ and Casus Comparativus (Boethlingk, 1990: 278-285). As the case names show, there is no significant difference between the modern terms and those used by O.N. Boethlingk. The modern Yakut language has eight cases; Lokativ and Casus Adverbialis are not included in the case paradigm.
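The reduction from ten cases to eight can be written out as a small mapping. This is an illustrative sketch only: the correspondence between Boethlingk's terms and the modern tags (in particular Casus Indefinitus to NOM and Accusativus Indefinitus to PAR) is an assumption made here and is not stated in the article.

```python
# Boethlingk's ten cases, in the order given in the text.
BOETHLINGK_CASES = [
    "Casus Indefinitus", "Accusativus Indefinitus", "Dativ",
    "Accusativus Definitus", "Ablativ", "Lokativ", "Instrumental",
    "Casus Adverbialis", "Comitativ", "Casus Comparativus",
]

# The two cases that the text excludes from the modern paradigm.
NOT_IN_MODERN_PARADIGM = {"Lokativ", "Casus Adverbialis"}

# ASSUMED correspondence to the case tags used in this article.
ASSUMED_TAG = {
    "Casus Indefinitus": "NOM",
    "Accusativus Indefinitus": "PAR",
    "Dativ": "DAT",
    "Accusativus Definitus": "ACC",
    "Ablativ": "ABL",
    "Instrumental": "INS",
    "Comitativ": "COM",
    "Casus Comparativus": "COMP",
}

MODERN_CASE_TAGS = [
    ASSUMED_TAG[name]
    for name in BOETHLINGK_CASES
    if name not in NOT_IN_MODERN_PARADIGM
]
```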
Simple declension
There are two types of declension in the Yakut language: simple and possessive (Korkina, 1982: 129-147). In the simple declension, every affix variant occurs in four vowel-harmony allomorphs, for example -TA (-ta/-to/-te/-tö).
Table 4
Tags | Description | Allomorphs | Morphemes
NOM | Nominative | - | -
PAR | Partitive | -ta/-to/-te/-tö; -la/-lo/-le/-lö; -na/-no/-ne/-nö; -da/-do/-de/-dö | -TA
DAT | Dative | -ɣa/-ɣo/-ɣe/-ɣö; -xa/-xo/-xe/-xö; -ga/-go/-ge/-gö; -ŋa/-ŋo/-ŋe/-ŋö; -ka/-ko/-ke/-kö | -ɣA
ACC | Accusative | after consonants: -y/-i/-u/-ü; after vowels: -ny/-ni/-nu/-nü | -y; -ny
ABL | Ablative | after consonants: -tan/-ton/-ten/-tön; after vowels: -ttan/-tton/-tten/-ttön | -tAn; -ttAn
INS | Instrumental | -nan/-non/-nen/-nön | -nAn
COM | Comitative | -lyyn/-liin/-luun/-lüün; -nyyn/-niin/-nuun/-nüün; -tyyn/-tiin/-tuun/-tüün; -dyyn/-diin/-duun/-düün | -LYYn
COMP | Comparative | -taaɣar/-tooɣor/-teeɣer/-tööɣör; -naaɣar/-nooɣor/-neeɣer/-nööɣör; -daaɣar/-dooɣor/-deeɣer/-dööɣör; -laaɣar/-looɣor/-leeɣer/-lööɣör | -TAAɣAr
Possessive declension
Table 5
Tags | Description | Allomorphs | Morphemes
NOM | Nominative | See Table 3 | See Table 3
PAR | Partitive | -yna/-ine/-una/-üne | -ynA*
DAT | Dative | after consonants: -ar/-or/-er/-ör; after vowels: -gar/-ger | -Ar; -gAr
ACC | Accusative | -yn/-in/-un/-ün | -yn
ABL | Ablative | -ttan/-tten | -ttAn
INS | Instrumental | -nan/-nen | -nAn
COM | Comitative | -nyyn/-nuun/-neen; non-literary version: -naan/-niin | -nyyn
COMP | Comparative | -naaɣar/-neeɣer | -nAAɣAr
* Currently out of use.
In total, the simple (88) and possessive (87) declensions have 175 morphological indicators in the Yakut language. This shows the great functional capacity of nouns in the Yakut language as a special lexical and grammatical word class.
4. Personal endings of nouns
Nouns in the Yakut language can act as predicates in sentences. In such cases, predicativity affixes are added to the word root, except in the third person singular.
Table 6
Tags | Description | Allomorphs | Morphemes
P_1SG | 1st person, singular ('I am') | -byn/-bin/-bun/-bün; -myn/-min/-mun/-mün; -pyn/-pin/-pun/-pün | -Byn
P_2SG | 2nd person, singular ('you are') | -ɣyn/-ɣin/-ɣun/-ɣün; -xyn/-xin/-xun/-xün; -kyn/-kin/-kun/-kün; -gyn/-gin/-gun/-gün; -ŋyn/-ŋin/-ŋun/-ŋün | -ɣyn
P_3SG | 3rd person, singular ('he/she is') | - | -
P_1PL | 1st person, plural ('we are') | after -LAr: -byt/-bit/-but/-büt | -byt
P_2PL | 2nd person, plural ('you are') | after -LAr: -gyt/-git/-gut/-güt | -gyt
P_3PL | 3rd person, plural ('they are') | -lar/-lor/-ler/-lör; -nar/-nor/-ner/-nör; -dar/-dor/-der/-dör; -tar/-tor/-ter/-tör | -LAr
5. Diminutive
The category of diminutiveness is an understudied aspect of the Sakha language. Table 7 shows the common diminutive affixes -cYk, -cAAn and -kAAn with their allomorphs.
Table 7
Tags | Description | Allomorphs | Morphemes
DIM | Diminutive | -cyk/-cik/-cuk/-cük; -caan/-coon/-ceen/-cöön; -kaan/-koon/-keen/-köön | -cyk; -cAAn; -kAAn
In addition to these affixes, Yakut lexical units can contain fossilized affixes such as -yja, -cce and -ka, which are considered barely productive at present. In Table 8, they are presented in descending order of size, from the diminutive point of view.
Table 8
Size L | Size M | Size S | Size XS | Size XXS
Lexeme | -yja | -cce | -ka | -caan
küöl 'lake' | kölüje | kölücce | kölüke | kölükeceen
ürex 'small river' | ürüje | ürücce | - | ürüjeceen
xolbo 'box' | xolbuja | - | xolbuka | xolbujacaan
6. Derivation
The word-forming potential of nouns in the Sakha language requires a specific approach and in-depth study. Without going into detail, it should be noted that dozens of productive and non-productive affixes, such as -hyt (-syt, -cyt, -djyt, -njyt), -byl (-bil, -bul, -bül), -lag (-leg, -log, -lög), -lta (-lte, -lto, -ltö) and others, take part in noun formation in the Sakha language. As an example of derivational affixes, let us consider three frequently used morphemes that derive verbal nouns.
Table 9
Tags | Description | Allomorphs | Morphemes
AN | Agent noun | -aaccy/-eecci/-ooccu/-ööccü | -AAccy
VN | Verbal noun | -yy/-ii/-uu/-üü; -aahyn/-eehin/-oohun/-ööhün | -yy; -AAhyn
Examples of linguistic annotation of nouns
To validate the tag system developed for the linguistic annotation of the word-forming potential of nouns in the Yakut language, let us analyze a few examples.
(1) xarandaac+(y)nan → xarandaahynan (uruhujduur)
pencil-INS
'(he draws) with a pencil'
(2) oɣo+ŋ+un → oɣoɣun (köröör)
child-POSS_2SG-ACC
'(look after) your child'
(3) ubaj+lar+byt+(y)gar → ubajdarbytygar (bierbippit)
brother-PL-POSS_1PL-DAT
'(we gave) to our brothers'
(4) at+lar+ɣyt+(y)naaɣar → attargytynaaɣar (türgen)
horse-PL-POSS_2PL-COMP
'(faster) than your horses'
(5) ije+te+neen → ijetineen (kelle)
mother-POSS_3SG-COM
'(he came) with his mother'
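The three-line layout of these examples (segmented form, gloss, translation) maps naturally onto a small record type. The following is a minimal sketch under stated assumptions: the class GlossedWord and its fields are hypothetical names, and the only data used are the forms of example (3) above.

```python
from dataclasses import dataclass


@dataclass
class GlossedWord:
    """One annotated word form (illustrative structure, not the authors' format)."""
    segments: list[str]  # underlying morphemes, stem first
    tags: list[str]      # one tag per affix, in order
    stem_gloss: str      # English gloss of the stem
    surface: str         # word form after sandhi

    def morpheme_line(self) -> str:
        """Segmented form, e.g. 'ubaj+lar+byt+(y)gar'."""
        return "+".join(self.segments)

    def gloss_line(self) -> str:
        """Gloss line, e.g. 'brother-PL-POSS_1PL-DAT'."""
        if len(self.tags) != len(self.segments) - 1:
            raise ValueError("each affix needs exactly one tag")
        return "-".join([self.stem_gloss, *self.tags])


# Example (3) from the text.
example_3 = GlossedWord(
    segments=["ubaj", "lar", "byt", "(y)gar"],
    tags=["PL", "POSS_1PL", "DAT"],
    stem_gloss="brother",
    surface="ubajdarbytygar",
)
```

Here example_3.gloss_line() gives brother-PL-POSS_1PL-DAT. Keeping the segmented and the surface form side by side makes it possible to check sandhi rules against annotated data.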
Conclusion
During the research (2014-2018), all grammatical categories of nouns in the Sakha language were analyzed. As a result, the system of conventional symbols (tags) that reflects the inflectional potential of nouns in the Sakha language, comprising 247 affixes, has been completed.
To enable a computer to analyze automatically texts of any complexity in the electronic corpora of the Sakha language, standardized tags have to be assigned to all grammatical categories of the Sakha language. Solving this problem would make it possible to develop new computer programs, such as online translators, automatic text analyzers, speech synthesizers and others.
References
Baker, M.C., Vinokurova, N. (2010). Two modalities of case assignment: Case in Sakha. In Natural Language & Linguistic Theory, 28, 593-642. DOI: 10.1007/s11049-010-9105-1.
Boethlingk, O.N. (1990). O iazyke iakutov [About the language of the Yakuts]. Novosibirsk: Nauka, 646 p.
Kang, D., Torotoev, G. (2016). Morphophonemic derivation of voice in the Sakha language. In Language, Communication, and Culture. The Journal of the Linguistic Society of the North East, 3, 66-90.
Korkina, E.I., Ubryatova, E.I., Kharitonov, L.N., Petrov, N.E. (1982). Grammatika sovremennogo iakutskogo literaturnogo iazyka. Fonetika i morfologiia [Grammar of the modern Yakut literary language. Phonetics and morphology], Moskva, Nauka, 496 p.
Kornfilt, J., Preminger, O. (2015). Nominative as no case at all: an argument from raising-to-accusative in Sakha. In Proceedings of the 9th Workshop on Altaic Formal Linguistics (WAFL 9), MIT Working Papers in Linguistics 76, ed. Andrew Joseph & Esra Predolac, Cambridge, 109-120.
Levin, T., Preminger, O. (2015). Case in Sakha: are two modalities really necessary? In Natural Language & Linguistic Theory, 33, 231-250. DOI: 10.1007/s11049-014-9250-z.
Torotoev, G.G. (2011). Funktsional'no-stilisticheskaia differentsiatsiia opredeleniy v sovremennom iakutskom iazyke [Functional and stylistic differentiation of the attributive constructions in the modern Yakut language]. Yakutsk, North East Federal University Publishing and Polygraphic Complex, 148 p.
Torotoev, G.G. (2014). Metod modelirovaniia v issledovanii stikhoobrazuyushchego karkasa olonkho [Method of modeling in the study of Olonkho architectonics]. In Trudy Kazanskoy shkoly po komp'iuternoy i kognitivnoy lingvistike TEL-2014 [Proceedings of the Kazan School of computational and cognitive linguistics TEL-2014]. Kazan', Fen Publishing House of the Academy of Sciences, 243-247.
Torotoev, G.G., Nogovitsyna, A.N. (2017). Lingvisticheskoe annotirovanie nakloneniy glagola iakutskogo iazyka [Linguistic annotation of the verb moods of the Yakut language]. In Vestnik SVFU [North-Eastern Federal University Newsletter], 3, 108-120.
Zheltov, P.V. (2015). Morphological markup system for the national body of the Chuvash language. In Proceedings of the International Conference "Turkic Languages Processing: TurkLang-2015", Kazan, Academy of Sciences of the Republic of Tatarstan Press, 328-330.
Linguistic Annotation of Grammatical Categories of the Sakha Language: The Noun
G.G. Torotoev, S.G. Torotoeva
M.K. Ammosov North-Eastern Federal University, Yakutsk, Russian Federation
Abstract. The article is devoted to the creation of tools for the linguistic annotation of the grammatical categories of the Sakha language. Relying on the Leipzig Glossing Rules, we describe the main inflectional characteristics of the noun in the Yakut language (number, person, possessiveness, case system). As a result of research work (2014-2018), a system of tags has been created that reflects the entire inflectional potential of the noun in the Yakut language and includes 247 morphological indicators. The unified system of morphological markup of Turkic languages being developed by Turkologists is far from perfect; there are different approaches to the representation and interpretation of grammatical categories in different Turkic languages. Nevertheless, the article summarizes the constructive ideas of colleagues on this issue.
Keywords: linguistic annotation, grammatical categories, Sakha language, noun, number, possessiveness, simple declension, possessive declension, diminutive, tags.
Scientific specialty: 10.00.00 - philological sciences.