George Starostin
Higher School of Economics / Russian State University for the Humanities / Santa Fe Institute; gstarst1@gmail.com
Chinese basic lexicon from a diachronic perspective: implications for lexicostatistics and glottochronology1
In this paper, I attempt to compare the relative rates of replacement of basic vocabulary items (from the 100-item Swadesh list) over four specific checkpoints in the history of the Chinese language: Early Old Chinese (as represented by documents such as The Book of Songs), Classical Old Chinese, Late Middle Chinese (represented by the language of The Record of Linji), and Modern Chinese. After a concise explication of the applied methodology and a detailed presentation of the data, it is shown that the average rates of replacement between each of these checkpoints do not significantly deviate from each other and are generally compatible with the classic «Swadesh constant» of 0.14 loss per millennium; furthermore, these results correlate with other similar observed situations, e.g. for the Greek language, though not with others (Icelandic). It is hoped that future similar studies on the lexical evolution of languages with attested written histories will make it possible to place these observations into a more significant context.
Keywords: Chinese language history, Old Chinese language, Middle Chinese language, lexicostatistics, glottochronology, basic vocabulary.
Introduction
Over the last couple of decades, lexicostatistical methodology has played an important role in historical studies on the evolution of various «dialectal» forms of Chinese (or «Sinitic languages», from a more strictly linguistic point of view). Since there is no universally accepted model of the lexicostatistical procedure as far as the selection of source data, manual and/or automated annotation of lexical cognates, and the specific phylogenetic algorithm applied to the data are concerned, these studies significantly vary in terms of selected scope, stated goals, and attained results; but there seems to be a general understanding that conducting lexicostatistical studies is an important stage in unraveling the internal history of Chinese and identifying certain key points resulting in divergent linguistic lineages, as well as separating evidence for genetic splits from evidence for later linguistic contacts that tend to obscure the different lineages in question.
That said, most of the studies on Chinese (Sinitic) lexicostatistics have largely focused on quantifying and interpreting the degree of lexical divergence between modern colloquial forms of Chinese2, usually downplaying the important fact that Chinese is one of the very few languages in the world whose historical evolution can actually be studied by means of preserved written data, rather than reconstructed through the comparative method, and is consequently one of the most important test cases in the world (along with several Indo-European and Semitic languages) when it comes to measuring rates of lexical evolution3.
1 I thank Prof. Laurent Sagart for his valuable comments on parts of this paper, and Dr. Johann-Mattis List for the opportunity to present its major points before a large audience of specialists at the Old Chinese And Friends conference (Max Planck Institute for the Science of Human History, Jena, April 26-27, 2018). Any errors in data or its analysis are exclusively my own.
2 It is not within the scope of the current paper to provide a detailed listing of all the works that have applied quantitative methods to the problem of Chinese dialect classification. For those unfamiliar with the topic a good starting point could be the complex study of Mahe Ben Hamed and Wang Feng (2006), who apply a variety of distance- and character-based methods in order to determine whether the configuration of known forms of Chinese better agrees with a tree-like or a network-like structure; the same data was later made use of by Johann-Mattis List (2015) in his own investigation of the historical relations between Chinese dialects. Further references to earlier studies may be easily found in either of those papers.
3 To the best of my knowledge, only two brief attempts at measuring the lexical distance between Old Chinese and Modern Chinese have had their results mentioned in the literature: (a) Swadesh 1952: 456 and subsequent papers by both Swadesh and Robert Lees make frequent reference to the results of C. Y. Fang, who allegedly established that 79% out of the 200-item wordlist of «Classic Chinese 950 A.D.» have been retained in «modern colloquial Northern Chinese»; the wordlist itself has never been published, making it impossible to verify the claim, and it is in fact quite unclear what is meant by «Classic Chinese 950 A.D.»; (b) Starostin 2000: 256 actually gives a specific list of replacements from «Archaic Chinese (seventh century BC)» to modern Mandarin that can be checked, and the verification shows a significant number of omissions (see below for specific examples).
Journal of Language Relationship • Вопросы языкового родства • 17/2 (2019) • Pp. 153-176 • © The authors, 2019
The reasons for such neglect are understandable. Studying lexical replacement in languages represented only by a closed and limited corpus of written data necessarily runs into a number of uncertainties: insufficient attestation of required items in available texts, their occasional semantic ambiguity, and lack of direct knowledge on the dialectal characteristics of said texts, among other things. To make matters worse, historically attested forms of Chinese are commonly understood to mix together different strata, to the point that, for instance, our current understanding of Middle Chinese phonology (as extracted from rhyme books and rhyme tables) vastly exceeds our understanding of Middle Chinese grammar and lexicon, since most texts in the classic era of Tang and early Song dynasties were written in one or another variant of the archaic Literary Chinese. Circumstances such as these may seem to make the painstaking task of studying lexical replacement within Chinese in detail a waste of time, but in reality it is not that difficult to employ a somewhat formalistic approach to the matter and at least try to see what it gets us. However, in order for such a study to be of any use, it is imperative to state the rules very clearly and consistently apply them to all selected time periods and data collections.
The present paper is a tentative attempt to manually measure the rates of lexical evolution over a period of approximately 2,500-2,800 years in the history of Chinese. This is achieved by selecting several chronological checkpoints, constructing standardized Swadesh wordlists for each of them and individually investigating each certified or potential case of lexical replacement from one checkpoint to another. Two reasons why such a study, though still clearly far from perfect, could not have appeared earlier, are as follows: (a) a breakthrough in corpus studies on Old Chinese: largely due to the outstanding dedicated work of Donald Sturgeon and his colleagues, we now have the advantage of the online Chinese Text Project, allowing for complex lexical investigations on a large scale to be conducted almost instantaneously; (b) significant methodological clarification of the lexicostatistical technique, described in several papers from the Moscow school of comparative linguistics (see the "Methodology" section below). Naturally, there is still much room for improvement (especially in the area of Middle Chinese, which remains considerably underdeveloped), but there is reason to believe that even at this stage, the results will be useful enough both for Sinologists and general specialists in diachronic linguistics.
Before presenting the data in its entirety, it is necessary to do the following things: (a) justify and describe the four selected chronological checkpoints, namely Early Old Chinese, Classical Old Chinese, (Late) Middle Chinese, and Modern Chinese, including some discussion on dating, data sources, and various technical problems; (b) give a brief description of the methodology employed in selecting items for the respective positions in the wordlist, as well as the procedure of cognate scoring from one period to another. This will be followed by reasonably detailed discussion of the data itself, after which we present a brief analysis and state our conclusions on the tendencies of lexical evolution in the history of Chinese, including a typological-comparative angle.
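As a rough illustration of how a replacement rate between two checkpoints can be related to the «Swadesh constant» of 0.14 loss per millennium mentioned in the abstract, the following minimal sketch uses the classic constant-decay formulation c = r^t (c being the share of retained items, t the elapsed time in millennia). This is an assumption made for illustration only: it is not necessarily the formula applied in this study, and the function names and sample figures are hypothetical rather than actual results.

```python
SWADESH_LOSS_PER_MILLENNIUM = 0.14  # classic constant cited in the abstract


def retention_per_millennium(retained: int, compared: int, millennia: float) -> float:
    """Average per-millennium retention r, assuming constant decay: c = r ** t."""
    if compared <= 0 or millennia <= 0:
        raise ValueError("compared and millennia must be positive")
    share_retained = retained / compared
    return share_retained ** (1.0 / millennia)


def expected_retained(compared: int, millennia: float,
                      loss: float = SWADESH_LOSS_PER_MILLENNIUM) -> float:
    """Number of items expected to survive under a constant loss rate."""
    return compared * (1.0 - loss) ** millennia


# Hypothetical figures for illustration only (not the counts obtained in this study):
# 80 of 92 comparable items retained over 0.9 millennia.
r = retention_per_millennium(retained=80, compared=92, millennia=0.9)
print(f"retention per millennium: {r:.3f}, loss per millennium: {1 - r:.3f}")
```

Only items attested at both checkpoints should enter the `compared` count, which is why gaps in the wordlists matter for the calculation.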
Data sources
1. Early Old Chinese (EOC)
Definition: we approximately define Early Old Chinese as the language that is represented in writing by such literary monuments as the Shijing ('Book of Odes') and the oldest parts of the Shujing, or Shangshu ('Book of Documents'), as well as epigraphic data from artefacts (mainly bronze vessels) dating back to the Early Zhou dynasty (jinwen); the most comprehensive and systematic Western dictionary of this language is Schuessler 1987. In general, the language of all these texts is known to share certain grammatical and lexical properties that strongly distinguish it from later forms of Chinese, though it cannot be said for certain to represent a direct ancestral stage for any of them.
Reasons for selection: EOC is the very first chronological checkpoint for which it is possible to construct anything close to a standardized Swadesh wordlist. Although some observations may be made on certain elements of the basic lexicon in the oracle bone inscriptions of the Shang dynasty, the restricted and highly formulaic nature of these inscriptions leads to far too many gaps in the wordlist for it to be of any use for the present study. Therefore, a general statistically relevant investigation of Chinese basic lexicon may only begin from Early Zhou times.
Sources: Much, if not most, of the epigraphic material is ineligible for the task of building a Swadesh wordlist due (once again) to the highly formulaic subject matter and ritualistic nature of the texts, leaving the verses of the Shijing as the single most natural source for an EOC list of basic lexicon. Out of the 100 required items, only eight ('ashes', 'bark', 'bone', 'egg', 'knee', 'lie', 'liver', 'louse') have no reliable or probable equivalents attested in the Shijing (or in the oldest parts of the Shujing).
Problems: There is little doubt that the texts of the Shijing are relatively heterogeneous in terms of both time and space (see Dobson 1968: 224-242 for an attempt at a chronological linguistic stratification of the various sections of the Shijing based on grammatical evidence), but there is so far very little evidence that the dialects of the Shijing were significantly different from each other as far as their basic lexicon was concerned: very few synonyms for basic notions were elicited from the data, and those that were elicited are not easily described in terms of dialectal variety (see, e.g., 'give' below, with two different synonyms attested in the exact same poem). A much more significant problem is the scarceness of attestation for multiple terms: in many cases unambiguous contexts with the required word are found but once or twice, and their reliability often depends on external data (e.g. if the same word is also the basic equivalent for the term in Classical Chinese, this improves the chances of the corresponding item in EOC). All such terms are specially commented upon in the notes section, and particularly dubious inclusions are marked with a question mark.
2. Classical Old Chinese (COC)
Definition: We define COC as the language of literary texts, most likely reasonably close to the spoken language of the time, written from approximately the end of the 5th century to the end of the 3rd century BC. There is no single defining dictionary for this stage of the language, since lexicographical sources usually conflate it either with EOC or with Han-era OC (or both); however, the text corpus is reasonably well defined, and focused searches may be performed these days with the aid of such resources as the Chinese Text Project (Sturgeon 2019).
Reasons for selection: COC is the first known historical stage of Chinese that is represented by a substantial amount of thematically diverse non-poetic texts which, according to a general scholarly consensus, are written in a language that reasonably closely reflects colloquial patterns of the time (with certain expected stylistic emendations, though their influence on core basic lexicon is probably negligible). A significant advantage of this period is that the language of the texts in question is not as highly influenced by the language of the previous period (EOC) as the written language of Han-era and later periods is dependent on COC.
Sources: COC is generally understood to have possessed a significant amount of dialectal diversity; even if evidence for this rarely comes from core basic vocabulary, for the sake of increased accuracy we prefer to draw upon sources typically recognized to stem from the same dialectal area. The principal texts corroborating our selections are the Lun yu and (especially) Meng-zi, both recognized as representative of the Lu dialect (Pulleyblank 1995: 3), although there may be a chronological gap of about 100-150 years in their original composition (not essential for our purposes).
If the necessary words are encountered very rarely or not encountered at all in these texts, we find it acceptable to draw upon data from other sources, such as the Zuo zhuan (representing a separate dialect of its own, together with the Guo yu) and Zhuang-zi (probably representing a more Southern, Chu-area, dialect, though this is debatable). For the record, the following words are not attested at all in the Lun yu and Meng-zi and have to be substituted from other sources: 'ashes', 'bite', 'nail', 'dry', 'green', 'knee', 'liver', 'louse', 'red', 'root', 'round', 'sand', 'smoke', 'swim', 'tail', 'tongue'. Since every single one of these 16 items is either the same as in EOC or the same as in MC or both, and since we have been unable to reliably elicit even a single undeniable difference in the Swadesh wordlist between any of the listed texts, such substitution should be permissible.
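The order of preference among sources described above can be sketched as a simple two-tier lookup. The text titles come from the discussion above; the function name, the data layout, and the `min_hits` threshold for «attested well enough» are hypothetical and serve only to make the procedure explicit.

```python
from typing import Dict, Optional

PRIMARY_TEXTS = ("Lun yu", "Meng-zi")        # Lu dialect texts, preferred
FALLBACK_TEXTS = ("Zuo zhuan", "Zhuang-zi")  # used only when the primary texts fail


def pick_source_tier(counts: Dict[str, int], min_hits: int = 1) -> Optional[str]:
    """Return 'primary', 'fallback', or None depending on where a word is attested.

    counts maps a text title to the number of unambiguous contexts found in it;
    min_hits is a hypothetical threshold, not a value used in this study.
    """
    if sum(counts.get(title, 0) for title in PRIMARY_TEXTS) >= min_hits:
        return "primary"
    if sum(counts.get(title, 0) for title in FALLBACK_TEXTS) >= min_hits:
        return "fallback"
    return None


# Hypothetical attestation counts for a single Swadesh meaning.
print(pick_source_tier({"Zuo zhuan": 2}))
```

Recording the tier alongside each entry keeps the substituted items easy to identify when the wordlists are compared.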
Problems: COC is (arguably) one of the least problematic periods in the history of Chinese when it comes to eliciting basic lexicon; see above on the relative insignificance of dialectal divisions for this purpose. Several dubious cases of elicitation, usually having to do with scarceness of attestation and ambiguity of translation, are commented upon specifically in the data section of the paper.
3. Middle Chinese (MC)
Definition: For the purposes of the current study, Middle Chinese is narrowly defined as the colloquial (or reasonably close to colloquial) stage of Chinese, chronologically coinciding with or closely following the beginning of the division of Chinese into the principal dialectal groups of today, i.e. what is commonly called Late rather than Early Middle Chinese. This is due to the fact that texts from the Early Middle Chinese era (first half of the Tang dynasty, 7th-8th centuries AD) are nearly always written in an archaic form of the language (wen yan or gu wen), whereas for the Late Middle Chinese period (late Tang and early Song dynasties) there is a limited, but useful corpus of textual evidence that is somewhat sufficient for purposes of lexicostatistical analysis.
Reasons for selection: The entire period between COC and the 20th century is an extremely difficult area for lexicostatistical evaluation, since almost every text written in traditional imperial China, from Han all the way to Qing dynasties, is influenced, to various degrees, by the grammar and lexicon of COC, and hardly ever reflects the spoken language of the corresponding period. It is precisely for this reason that we have refrained, for instance, from attempting to construct a separate 100-item wordlist for the language of the early or late Han dynasty, despite the abundance of textual evidence from that period: perusal of such vast sources as Sima Qian's Shi ji, for instance, shows that in many cases Swadesh items are represented by at least two competing equivalents (e.g. 犬 quan and 狗 gou for 'dog', 盈 ying and 滿 man for 'full', etc.), and it is often impossible to determine whether such situations are due to true «transit synonymy» (when a lexical innovation has not yet fully managed to displace the original neutral term) or to the intentional (or unintentional) mixing of standard colloquial and outdated archaic equivalents.
Nevertheless, it is very important to have at least one analyzable «checkpoint» on the almost 2,500-year-long way from COC to Modern Chinese, and from a general chronological and qualitative point of view, Late Middle Chinese is the optimal, if far from perfect, candidate for this purpose, since this is the period of proliferation for the genre of the yulu («records of sayings»), a new genre of Buddhist literature whose innovative and frequently iconoclastic nature placed a large emphasis on transmitting sermons, parables, and real life anecdotes by means of colloquial idioms. In general, the yulu may be considered as the first fully colloquial genre of literature in the history of post-COC Chinese, and although it is more thematically limited than the fictional genres of late Song, Yuan, and Ming dynasties, its advantages are that it is represented by chronologically older texts and that at least some of these texts are arguably more free from literary embellishments than the literary genres of huaben and xiaoshuo (classic short stories and novels from Song to Ming-Qing times).
Sources: A thorough lexical analysis of all or most of the existing texts in the yulu genre has not been conducted yet; an important problem is that some of the texts may reflect serious dialectal differentiations. For this reason, analysis has so far been restricted to just one reasonably large and generally uniform specimen of the genre, namely, the Linji yulu («The record of Linji»), a text traditionally attributed to the disciples of the school of Master Linji Yixuan (d. 866 AD) but not finalized until the late 11th-early 12th centuries. The language of the Linji yulu and the yulu genre has been the subject of several meticulous studies, e.g. Sawer 1969, Gurevich 2001, but all of them focus almost exclusively on grammar rather than lexicon; nevertheless, analysis of the basic words used in the text is in perfect agreement with the grammatical data in that the Linji yulu does indeed attempt to represent the colloquial standards of its time, albeit with some inescapable influence of the more classical forms as well.
Problems: Restriction to a single source necessarily implies that our MC list has the heaviest gaps of all (at least 18 out of 100 items are not featured at all in the text, and 8 more are somewhat problematic due to scarceness of attestation and semantic ambiguity); the problem is somewhat alleviated by the fact that the majority of these gaps are items that are represented by the same equivalent in COC and Modern Chinese, so it may be reasonably assumed that they were not replaced by anything else in MC as well.
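The assumption used above for unattested MC items (if the preceding and the following checkpoints share the same equivalent, the item is taken as not replaced in between) can be sketched as follows. The word forms are abstract placeholders rather than actual entries from the wordlists, and the function name is hypothetical.

```python
from typing import Optional


def fill_gap(before: Optional[str], after: Optional[str]) -> Optional[str]:
    """Infer an unattested item from its neighbours in time.

    If the preceding and the following checkpoints share the same equivalent,
    assume it was not replaced in between; otherwise leave the gap open.
    """
    if before is not None and before == after:
        return before
    return None


# Placeholder forms only: "A1", "K1", etc. stand for unspecified words.
coc = {"ashes": "A1", "knee": "K1", "smoke": "S1"}
pth = {"ashes": "A1", "knee": "K2", "smoke": "S1"}
mc_attested = {"smoke": "S1"}  # items actually found in the source text

mc_filled = dict(mc_attested)
for meaning in coc:
    if meaning not in mc_filled:
        inferred = fill_gap(coc.get(meaning), pth.get(meaning))
        if inferred is not None:
            mc_filled[meaning] = inferred

print(mc_filled)
```

Items whose COC and PTH equivalents differ (here the placeholder 'knee') stay unfilled, so they are excluded from the comparison instead of being guessed.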
4. Modern Chinese (PTH)
Definition: Since this study is only concerned with the issue of relatively straightforward diachronic evolution from a single point in the past to a single point in the present, we intentionally limit our definition of «Modern Chinese» to the present day version of putonghua, the common national language generally based on the Beijing Mandarin dialect; linguistic differences between the actual spoken varieties of Beijing Mandarin and putonghua are well known, but do not generally extend to core basic vocabulary, making this factor negligible.
Reasons for selection: Theoretically, any other Chinese «dialect» (with the exception of Min, since that cluster is typically assumed to have split off from the rest before the beginning of the MC period) might have been substituted here, but the task of constructing a 100-item wordlist for putonghua is naturally easier than for any of the rest. A separate study is necessary to assess the rate of evolution from MC to PTH relative to other varieties of spoken Chinese that are in use today.
Sources: A variety of sources has been used (textbooks, dictionaries, text corpora, live informants etc.).
Problems: This is the least problematic area of all; issues are typically limited to purely technical problems, such as choosing a monosyllabic or bisyllabic variant for the most common equivalent of a given meaning (where the adopted solution usually bears no impact on calculations anyway).
Methodology of wordlist construction and lexical comparison
In constructing the optimal wordlists for each of the four stages, I attempt to follow as closely as possible the guidelines laid down in Starostin 2010 and Kassian et al. 2010, which can largely be boiled down to the following principles: (a) elicit words whose meaning and stylistic register are as close as possible to the pre-defined meanings listed in the latter paper; (b) try to avoid the inclusion of multiple synonyms, whose presence undermines the main idea of lexicostatistics.
Obviously, when dealing with written stages of the language represented by closed (and usually not very large) corpora, formal and precise adherence to these principles is not always possible. Due to the nature of the data itself, all of the wordlists presented below, with the exception of the wordlist for Modern Chinese, will inevitably contain errors, some of which might not even be rectified in the future unless massive new amounts of data (e.g. from archaeological sources) become available. However, the important thing here is to make certain that these errors do not skew the quantitative conclusions in any one particular direction, i.e. that they do not increase specifically the number of lexical replacements or the number of lexical retentions from any chosen point in the history of Chinese to the next one. This implies the necessity of a transparent, objective, well-argued methodology of dealing with ambiguous situations, one that should preferably minimize the possible interference of the personal preferences of the compiler. Below I list some of the general points; specific applications may be found in the comments on particularly troublesome lexical items in their respective sections.
1) Be wary of etymological arguments. Frequently, when facing the choice between picking one out of two or more synonyms, or including all of them into the list, one may be led astray by the fact that an older equivalent, clearly going back to the original main equivalent for a given Swadesh term, is still preserved at a later stage in the development of the language — ignoring the clear fact that its semantics has shifted, as the word is now used in a slightly different meaning or has been relocated to a different (marked) stylistic register (vulgar or elevated).
This is, for instance, the reason for several important mistakes in Starostin 2000: 256, a general study in the methodology of lexicostatistics where Old Chinese is compared with Modern Chinese and 23 lexical replacements are identified. The study fails to list several transparent replacements, such as 目 mu → 眼睛 yan-jing 'eye' and 首 shou → 頭 tou 'head', presumably because the former equivalents are still encountered today in various bound idiomatic formations and archaic contexts. This leads to underestimations of the process of lexical replacement, and the problem gets even worse for eras that are only represented by written documents, since written language by its very nature fails to keep up the pace with developments in the colloquial idiom, and special care must be given to the study of preserved texts in order to make a qualified decision on whether a certain lexical replacement has already been completed at a given period or not. In any case, 'etymological argument' alone, not supported by actual data from texts, does not carry significant value.
2) Watch out for bound forms and idiomatization. The «basic» equivalent of any given meaning is typically understood as the most neutral and generally context-independent form: the more words there are that an observed candidate can enter in syntactic relations with, the better are its chances for historical stability. Thus, COC has multiple equivalents for the meanings 'die' and 'kill', but a great majority of them have limited syntactic applicability: e.g. 弑 shi 'to kill' is only used in reference to killing a superior (prince, father, etc.), whereas 薨 hong 'to die' is only said of high officials. Not surprisingly, these are precisely the words that do not survive into the MC era, whereas the neutral 殺 sha 'kill' and 死 si 'die' persist all the way into most modern forms of Chinese.
3) Textual evidence is generally superior to dictionary information. With a closed and relatively limited textual corpus that is not particularly well reflected in specialized dictionaries, OC is clearly one of those ancient languages where direct elicitation of lexical data from the corpus is much preferable to relying on dictionaries. In a few cases, observations of actual word usage in the attested texts may lead to startlingly unpredictable conclusions (see notes on possible replacements from EOC to COC below); more importantly, finding relevant syntactic and semantic contexts adds a much wanted level of confidence to our wordlists, and also helps differentiate between statistically frequent and rarely used synonyms. This is particularly helpful for transitional stages of the language, in which an older equivalent may already be retained only as a rare archaism (including quotations from and paraphrasing of older texts), while the newer replacement may be more frequent — however, such situations will rarely, if ever, be discussed or even hinted at in dictionaries.
Regarding the procedure of cognate scoring, in this particular setting it is essentially reduced to the procedure of postulating lexical replacements from one time period to another. In addition to the obvious (lexical replacements are assumed whenever word X, used in a given Swadesh meaning over the time period tₙ, is no longer used in that meaning over the time period tₙ₊₁), we try to observe the following rules:
1) Statistics and stylistics matter. This is essentially a recapitulation of points 1 and 2 from the previous section: even if the same word is encountered seemingly in the same meaning over several distinct time periods, this does not always imply that it has not actually been replaced. Written Chinese has always operated according to the «forget nothing» principle: no matter how archaic a certain word is, there is always some probability of encountering it in texts that are separated by any number of years from its time of proliferation. What matters is primarily the statistics of usage (if there are two or more synonyms, which one is the most frequent?) and the stylistic context of usage (if there are two or more synonyms, which ones are used in quotations, poetic formulas, imitations of archaic rhetorics, and which ones are used in colloquial direct speech or neutral descriptions of situations?). If it can be shown that synonyms A and B express the same meaning in tₙ₊₁ as exclusively A in tₙ, but that A is rare compared with B and primarily used in stylistically marked contexts, we postulate a clear-cut lexical replacement.
2) Morphological change does not matter. The issue of «partial cognacy», where two equivalents of the same Swadesh meaning in two different languages (or different stages of the same language) consist of two or more morphemes, of which only one (usually the root) is etymologically shared between them, while the others are different, seems to be particularly acute for languages that frequently resort to compounding techniques, including Chinese. This issue has been discussed several times in the literature (e.g. List 2016; Starostin 2013a), but still remains without a perfect solution. Should a difference such as COC 知 zhi 'to know' vs. Modern Chinese 知道 zhi-dao id. be reflected by assigning both items the same index of cognacy (no lexical replacement), different indexes (replacement), or marked in some other manner (e.g. awarded «half a point» instead of a regular full +1 index, etc.)?
In my opinion, a definitive solution to this issue is impossible until a solid experimental base for this type of situations has been built up, which would allow us to cross-linguistically compare replacement rates for different methods of scoring and choose the solution that would make more general sense from a historical point of view. In the meantime, for Chinese I prefer to stick to the «no lexical replacement for partial cognacy situations» scenario, for the following reason: in most cases, morphemic compounding in the history of Chinese is explainable by reasons that have nothing to do with semantic shifts and more to do with the phonetic evolution of the language (avoidance of ever-increasing levels of homonymy), which is clearly not what we really want to measure when choosing lexical change as a base parameter for glottochronology. Therefore, in this study classical 知 zhi will be scored exactly the same as modern 知道 zhi-dao.
However, one important thing about both classical and modern Chinese compounds («binomes») that should be stated is that in many (not all) cases a binome may easily be analyzed as containing a primary and a secondary morpheme. The primary morpheme is the historical root morpheme; its defining diachronic characteristic is that it tends to be more stable over both time and space, and its defining synchronic characteristic is that, unlike the secondary morpheme, it can still be frequently encountered, usually in bound form, without the secondary morpheme in its original meaning. The secondary morpheme largely acts as an additional determiner: as a rule, it is less stable across time periods and dialects, it may be omitted in certain contexts, and whenever encountered on its own, it is rarely or never used in the same meaning as the primary morpheme.
A good example is Modern Chinese 月亮 yue-liang 'moon', where 月 yue is the primary morpheme because it may be encountered on its own in the same meaning (usually in other bound forms, e.g. 月夜 yue-ye 'moonlit night', etc.), whereas 亮 liang 'light, shine' is never encountered with the meaning 'moon' if not in conjunction with 月 yue; not surprisingly, 月 yue is also the historically stable morpheme 'moon', common for most varieties of Chinese, whereas 亮 liang is a more recent addition and alternates with other additions in different dialects (e.g. 月光 yue-guang, 月子 yue-zi etc.).
Somewhat more complicated are cases of concatenated binomes in which, at first sight, both morphemes express the same meaning and are hard to classify as respectively primary and secondary, such as 道路 dao-lu 'road' (literally 'road₁' + 'road₂') or ya-chi 'teeth' ('tooth₁' + 'tooth₂'). It would seem that technically, the best solution here would be to judge the two morphemes as synonymous and include both into the calculations. However, even in this situation analysis of the behavior of the respective meaning in different contexts actually shows that one morpheme typically prevails over the other. Thus, in the meaning 'road' Modern Chinese frequently employs simple 路 lu (da lu 'big road', etc.), but practically never 道 dao (which is far more common in the abstract meaning 'way, manner'); the meaning 'tooth / teeth' is frequently expressed by 牙 ya (as in shua ya 'brush one's teeth', etc.), but almost never by 齒 chi. I interpret this as clear evidence that in forms such as dao-lu and ya-chi, one morpheme still behaves as primary and the other one as secondary, even if from a historical point of view (as can be seen from comparison with OC evidence, see the data below) it is the secondary morpheme that reflects the original Swadesh equivalent; see, however, the «be wary of etymological arguments» point above, which clearly pressures us into regarding such situations as lexical replacements.
One might argue that such a solution directly contradicts the «morphological change does not matter» principle, but this holds only if we understand the dynamic genesis of such compounds as ya-chi as the extension of the primary morpheme 齒 chi with the «prefixed» quasi-synonymous morpheme 牙 ya, when in reality the process must have been far more complex: equivalents of the monosyllabic 牙 ya are found in the basic meaning 'tooth' in many Chinese dialects, as well as alternate binomes such as 牙巴 ya-ba, etc., indicating that the structure of ya-chi is, in fact, quite analogous to that of 月亮 yue-liang. Ignoring this would mean ignoring an important element of lexical restructuring in the history of Chinese, and while other formal solutions are possible, in this study we will try to apply this principle consistently to the procedure of cognate scoring.
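The primary / secondary distinction described above can be sketched as a simple decision rule. This is only an illustrative sketch, not the procedure actually used to build the wordlists: the function name and the data layout are hypothetical, and the True/False judgments are entered by hand from the discussion above rather than computed from a corpus.

```python
def primary_morpheme(binome, standalone_in_basic_meaning):
    """Pick the morpheme of a binome that occurs on its own in the basic meaning.

    binome: tuple of two morpheme labels, e.g. ("dao", "lu").
    standalone_in_basic_meaning: dict mapping each morpheme to True/False;
        these are hand-made judgments (an assumption of this sketch).
    Returns the single primary morpheme, or None when the test is
    inconclusive (both or neither morpheme pass), in which case both
    morphemes would have to be treated as synonyms.
    """
    candidates = [m for m in binome if standalone_in_basic_meaning.get(m, False)]
    return candidates[0] if len(candidates) == 1 else None


# Judgments taken from the examples discussed in the text.
examples = {
    ("yue", "liang"): {"yue": True, "liang": False},  # 'moon'
    ("dao", "lu"): {"dao": False, "lu": True},        # 'road'
    ("ya", "chi"): {"ya": True, "chi": False},        # 'tooth'
}

for binome, judgments in examples.items():
    print("-".join(binome), "->", primary_morpheme(binome, judgments))
```

Returning None for inconclusive cases keeps the fallback (scoring both morphemes as synonyms) explicit instead of silently choosing one of them.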
Notes on transcription
Since this study is only concerned with different stages of Chinese and not with the Sino-Tibetan (or areal) origins of the Chinese entries, issues of phonetic and phonological reconstruction of OC and MC are largely irrelevant; cognate identification is not required between OC, MC, and PTH, and phonological or phonetic transcriptions of Chinese characters matter only insofar as the paper might also interest general historical linguists with no knowledge of Chinese characters, or, occasionally, where it is necessary to specify which particular pronunciation out of several possible ones is meant for a specific character (e.g. 長 *draj > chang 'long', not 長 *traj? > zhang 'grown-up', etc.).
Throughout the study, I consistently use the OC reconstruction of Sergei Starostin (1989), some aspects of which remain controversial (e.g. the reconstruction of lateral affricates and voiced aspirates, or the interpretation of the Type A / Type B syllable distinction as reflecting an opposition in vowel length) but which I also find reasonably conservative in comparison with the far-reaching changes in Baxter, Sagart 2014. OC reconstructions are taken either directly from Starostin 1989 or from Sergei Starostin's unfinished etymological database on Old Chinese («Chinese Characters Database» at the Tower of Babel website, http://starling.rinet.ru). MC readings are used very sparingly throughout the rest of the paper; where necessary, they are also taken from Starostin's database. Modern Chinese forms are transcribed in standard pinyin. OC and modern readings are typically given back-to-back next to the respective characters, with OC reconstructions marked by asterisks.
The data
All four wordlists have been published online as part of the Sinitic 100-item wordlist database, included in the Global Lexicostatistical Database framework (http://starling.rinet.ru/new100); in addition to the words themselves, the database includes plenty of annotations and comments, such as precise references to sources, quotations of contexts from which the items have been elicited, and (sometimes highly detailed) explanations of why certain synonyms were preferred over others. This section of the paper is a heavily condensed, but also partially reworked, version of that part of the database, with all the words rearranged in order of their relative historical stability.
First I discuss group A, the subset of «super-stable items» that have been retained from EOC all the way to PTH (this is the largest subset, but understandably also the one that requires the least commentary); then group B consists of «medium-stable items», for which it makes sense to postulate one replacement over the analyzed 2,500-year-long period; finally, the shortest and most difficult group C consists of «highly unstable» items that may have undergone two or more replacements over the same period. Group D lists two interesting deviations where intermediate periods may show «dead-end» dialectal semantic developments, and Group E lists one item that has been excluded from analysis due to insufficient data.
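The grouping into A, B and C amounts to counting how many times the root morpheme changes between consecutive stages. The following is a minimal sketch of that bookkeeping, under several assumptions: the function names are hypothetical, roots are represented by toneless pinyin labels, and a stage with no attestation (None) is simply skipped, which is a simplification of the more careful treatment of unattested stages in the entries themselves. Groups D and E are not modeled.

```python
STAGES = ["EOC", "COC", "MC", "PTH"]


def count_replacements(roots):
    """Count changes of root morpheme between consecutive attested stages.

    roots: one root label per stage in STAGES order; None marks a stage
    where the item is not attested (skipped in this sketch).
    """
    attested = [r for r in roots if r is not None]
    return sum(1 for prev, cur in zip(attested, attested[1:]) if prev != cur)


def stability_group(roots):
    """Map the number of replacements to group A (0), B (1) or C (2 or more)."""
    n = count_replacements(roots)
    if n == 0:
        return "A"
    if n == 1:
        return "B"
    return "C"


# Items taken from the entries in this section.
items = {
    "big": ["da", "da", "da", "da"],            # A.1.1
    "bird": ["niao", "niao", None, "niao"],     # A.2.1, not attested in the Linji lu
    "road": ["dao", "lu", "lu", "lu"],          # B.1.3
    "belly": ["fu", "fu", "du", "du"],          # B.2.1
    "tooth": ["chi", "chi", "chi", "ya"],       # B.3.10
}

for gloss, roots in items.items():
    print(gloss, stability_group(roots))
```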
A. Super-stable items (61 words).
A.1. Items attested with the same root morpheme throughout all four stages of Chinese.
A.1.1. 'big': 大 (*dha:ts > da).
A.1.2. 'black': 黑 (*s=ma:k > hei). ◊ Transparently derived from 墨 *ma:k > mo 'ink', but still clearly the primary neutral equivalent for 'black' already in EOC. The idea that 黑 hei had replaced the earlier 玄 *gwi:n > xuan in this meaning during the Zhou period (Schuessler 2007: 277) seems to rely more on the derived origin of hei than on concrete textual evidence: there are, in fact, no contexts at all in EOC or COC literary monuments where xuan should be unambiguously translated as 'black' rather than a more general 'dark'4. For a good context supporting a basic function for 黑 hei (as well as 赤 chi 'red', see below), cf. 莫赤匪狐,莫黑匪烏 mo chi fei hu, mo hei fei wu «there is nothing redder than a fox, nothing blacker than a raven» (Shijing 41, 3); no such diagnostic contexts are available for 玄 xuan or any of the even rarer quasi-synonyms for 'black, dark', such as 緇 zi (only found twice in the Shijing, applied to some names for garments).
A.1.3. 'blood': 血 (*swi:t > xue).
A.1.4. 'cloud': 雲 (*whdn > yun).
A.1.5. 'come': 來 (*m > lai).
A.1.6. 'die': 死 (*siy? > si).
A.1.7. 'dry': 乾 (*ghar > gan).
A.1.8. 'ear': 耳 (*nhd? > er). ◊ In the modern language, used primarily as part of the binome 耳朵 er-duo, lit. 'ear-cluster'.
A.1.9. 'fire': 火 (*smd:y? > huo).
A.1.10. 'fish': 魚 (*yha > yu).
A.1.11. 'hair /of head/': 髮 (*pat > fa). ◊ All four stages of Chinese show a very clear and persistent lexical differentiation between 髮 *pat 'hair of the head' (in the modern language, typically used as part of the binome 頭髮 tou-fa 'head-hair') and 毛 *mha:w 'hair on the body' (also 'wool', 'fur', etc.).
A.1.12. 'hand': 手 (*tlhu? > shou).
4 A different opinion is voiced in Wu 2011: 87, where it is stated that in the corpus of bronze inscriptions, 玄 xuan is more frequent than 黑 hei and is a better candidate for «basic 'black'» than the latter. However, Wu does not list any diagnostic contexts; frequency alone is not a clinching argument here, if, for instance, 玄 xuan (like 緇 zi in later received texts) was a typical term for denoting specific shades of ceremonial clothing, frequently depicted in bronze inscriptions. Note that most of our other observations on the evolution of color terms largely coincide with the thorough analysis presented in Wu 2011.
A.1.13. 'heart': 心 (*sam > xin). ◊ In the modern language, used primarily as part of the binome 心臟 xin-zang, literally 'heart-store'. Already in the ancient texts, the word is much more frequently found in abstract meanings ('mind', 'soul', 'conscience', 'intention', etc.) than in the required anatomical meaning; however, there is no evidence whatsoever that Chinese ever knew a different term for the anatomical 'heart'.
A.1.14. 'horn': 角 (*kro:k > jiao).
A.1.15. 'I': 我 (*jha:y? > wo). ◊ For the EOC period, 余 ~ 予 (*dla > yu) must be added as a synonym; the semantic difference between wo and yu is a much debated and still unresolved issue. However, both variants are known already from the Shang period, so there are no arguments in favor of a lexical replacement (merely the elimination of one of the synonyms in the COC period). In COC as well as in certain series of Zhou epigraphic inscriptions, 我 *jha:y? co-exists with the morphological variant 吾 *jha, but this has no bearing on lexicostatistical calculations, since the root morpheme is obviously the same.
A.1.16. 'kill': 殺 (*sra:t > sha). ◊ There are some signs that in the modern language, the old word sha (or its bisyllabic counterpart 殺死 sha-si) is being gradually replaced by the colloquial 打死 da-si (lit. 'hit-die'), but sha is still a frequent and stylistically neutral equivalent.
A.1.17. 'know': 知 (*tre > zhi). ◊ Typically used as part of the binome 知道 zhi-dao in the modern language. It is useful to note that in the Linji lu dialect this word is in free competition with the synonymous 識 (*tdk > shi), whose meaning in COC is closer to 'learn, keep in memory' and in the modern language to 'be acquainted with somebody'; cf. contexts such as H^ff^^ «[I] always know the place from which he comes», etc. However, this observation has no impact on the overall statistics for lexical replacements.
A.1.18. 'leaf': 葉 (*lhap > ye). ◊ Extended with the desemanticized suffix 子 in the modern language (葉子 ye-zi).
A.1.19. 'many': 多 (*ta:y > duo).
A.1.20. 'meat': 肉 (*nhuk > rou).
A.1.21. 'moon': 月 (*jot > yue). ◊ Typically used as part of the binome 月亮 yue-liang (lit. 'moon-shine') in the modern language.
A.1.22. 'mountain': 山 (*sra:n > shan).
A.1.23. 'name': 名 (*mhej > ming). ◊ Typically used as part of the binome 名字 ming-zi (lit. 'name-cognomen') in the modern language.
A.1.24. 'new': 新 (*sin > xin).
A.1.25. 'night': 夜 (*lias > ye).
A.1.26. 'nose': 鼻 (*bhits > bi). ◊ Extended with the desemanticized suffix 子 in the modern language (鼻子 bi-zi).
A.1.27. 'not': 不 (*pd > bu).
A.1.28. 'one': 一 (*?it > yi).
A.1.29. 'person': 人 (*nin > ren).
A.1.30. 'rain': 雨 (*wha? > yu).
A.1.31. 'see': 見 (*ke:ns > jian).
A.1.32. 'sit': 坐 (*jo:y? > zuo). ◊ The word is only scarcely attested in EOC, and there may be some doubt as to whether it was really the most common and neutral equivalent for 'sit' during that period; a possible competitor is 居 (*ka > ju, with a possible falling tone variant *ka-s) 'to stay, dwell, reside', for which some contexts might suggest an earlier semantics of 'sit'. There are, however, no strong arguments for taking 坐 zuo out of the lexicostatistical comparison; at best, 坐 zuo and 居 ju could be thought of as synonyms (for the EOC stage only).
A.1.33. 'small': 小 (*sew? > xiao). ◊ Several more specific adjectives denoting minuscule size are found in the texts (e.g. 細 *se:s > xi, 微 *mdy > wei), but they are statistically infrequent and never feature in the standard antonymous pair 大 da 'big' vs. 小 xiao 'small', for which there are multiple examples in the Shijing.
A.1.34. 'stone': 石 (*diak > shi). ◊ Usually extended with the desemanticized suffix 頭 in the modern language (石頭 shi-tou).
A.1.35. 'swim': 游 (*lu > you). ◊ In the Linji lu, only attested in application to fish ('how did the fish that swim lose their way?'), but there is no evidence for any different verb denoting the corresponding human activity. In the modern language, mainly used as part of the binome 游泳 you-yong, where 泳 yong (attested already in the Shijing) seems to be the original equivalent for 'to wade (in water)'.
A.1.36. 'tail': 尾 (*mdy? > wei). ◊ Extended with the desemanticized component 巴 ba (etymologically = 把 ba 'handle') in the modern language (尾巴 wei-ba).
A.1.37. 'thou': 汝 (*nha? > ru) ~ 爾 (*nhey? > er). ◊ Both of these variants (freely interchangeable in some texts, dialectally or syntactically conditioned in others), as well as the modern variant 你 ni, clearly go back to the same root; alternations in the coda sometimes reflect archaic morphology and sometimes irregular dialectal developments, understandable for such high-frequency forms as personal pronouns. No lexical replacements identified.
A.1.38. 'tongue': 舌 (*lat > she). ◊ Typically used as part of the binome 舌頭 she-tou (with the same desemanticized suffix as in 'stone' q.v.) not only in the modern language, but already in MC: both the short variant she and the disyllabic form are encountered in the Linji lu as free variants.
A.1.39. 'warm (hot)': 熱 (*^et > re). ◊ For this entry, we choose 'hot' (= 'exceeding tolerable temperature') rather than 'warm', as allowed in the GLD. Unlike 'warm' (OC 溫 *?un > wen; modern 暖 nuan), 'hot' is quite stable throughout all four stages of Chinese.
A.1.40. 'water': 水 (*tuy? > shui).
A.1.41. 'we': 我 (*qha:y? > wo). ◊ In EOC and COC, sg. 'I' and pl. 'we' were usually not distinguished from each other. From Han times on, the differentiation, when necessary, is performed by desemanticized quasi-suffixal morphemes (wo-gong, 我等 wo-deng, 我們 wo-men, etc.) without any replacements for the root morpheme.
A.1.42. 'white': 白 (*bra:k > bai).
A.1.43. 'who': 誰 (*duy > shui). ◊ The morphological derivate 孰 *du-k (> shu), originally 'which one /out of several/?', sometimes replaces the original 誰 shui in some dialects of late OC, but this has no bearing on the overall statistics.
A.1.44. 'woman': 女 (*nra? > nu). ◊ Used by itself or within the binome 女人 nu-ren (lit. 'woman-person') in the modern language.
A.1.45. 'yellow': 黃 (*ghwa:y > huang).
A.2. Items not attested in the Linji lu dialect of MC, but well attested at the three other stages.
A.2.1. 'bird': 鳥 (*ti:w? > niao). ◊ Initial n- in the Beijing dialect is irregular, but the word is still clearly cognate with its OC predecessors. Should be distinguished from OC 禽 *ghdm 'game-bird', used mainly in hunting contexts.
A.2.2. 'fat': 脂 (*kiy > zhi). ◊ In the modern language, mainly used as part of the binome 脂肪 zhi-fang (already attested in texts going back to the Jin dynasty, 3rd-5th centuries A.D.). For both stages of OC, an additional synonym is the word 膏 *kaw (> gao); the semantic difference between *kiy and *kaw is impossible to determine reliably based on the available text corpus (in the Shuowen jiezi *kiy is explained as 'fat of horned cattle' and *kaw as 'fat of hornless cattle', an explanation not explicitly confirmed by textual usage, but showing that the two words must have been very close). However, 脂 *kiy is well attested already in the Shijing, and the existence of an additional synonym is not a reason for postulating a lexical replacement.
A.2.3. 'feather': 羽 (*wlrla? > yu). ◊ In the modern language, normally used as part of the binome 羽毛 yu-mao, lit. 'feather-hair'.
A.2.4. 'fly /v./': 飛 (*psy > fei).
A.2.5. 'long': 長 (*draj > chang).
A.2.6. 'round': 圓 (written simply as 員 in the earlier texts; *wran > yuan). ◊ Attestation in the adjectival meaning in EOC and early COC is extremely scarce and dubious, but verbal ('to be around') and nominal ('circumference') meanings are attested (Schuessler 1987: 791), and there are no other serious candidates for the expression of the adjectival meaning in those periods.
A.2.7. 'sand': 沙 (*sra:y > sha).
A.2.8. 'seed': 種 (*toj? > zhong). ◊ Extended with the desemanticized suffix 子 in the modern language (種子 zhong-zi).
A.2.9. 'skin': 膚 (*pra > fu). ◊ In the modern language, used only as part of the binome 皮膚 pi-fu, where 皮 (*bhay > pi) is also a very old word, encountered much more frequently than *pra already in EOC (Schuessler 1987: 169, 457); however, its EOC attestations are completely restricted to the notion of 'animal skin', 'fur', 'hide', transparently separating it from the required Swadesh meaning of 'human skin'. The first references to *bhay as 'human skin' seem to appear no earlier than in Han-era texts, and even then mostly as part of the already attested binome pi-fu (co-existing with simple fu).
A.2.10. 'star': 星 (*she:j > xing).
A.3. Items not attested (properly) in EOC, but stable throughout all other periods.
A.3.1. 'ashes': COC 灰 (*smd:y > hui). ◊ Not attested at all in EOC (nor in the Linji lu, for that matter), but this is the only word with the basic meaning 'ashes' throughout the entire known history of Chinese. Even the graphic shape of the character ('hand' + 'fire') suggests an archaic origin, despite not being attested in epigraphic monuments.
A.3.2. '/tree/-bark': COC 皮 (*bhay > pi). ◊ It seems that the basic root for 'tree-bark' has always been the same as the root for '/animal/ skin, hide' in general (see A.2.9), although specific instances of 'bark' are lacking in both EOC and the Linji lu. In the modern language, the default equivalent is rather the binomial 樹皮 shu-pi, where 樹 shu = 'tree'; this does not count as a replacement.
A.3.3. 'bone': COC 骨 (*ku:t > gu). ◊ Strangely enough, the word 'bone' is not attested at all in EOC; however, the graphic shape of the character looks archaic, and there is no specific reason to suggest that the EOC equivalent may have been different. In the modern language the word is usually extended with the desemanticized suffix 頭 (骨頭 gu-tou).
A.3.4. 'knee': COC 膝 (*sit > xi). ◊ A somewhat problematic entry; the word 'knee' is not really attested in Chinese until texts typically dated to around the 3rd-1st cent. BC (Xun-zi, etc.), nor is it encountered in the Linji lu. Again, however, nothing indicates the existence of any other word in this meaning throughout all the stages of non-dialectal Chinese. In the modern language, the default equivalent is the binome 膝蓋 xi-gai, lit. 'knee-cover', which does not count as a replacement.
A.3.5. 'liver': COC 肝 (*ka:n > gan). ◊ Well attested in COC (though not in early Confucian texts) and MC, but not found in EOC. No indication of any possible alternate equivalents throughout any of the stages of written Chinese.
A.3.6. 'louse': COC 蝨 (*srit > shi). ◊ Attested in COC (though not in early Confucian texts), but not known in EOC or in the Linji lu. Extended with the desemanticized suffix 子 in the modern language (蝨子 shi-zi). The word has a solid Sino-Tibetan etymology (= Tibetan sig, Lushai hrik 'louse' etc.), indirectly confirming that the word has been super-stable from the beginning.
B. Medium-stable items (31 words)
B.1. Replacements from EOC to COC.
B.1.1. 'breast (= chest)': EOC 膺 (*?rdq > ying) → COC 胸 (*sqoq > xiong). ◊ The latter word is quite clearly the main equivalent for 'male chest' in both COC and the modern language, and is encountered once in the Linji lu in the bound expression 指胸 zhi-xiong 'to point at one's breast', which makes it at least a plausible candidate for the same meaning in MC. Conversely, the word is not encountered in any EOC texts, where the only known possible equivalent is 膺 *?rdq (although it is largely used in bound expressions and figurative meanings as well). This is sufficient evidence to at least suspect a lexical replacement.
B.1.2. 'man': EOC 夫 (*pa > fu) → COC 男 (*nə:m > nan). ◊ A debatable choice. The assumed replacement *nə:m is actually well attested already in EOC (Schuessler 1987: 436). However, throughout that period it is encountered infrequently, most often to denote a specific feudal title ('nan' = 'baron'); more basic usage is generally confined to the noun phrase 男子 *nə:m-cə? '(male) son', used to specify the gender of the descendant (and thus opposed to 女子 *nra?-cə? '(female) daughter'). Schuessler adds several epigraphic examples in which *nə:m means 'male descendant, son' all by itself and may thus be an abbreviation of *nə:m-cə? (e.g. 我後男 *qha:y? gho:? nə:m 'my (future) male descendants' [1381 Xuan], etc.). On the other hand, EOC *pa is statistically far more frequent, and in most contexts it is applied to human beings that are male by default (soldiers, farmers, etc.) or expressly means 'husband'. It is interesting that in the sole known early literary context in which we encounter the noun phrase 夫人 *pa-nin [Shangshu 42, 9], it clearly refers to 'man' or 'men', whereas already in COC the term *pa-nin is more commonly used to denote the wife, i.e. 'man's person', rather than 'man-person'. As for the use of *pa itself in the COC period, most texts clearly show that it is employed in a «socially marked» manner, either in the derived meaning 'teacher, master' (usually within the compound 夫子 *pa-cə?), or in the meaning 'husband' (often within the antonymous pair 夫婦 *pa-bə? 'husband(s) and wife (wives)'). All of this speaks in favor of a gradual transition from *pa to *nə:m, with *pa still functioning as the main word for 'male person' in Early Zhou.
B.1.3. 'road': EOC 道 (*lhu:? > dao) → COC 路 (*ra:ks > lu). ◊ In EOC, *lhu:? is the most statistically frequent word denoting the idea of 'road' without any further connotations. It also serves as the basis for the derived verb 導 *lhu:-s 'to lead, conduct (along the way)' (Schuessler 1987: 116). The word 路 *ra:ks 'road' (Schuessler 1987: 395), in comparison, is encountered only in a tiny handful of contexts, most often within the noun phrase 路車 *ra:ks kla 'grand chariot', where it is not even certain that the *ra:ks in question represents the same 'word'. It is likely that the gradual replacement of *lhu:? with *ra:ks did not really start until COC, possibly caused by the expanding polysemy of the former ('road / way / manner / habit / Tao', etc.).
In COC, the simple word 道 *lhu:? is very rarely employed to denote a physical 'road' by itself; most of the time, it only appears within the compound form 道路 *lhu:?-ra:ks. On the other hand, 路 *ra:ks is very common as 'road' on its own, quite unlike its functions in the EOC period. Likewise, in the modern language the basic equivalent for 'road' is either the bisyllabic 道路 dao-lu or the monosyllabic 路 lu, but never the monosyllabic 道 dao. This fairly transparent shift in usage may count as a lexical replacement, with the original *lhu:? ceding its basic functions to *ra:ks.
B.1.4. 'root': EOC 本 (*pə:r? > ben) → COC 根 (*kə:n > gen). ◊ Although the absolute majority of contexts in which *pə:r? is encountered in EOC are metaphorical ('root' as 'foundation', etc.), at least one context [Shijing 255, 8] clearly refers to *pə:r? as 'tree root', opposed to 枝 *ke 'branches' and 葉 *lhap 'leaves'. The simple pictographic nature of the character also hints at the original semantics of 'tree root'. No other words with this meaning are found in EOC. By contrast, it cannot be doubted that by the end of COC the word 根 *kə:n had completely replaced the earlier *pə:r? in the basic meaning 'root (of trees and other plants)', with *pə:r? preserved in a wide range of figurative meanings ('root' as 'origin', 'foundation', 'essentials', etc.). In the Shuowen jiezi, for instance, all of the references to roots of plants use *kə:n, whereas *pə:r? is reserved for the more abstract meaning 'foundation'.
The difficult problem is to determine the approximate period during which the replacement actually took place. Early Confucian texts offer little help in this matter, since the word 'root' is only encountered in them in figurative meanings ('origin', most of the time); thus, only *pə:r? is attested, but none of the attestations are diagnostic. Cf., however, a diagnostic context in the Inner Chapters of Zhuangzi [1, 4, 6], a document of comparable antiquity: 仰而視其細枝... 俯而見其大根 yang er shi qi xi zhi... fu er jian qi da gen «he looked up and saw its (the tree's) thin branches... he looked down and saw its big roots». In light of all available evidence, we fill the COC slot with *kə:n. In the modern language, the situation persists (although the root 根 gen is typically used in binomial constructions, such as 樹根 shu-gen 'tree-root', etc.).
B.2. Replacements from COC to MC.
B.2.1. 'belly': OC 腹 (*puk > fu) → MC 肚 ([dó] > du). ◊ The new word for 'belly' is attested already in the Linji lu: lu niu du li sheng '/you/ will be born in the belly of a donkey or a cow'. The new word persists in the modern language, albeit usually extended with the desemanticized suffix 子 (肚子 du-zi).
B.2.2. 'burn (tr.)': OC 焚 (*bən > fen) → MC 燒 (*srew > shao). ◊ In EOC, *bən is the main word for 'burn' and *srew is not attested at all. The latter appears in COC and gradually replaces the former as the most neutral equivalent for the concept: of note may be the statistical observation that in the Zuozhuan (5th century BC) we observe 42 cases of *bən vs. no cases at all of *srew, but in the Shiji (1st century BC) we already see just 17 cases of *bən vs. 58 cases of *srew (sporadically, the compound form 焚燒 *bən-srew is also observed). In the Linji lu, the equivalent is either the compound form (e.g. 焚燒經像 fen-shao jing xiang «to burn writings and images») or the simple 燒 shao (被火來燒 bei huo lai shao «you will be burned by fire»); the same situation is typical of the modern language. We may tentatively conclude that *bən was essentially replaced by *srew around Han-era times, i.e. in the interim period between COC and MC.
B.2.3. 'cold': OC 寒 (*ga:n > han) → MC 冷 (*re:r? > leng). ◊ The word *re:r? 'cold' frequently appears in Han-era texts, but not in COC, where *ga:n is still the default equivalent. By MC times, *ga:n is clearly a bound and archaic form (in the Linji lu, it is only encountered in the idiomatic collocation 寒松 han song 'winter pine'), and it remains a bound form in the modern language.
B.2.4. 'eat': OC 食 (*lak > shi) → MC 喫 (*khe:k > chi). ◊ An early colloquialism attested already in the Shuowen jiezi, chi is transparently the neutral equivalent of the meaning 'eat' in the Linji lu (shi and chi are both attested in the text, but only the latter is regularly encountered in direct speech, e.g. yi ri chi duo shao «how much do they eat per one day?»).
B.2.5. 'eye': OC 目 (*muk > mu) → MC 眼 (*rra:n? > yan). ◊ The original meaning of the word may have been 'eye-ball' (although already in the Shuowen *rra:n? is explained as 目 *muk 'eye'). In any case, the replacement is quite transparent in the Linji lu, where the old word 目 *muk is only encountered in bound expressions such as 目前 mu-qian 'present', etc.
B.2.6. 'head': OC 首 (*slu? > shou) → MC 頭 (*dho: > tou). ◊ This replacement may have already taken place in Han-era times (in the Shiji, the word tou seems to be more frequent than shou, particularly in direct speech).
B.2.7. 'smoke': OC 熏 (*hun > xun) → MC 煙 (*?i:n > yan). ◊ Available attestations are insufficient to reconstruct a completely reliable picture. The facts so far are as follows: (a) only *hun is attested in EOC; (b) *?i:n is clearly the main equivalent for 'smoke' in all Han-era and later texts; (c) early Confucian texts of the 5th-6th centuries BC have no occurrences of 'smoke', but the word is sometimes encountered in texts such as Mo-zi or Zhuang-zi, albeit more often in the verbal ('to smoke out') than nominal meaning. We tentatively assume that the replacement of the original noun has to be dated to a time period around Early Han, but new data may overturn this assumption.
B.2.8. 'tree': OC 木 (*mho:k > mu) → MC 樹 (*dho? > shu). ◊ The nature and reasons for this replacement are quite transparent: it begins as a compound form 樹木 shu-mu, lit. 'planted tree' (where 樹 = 樹 *dho?/s/ 'plant vertically'), well attested already in the Han period. By late MC, the replacement seems to be complete: in the Linji lu, simple 樹 shu is the usual equivalent for 'tree /growing/' (cf. 成一株大樹 cheng yi zhu da shu «he will become a big tree»), while 木 mu is restricted to the meaning 'wood /material/'. In the modern language, 'tree' is 樹 shu or 樹木 shu-mu; 木 mu (more frequently, the extended suffixal variant 木頭 mu-tou) is strictly 'wood'.
B.2.9. 'two': OC 二 (*niys > er) → MC 兩 (*rhaq? > liang). ◊ This only counts as a replacement if we follow the definition of 'two' as an adjectival lexeme, used in conjunction with a quantified noun; since this is the most common function of numerals, such a definition is, however, fully justified. The replacement process is well traceable across ancient texts. The word *rhaq? is not encountered at all as a numeral in EOC texts; it is rigidly restricted to paired objects only throughout COC (兩手 liang shou 'two hands', 兩馬 liang ma 'a pair of horses', etc.); and it begins to be freely applied to any objects, paired or not, around Han times. In the Linji lu it is clearly the same default equivalent for 'two /of anything/' as it is in the modern language, e.g. 與爾兩文錢 yu er liang wen qian 'I give you two coins', etc.
B.2.10. 'go (walk)'5: OC 往 (*waq? > wang) → MC 去 (*khas > qu). ◊ This replacement is rather tricky and not easily detectable through the corpus, particularly considering the general abundance of verbs denoting directed movement in OC (partial synonyms also include 之 *td 'to go, be headed somewhere', 適 *tek 'to go', etc.). Nevertheless, it can be more or less ascertained that throughout EOC and COC 去 qu is almost exclusively used in the meaning 'to /take/ leave', and, even more importantly, that the basic antonymous pair 'come and go' is always rendered as 往來 wang-lai rather than ^^ wang-qu. This situation is completely reversed in the language of the Linji lu, where the usual antonym of 來 lai is always 去 qu rather than 往 wang, and remains as such in the modern language.
B.2.11. 'what': OC 何 (*gha:y > he) → MC ff^ ([^immua] > shemme). ◊ While the old inanimate interrogative pronoun still survives in MC as an archaism or as part of some bound expressions, it is clear that already in the Linji lu the default equivalent is the replacement she-mme, a colloquialism that arose already in post-Han times.
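The frequency argument used for 'burn' (B.2.2) can be made explicit by computing the share of the newer word among all occurrences of the two competitors in each text. The sketch below only restates the counts quoted in that entry (42 vs. 0 in the Zuozhuan, 17 vs. 58 in the Shiji); the function name is hypothetical, and a raw share is of course no substitute for checking diagnostic contexts.

```python
def share_of_new_word(old_count, new_count):
    """Return the proportion of the newer word among both competitors.

    Returns None when neither word is attested in the text, since no
    share can be computed in that case.
    """
    total = old_count + new_count
    if total == 0:
        return None
    return new_count / total


# (occurrences of the older word fen, occurrences of the newer word shao),
# as quoted in entry B.2.2.
counts = {
    "Zuozhuan": (42, 0),
    "Shiji": (17, 58),
}

for text, (old, new) in counts.items():
    share = share_of_new_word(old, new)
    print(f"{text}: share of shao = {share:.2f}")
# Zuozhuan: share of shao = 0.00
# Shiji: share of shao = 0.77
```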
B.3. Replacements from MC to PTH.
B.3.1. 'nail (claw)'6: OC 爪 (*cru:? > zhao) → PTH 指甲 zhi-jia. ◊ In the Linji lu, the old word 爪 zhao still seems to be the default equivalent, cf. fa-mao zhao chi «head hair, body hair, nails, and teeth». The binome zhi-jia (literally 'finger-shell') is first attested in Song-era texts (11th-12th cent.).
B.3.2. 'give': OC 畀 (*pits > bi) / 予 ~ 與 (*la? > yu) → PTH 給 gei. ◊ In EOC, *pits and *la? are basically interchangeable synonyms, cf. two lines in the same Shijing poem (53, 1): he yi bi zhi vs. he yi yu zhi, both translatable as 'what shall I give him?' Only the latter, however, survives into COC times, where it becomes the sole neutral equivalent for the required meaning and persists into MC. PTH 給 gei is a more recent replacement (a dialectal phonetic development from MC kip < OC *kap, originally 'to provide, furnish').
5 The meaning 'go' (i.e. the opposite of 'come') is consistently used in the Global Lexicostatistical Database instead of 'walk' (i.e. 'move without a specific direction') in the «classic» Swadesh wordlist, but is still filed alphabetically under 'walk' because of technicalities.
6 The meaning '(finger)nail' (of human) is consistently used in the Global Lexicostatistical Database instead of 'claw' (animal) in the «classic» Swadesh wordlist, but is still filed alphabetically under 'claw' because of technicalities.
B.3.3. 'green': OC 青 (*she:q > qing) → PTH 綠 lu. ◊ Both these words are already attested in EOC and persist all the way to the modern language. Our decision is based primarily on diagnostic contexts, such as the application of these qualifiers to specifically green objects (e.g. 'leaves') and their appearance in lists of the most basic color terms. The latter, in particular, allows us to assume that 青 qing was still the basic 'green' as late as MC (cf. in the Linji lu: 把我著底衣,認青黃赤白 ba wo zhuo-di yi, ren qing huang chi bai «he seizes the clothes that I wear, considers them to be green, yellow, red or white»). In the modern language, however, 青 qing has shifted to denote a darker tinge of green, with 綠 lu taking its place in the general spectrum.
B.3.4. 'hear': OC 聞 (*mdn > wen) → PTH 聽見 ting-jian. ◊ The old word is still the default equivalent for 'hear' in the Linji lu; in the modern language, it is only encountered in bound expressions.
B.3.5. 'mouth': OC 口 (*kho:? > kou) → PTH 嘴 zui. ◊ The latter word, originally written simply as 觜, used to mean 'beak'; the shift to 'mouth' is apparently a very recent development that took place sometime in the late Qing period.
B.3.6. 'red': OC 赤 (*khiak > chi) → PTH 紅 hong. ◊ The latter word is already attested in COC, where it is, however, very rare and most likely denotes some specific shade of red. 赤 chi is still the main equivalent for 'red' in the Linji lu (see the example in B.3.3). It is not quite clear at which particular moment the replacement became complete, but in the modern language 赤 chi is no longer in active usage. Other OC words that are typically translated as 'red', e.g. 朱 zhu, 彤 tong, etc., are statistically less frequent and more commonly found in conjunction with articles of clothing than with natural objects.
B.3.7. 'stand': OC 立 (*rap > li) → PTH 站 zhan. ◊ The older meaning of 站 zhan is 'to stop somewhere; to occupy a place' (originally written as f£). The word gradually replaces the older 立 li in the basic meaning 'to stand' over the Ming-Qing period.
B.3.8. 'sun': OC 日 (*nit > ri) → PTH 太陽 tai-yang. ◊ The metaphoric term tai-yang, lit. 'the extreme Yang', is well attested since at least Han times, but only functions as the default term for the celestial body in the modern language.
B.3.9. 'this': OC 此 (*chey? > ci) → PTH 這 zhe. ◊ There is a certain number of stems that may be used to denote proximal deixis at any given time period in Chinese, but 此 ci is the one link that ties together all these time periods, with the exception of the modern language, where it is only used in idiomatic bound forms, while the common equivalent for 'this' is the replacement 這 zhe. In the Linji lu, both 此 ci and 這 zhe co-exist, but 此 ci is still far more common and cannot be formally regarded as a literary archaism.
B.3.10. 'tooth': OC 齒 (*thd? > chi) → PTH 牙 ya. ◊ The story here is as follows: (a) in EOC, 齒 chi = 'teeth /of humans or animals/', 牙 ya = '/special/ teeth /of animals only/' (usually tusks, possibly also fangs etc., i.e. protruding teeth; even the graphic shape of the character suggests 'tusks'); (b) in COC, the situation is largely the same, although in a few cases the compound form 齒牙 chi-ya is also attested; (c) in the Linji lu, the usual equivalent is either bisyllabic 牙齒 ya-chi or monosyllabic 齒 chi, but never monosyllabic 牙 ya; (d) conversely, in the modern language, the usual equivalent is either bisyllabic 牙齒 ya-chi or monosyllabic 牙 ya, but never monosyllabic 齒 chi. According to our rules, this indicates a replacement from MC to PTH.
B.4. Unclear due to lack of attestation in MC.
B.4.1. 'dog': OC A (*khwi:n? > quan) → PTH ^ gou. ◊ Although the word 'dog' is not attested in the Linji lu, it may reasonably be assumed that ^ gou had already become the primary equivalent for the neutral meaning 'dog' in MC, judging by the steady increase in attestation since Han times, by which period the old A quan had largely been demoted to the specialized meaning 'hunting dog = hound'. See Starostin 2013b on the possible semantic differentiation between quan and gou in COC (where gou may have originally denoted a special breed of dogs raised for meat).
B.4.2. 'drink': OC X (*?am? > yin) ^ PTH ^ he. 0 Not attested in the Linji lu at all. The modern equivalent ^ he is only encountered in texts since the Yuan dynasty (13th — 14th centuries), so it may be assumed that the old word was still in colloquial circulation throughout the MC period.
B.4.3. 'egg': COC (*rho:n? > luan) ^ PTH ^ dan. 0 The old word is not attested either in EOC (although the pictographic nature of the character may suggest an archaic origin) or in the Linji lu. The new word is a transparent semantic extension of dan 'ball, pill, bullet, any small round object', a word well attested already in OC and usually written as ^ in its original meaning. The first attestations of the semantic shift come from classic 16th-18th century novels; it may be assumed that the old word luan was still the basic term in MC7.
B.4.4. 'full': OC S (*ley > ying) ^ PTH ^ man. 0 Not attested in the Linji lu. The original meaning of ^ man was likely 'to fill up, overflow (of water)'; it is not found in the generic meaning 'to fill /anything/' or in the adjectival meaning 'full' in early Confucian texts or in the Daodejing, but is already competing with S ying in Zhuangzi. In the Shiji, S ying is encountered 14 times next to 85 for ^ man, meaning that the replacement was likely complete by the early Han period.
Another semantically close morpheme, ^ (*thuy > chong), is first encountered in the Shi-jing as part of the compound noun ^^ chong-er 'ear stopper'; in COC it is usually applied to the process of filling up storage units (granaries, etc.) and also used in various figurative meanings. The bisyllabic compound ^^ chong-man is well attested already in Early Han times and has persisted all the way up to modern times; nevertheless, ^ chong almost always behaves as a secondary morpheme in this formation, and while it is hard to precisely state the semantic difference between chong and man in the COC period (it may have been 'to fill up with hard substances' vs. 'to fill up with liquid substances', as one of the possibilities), including it in our calculations as a secondary synonym or excluding it altogether will have no effect on the overall calculations.
B.4.5. 'neck': OC # (*rhey? > ling) → PTH bo-zi. ◊ Not attested in the Linji lu. Modern bo-zi is a very late word, not attested earlier than the Yuan dynasty (13th-14th centuries). In addition, a very frequent equivalent for 'neck' in early Han texts is OC ^ *gro:y? (> xiang), whereas ## is more frequently used in the meaning 'collar' by that time. It cannot, however, be confirmed at this time that ^ xiang continued to be the main term for 'neck' throughout MC. Another occasional synonym in COC is I (*dho:s > ddu), always translated as 'neck'; in about 90% of its occurrences in texts, it is used as the object of 'breaking' or 'cutting', implying immediate death, so it is possible that a more exact meaning is something like 'neck vertebra'. In any case, it is a statistically infrequent (no more than a couple dozen entries in the entire COC + Han corpus, next to hundreds for *rheq? and ^ xiang) and contextually bound word.
7 It is suggested in Baxter, Sagart 2014: 324 that a more archaic equivalent for 'egg' may be a root *thu[n] (= *thun or *thur), not attested in any written Chinese texts but functioning as a vulgar equivalent for 'egg' and/or 'testicles' in some Southern dialects (Cantonese tfan1, Hakka tjnun1); its antiquity is allegedly corroborated by semantically and phonetically perfect Tibeto-Burman parallels. Regardless of whether this hypothesis is correct, it could only be taken into consideration in this paper if we were to assert that this *thu[n], not luan, had the basic meaning 'egg' in EOC, and that somehow Cantonese and Hakka had managed to inherit it, completely bypassing the COC and MC stages. Since the first part of this statement has no confirmation in written evidence and the second is almost impossible to believe, at best we could hypothesize that *thu[n] may have existed in EOC and COC side-by-side with luan as a «vulgar» synonym, managing to survive into Cantonese and Hakka; but this hypothesis would have no bearing on our lexicostatistics, which requires that only the stylistically neutral equivalents be taken into consideration.
B.4.6. 'that': OC ® (*pay? > bï) ^ PTH W nà ~ nèi. ◊ Not attested in the Lînjï lù, although apparently certain other texts in the yulù genre already show W nà as the basic adjectival stem denoting objects that are far away, while ® bï is more frequently restricted to adverbial functions ( 'there', 'in that place'). On the other hand, cf. B.3.9 'this' where it can be seen that both the old and the new pronoun still co-exist in the Lînjï lù dialect as synonyms; it cannot be excluded that the same situation was symmetrically relevant for the distal deixis pronouns.
C. Unstable items (5 words)
C.1. EOC → COC, COC → MC.
C.1.1. 'bite': EOC S (*di:t > die) ^ COC l (*dats > shï) ^ MC ^ (*yhra:w? > yao). ◊ The double replacement is quite uncertain8: so far, the only unambiguous EOC context with the verb 'to bite' is a passage in the earliest layer of the Yïjïng: ^SÀ lu hu wei, bù dié rén «if one steps on a tiger's tail, he does not bite». The situation in COC is also far from clear: statistically and contextually, there is some serious competition for l *dats on the part of S (*qhe:t > nié), also encountered several times (Zhuang-zï; Guan-zï) in the meaning 'to bite' (or perhaps 'to gnaw?') as applied to dogs. The distinction between *dats and qhe:t may have originally been dialectal (e. g. «Northern» vs. «Southern»), but it becomes seriously blurred in Hàn times (thus, both terms are interchangeable in the Huâinân-zï). Since MC, however, ^ yao seems to have largely stabilized as the primary equivalent for this meaning.
C.1.2. (?) 'foot': EOC № (*td? > zhï) ^ COC £ (*cok > zu) ^ MC M (*kak > jiao). ◊ The fact that the 'foot' / 'leg' opposition in the earliest stages of Chinese was lexicalized as № (originally written simply as zhï 'foot' vs. j.ii zu 'leg' is suggested, first and foremost, by the early graphical shapes of the characters: f 'foot' vs. ^ 'leg'. Textual evidence is ambiguous at best, since both ' feet' and ' legs' are very rarely attested in EOC, but at least one context in the Shïjïng lîn zhï zhï ' the feet (= hooves) of the lîn) indirectly supports this difference. In COC the old word zhï seems to have shifted its meaning to 'toe', while both 'foot' and 'leg' seem to merge into £ zu for a while — at least until Hàn-era texts, when the differentiation re-emerges with the appearance of a new word for ' foot', M jiao (not attested in EOC at all).
C.1.3. ' sleep': EOC ^ (*miys > mèi) ^ COC ^ (*rjho:ys > wo) or COC ^ (*shim? > qïn) ^ MC ^ (*doys > shuï). ◊ In EOC, ^ *miys is the most common designation of the static meaning ' sleep'; ^ *shim? is more rare and better interpreted as the dynamic 'lie down to sleep', or causative ' put to sleep' (antonymous to ® xïng ' rise'). In COC, ^ *miys is practically nonexistent, whereas ^ *shim? is sometimes found in unambiguously static contexts (e.g. ^^^^ zai yu zhou qïn «Zai Yu slept during the day» [Lùnyu 5, 10]); however, it seems to be competing for the ' sleep' slot with ^ *yho:ys, a word that can be interpreted as 'to lie' or 'to sleep' depending on the context. By Hàn times, the word ^ *doys makes its appearance, and seems to completely eliminate all competition by the beginning of the MC period.
C.2. EOC → COC, MC → PTH.
C.2.1. 'all': EOC ^ (*srut > shuài) or ^ (*gra:m > xiân) → COC ^ (*kra:y > jie) → PTH ^ dou. ◊ We equate 'all' with the most commonly used Chinese adverbial adjuncts with the same meaning, typically placed right before the verb. EOC uses a variety of those, making it impossible to choose between *srut and *gra:m. In COC, ^ *krd:y is unquestionably the most widely used adjunct, although by early Han times it begins to compete with the synonymous ^ (*sit > xi); in the Linji lu, *kra:y is still encountered either on its own or in conjunction with *sit (both ^^ xi-jie and ^^ jie-xi are possible). Curiously, modern dou seems to have already existed in its current meaning at least in Han times, but is only very occasionally attested until the modern phase of the language.
8 Laurent Sagart (p.c.) has suggested the possibility of both *di:t and *dats reflecting the same original root, but the vocalism seems to go against this idea; even if this were so, the morphological alternation must have been so ancient that the two forms would hardly feel related in the 1st millennium BC.
C.3. COC → MC, MC → PTH.
C.3.1. 'say': EOC B (*wat > yue) ^ MC ^ (*wm > yun) ^ PTH ¡^ shuo. 0 We understand 'say' here as the most common verb to introduce direct speech, which makes it easier to single out one particular candidate among a huge variety of verbs denoting various kinds of speech in Chinese. In Old Chinese, this verb has always been B *wat; in the Linji lu, direct speech is usually introduced by ^ *wdn, a verb already well attested in OC as well but nowhere near as common as *wat (its functions in various subperiods and dialects are still somewhat unclear). In colloquial PTH, the functions of these words have been completely overtaken by ¡^ shuo, a word originally meaning 'to explain, interpret'.
D. Unusual deviations
These two cases describe interesting situations where one of the two intermediate attested stages features a variant that deviates from the common form, so that the older and newer forms of the language share the same equivalent, while the intermediate stage expresses the meaning with a different root.
D.1. 'earth': EOC i (*tha:?) ^ PTH tu vs. MC % (di). 0 The semantic difference between i tu and % di 'earth, ground' is often neutralized in both ancient and modern contexts, most obviously so within the compound formation i% tu-di, well attested already in OC. Nevertheless, whenever the two morphemes are met separately, the former typically refers to 'earth' as substance ('soil' — the required Swadesh meaning) and the latter as surface ('ground', 'territory'). Surprisingly, one glaring exception is the dialect of Linji lu, where it is % di rather than i tu that commonly functions as a substance term, cf.: bei di shui huo feng «suffer earth, water, fire, and wind» (the elements), etc., whereas the word i tu is almost always encountered only within the compound form Hi guo-tu «territory (of state)». It is possible that this usage reflects a genuine case of lexical replacement in the respective dialect, though a specific peculiarity of the literary language is not excluded either.
D.2. 'good': EOC (*hu:?) ^ PTH hao vs. COC # dan? (^ PTH shan). 0 Curiously, the character ^ throughout most of the Classical Chinese period is most often employed to transcribe the derived verbal stem hu:-h 'to love' rather than the original adjectival stem hu:? 'good' (as in EOC); the latter cannot by any means pretend to denote the basic qualitative predicate '(to be) good' in any of the early Confucian texts or, in fact, in any of Classical Chinese up at least to the Han period. Thus, it is a rare (but not unique) isogloss that places EOC closer to post-Classical language than to the Classical epoch. Other quasi-synonyms have been excluded from comparison, such as f^ (*kre > jia) 'beautiful, excellent' (met more rarely and generally in highly expressive contexts), ^ (*ray > liang) 'kind, good-spirited' (usually applied to human or animal nature rather than anything else), etc.
E. Excluded from analysis
E.1. 'lie': This (static) meaning is notoriously hard to separate from the closely related 'lie down, go to sleep' (dynamic) and 'sleep', not only in ancient texts, but in many modern dialectal corpora as well (it is no wonder that it is very frequently omitted from various wordlists published in Chinese linguistic sources). The PTH equivalent is the recent innovation ® tang, of unclear origin; earlier literary sources mostly feature ambiguous data, with such quasi-synonyms as ^ qin and ^ wo translatable as 'go to sleep', 'lie down', or 'be sleeping' depending not only on the context, but on the translator's intuition as well. There are no formal grounds in this case to speculate on possible lexical replacements in pre-PTH times.
Analysis
Having presented the data in its entirety, we can now proceed to the stage of analysis, a relatively brief one, since our only important task here is to calculate the number of replacements (or, more accurately, discrepancies, since we do not want to assume that each of the four analyzed stages was a direct linguistic descendant of the previous one). As could already be seen from the data, many cases in which such discrepancies were postulated are actually problematic and often derived from indirect evidence, particularly in the case of EOC vs. COC, where the attested corpus does not always allow us to resolve the issue of synonymy to complete satisfaction. For that reason, in the table below I will discriminate between «certain» and «probable» replacements, where the former are clearly evident from sufficient textual evidence and the latter are based on insufficient and/or circumstantial evidence.
Additionally, in respect to the long transitional period from COC to MC it is useful to log the information on cases where a solid argument may be made for a lexical replacement already evident in Han-era literary texts (despite the lack of a separate wordlist for the Han period); such cases will be marked with a + sign next to the item in question.
Transition | Certain replacements | Probable replacements
EOC → COC | 'all', 'road', 'root', 'sleep' | 'bite', 'breast (chest)', 'foot', 'good', 'man'
COC → MC | 'belly', 'bite', 'cold', 'dog+', 'eat', 'eye', 'foot+', 'head+', 'say', 'sleep', 'tree', 'two', 'go', 'what' | 'burn+', 'earth', 'smoke+', 'full+', 'neck+', 'that'
MC → PTH | 'all', 'nail', 'give', 'hear', 'mouth', 'red', 'stand', 'sun', 'say' | 'green', 'this', 'tooth', 'drink', 'egg'
Adding up both certain and probable replacements, we thus get the following picture:
1) 9 replacements over the approximately 400-500 year period separating EOC from COC;
2) 20 replacements over the approximately 1,200-1,400 year period separating COC from MC (of these, about a third may have taken place over the approximately 200-300 year period separating COC from Hàn-era Chinese, though this number is not fully confirmed);
3) 14 replacements over the approximately 800-1,000 year period separating MC from PTH;
4) altogether, 43 replacements from EOC to PTH (counting twice for those few items that have been replaced two times — 38 otherwise).
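The tallies above follow directly from the table of certain and probable replacements. As a minimal sketch (the dictionary layout and variable names are illustrative assumptions, not part of the original study, and the + marks for Han-era replacements are omitted), the counts can be re-derived as follows:

```python
# Replacement lists transcribed from the table above; the data layout is an
# illustrative assumption, and the "+" (Han-era) marks are dropped.
replacements = {
    "EOC->COC": {
        "certain": ["all", "road", "root", "sleep"],
        "probable": ["bite", "breast", "foot", "good", "man"],
    },
    "COC->MC": {
        "certain": ["belly", "bite", "cold", "dog", "eat", "eye", "foot",
                    "head", "say", "sleep", "tree", "two", "go", "what"],
        "probable": ["burn", "earth", "smoke", "full", "neck", "that"],
    },
    "MC->PTH": {
        "certain": ["all", "nail", "give", "hear", "mouth", "red",
                    "stand", "sun", "say"],
        "probable": ["green", "this", "tooth", "drink", "egg"],
    },
}

# Certain + probable replacements per transition.
per_transition = {
    name: len(groups["certain"]) + len(groups["probable"])
    for name, groups in replacements.items()
}

# Every replacement event, counting twice-replaced items twice.
all_events = [
    item
    for groups in replacements.values()
    for items in groups.values()
    for item in items
]
total_events = len(all_events)
distinct_items = set(all_events)
replaced_twice = sorted(i for i in distinct_items if all_events.count(i) == 2)

print(per_transition)
print(total_events, len(distinct_items), replaced_twice)
```

With the lists as transcribed here, the script yields 9, 20 and 14 replacements for the three transitions, 43 replacement events and 38 distinct items, with 'all', 'bite', 'foot', 'say' and 'sleep' replaced twice, which agrees with the five unstable items of section C.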
Quite importantly, none of the attested replacements can be reliably attributed to external borrowing; although for some of them (especially those that lack reliable Tibeto-Burman cognates) an original non-Chinese source is quite possible, the majority are first attested in texts with non-Swadesh meanings, so the replacements have to be judged as «internal». According to Sergei Starostin's revised methodology of glottochronological calculations, this means that we should expect the rates of change to be reasonably regular, without any periods of intensive speeding-up due to contact-induced processes of lexical interference.
The results are not convincingly consistent with the division of the Swadesh wordlist into the less stable and more stable sub-sets as described, e.g., in Starostin 2010: although of all the listed items, slightly less than half belong to the more stable sub-set ('nail', 'dog', 'drink', 'eat', 'egg', 'eye', 'foot', 'head', 'hear', 'mouth', 'smoke', 'sun', 'tooth', 'tree', 'two', 'what'), the proportion is still close to 50/50 and hardly significant. It does seem interesting that nearly all the reliable and potential replacements from EOC to COC fall into the less stable half of the wordlist, but whether this observation is historically important remains to be seen.
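The proportions mentioned above can be checked with a few lines of Python. This is only a sketch: the set of 38 replaced items and the EOC to COC list are transcribed from the table of replacements, the 16 more stable items are those named in the preceding paragraph, and all variable names are illustrative.

```python
# Items replaced at least once between EOC and PTH (transcribed from the table
# of replacements; names are illustrative).
replaced_items = {
    "all", "road", "root", "sleep", "bite", "breast", "foot", "good", "man",
    "belly", "cold", "dog", "eat", "eye", "head", "say", "tree", "two", "go",
    "what", "burn", "earth", "smoke", "full", "neck", "that", "nail", "give",
    "hear", "mouth", "red", "stand", "sun", "green", "this", "tooth", "drink",
    "egg",
}
# Replaced items that the text assigns to the more stable sub-set.
more_stable = {
    "nail", "dog", "drink", "eat", "egg", "eye", "foot", "head", "hear",
    "mouth", "smoke", "sun", "tooth", "tree", "two", "what",
}
# Certain and probable replacements from EOC to COC.
eoc_to_coc = {"all", "road", "root", "sleep",
              "bite", "breast", "foot", "good", "man"}

# Share of replaced items that belong to the more stable sub-set.
stable_share = len(more_stable & replaced_items) / len(replaced_items)
# EOC -> COC replacements that fall outside the more stable sub-set.
eoc_coc_less_stable = eoc_to_coc - more_stable

print(f"{stable_share:.2f}")
print(len(eoc_coc_less_stable), "of", len(eoc_to_coc))
```

Under these assumptions, 16 of the 38 replaced items (about 42%) belong to the more stable sub-set, and 8 of the 9 EOC to COC replacements fall into the less stable half, 'foot' being the only exception.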
Conclusions
1. Taking Early Old Chinese as the starting point and Modern Chinese as the endpoint, we can claim, based on a mix of direct and indirect evidence from the text corpus (and some dictionary information), that approximately 60% of the Swadesh wordlist has been retained over 3,000 years of linguistic evolution. (The rounding-up of the percentage, rather than being an aesthetic concession, should hint at the possibility of errors in data analysis and occasional wrong conclusions based on insufficient data). This figure is not in direct contradiction either with the classic Swadesh formula ($t = -\ln(0.6) / 0.14 \approx 3.65$ millennia, i.e. 3650 years) or with the revised Starostin formula ($t = \sqrt{-\ln(0.6) / (0.05 \cdot 0.6)} \approx 4.12$ millennia, i.e. 4120 years), though it does obviously fit in better with Swadesh's assessment.
2. The individual replacement rates for the three intervals between checkpoints are as follows: ≈ 0.18 for EOC to COC, ≈ 0.14 for COC to MC, ≈ 0.14 for MC to PTH. Other than a slight increase in the first case (which could be explained by different factors, such as incorrect dating, errors in wordlist construction, or a significantly divergent dialectal base for EOC, meaning that the real time difference between it and COC should be higher), the results over different time periods seem to be impressively consistent, and in unexpectedly good agreement with Swadesh's classic lambda value of 0.14 for 1,000 years (rather than Sergei Starostin's 0.05 over the same period).
3. However, these figures may need slight corrections depending on whether we subscribe to the idea that the selected checkpoints are not necessarily in a straightforward ancestral relationship: for instance, the real time distance between MC and PTH may not be the 800-1,000 years that separate the text of the Linji lu from today's colloquial Mandarin Chinese, but a period of as much as 1,000-1,400 years (to be more confident, one would have to conduct a very thorough and rigorous dialectal study of the text). In other words, observed lambda values might be slightly inflated (but only slightly: thinking of MC and PTH as two completely independent developments from COC or EOC is not supported by evidence).
4. If there is any circumstantial evidence for a one-time acceleration period, the best candidate would probably be the transition from COC to Han-era texts, where we witness, over a span of no more than 200 years, the replacement of such words as 'head', 'neck', 'foot', 'dog', and others. However, since the main dialect of Han-era texts is hardly a direct descendant of the Northern (Lu?) dialect that forms the basis for the COC list, it may be argued that at least some of these replacements could have happened earlier and are simply undetected due to lack of textual evidence from that dialect preceding the 3rd century BC (which brings us back to point 3).
5. It is particularly instructive to compare the obtained result with historically similar situations for other written languages, especially those already covered in the Global Lexicostatistical Database (Starostin ed. 2011-2019). Thus, for the Greek language (wordlists compiled and published by Alexei Kassian) we have a wordlist for the Ancient Attic dialect (4th century BC, largely based on the language of Plato), compared with Modern Demotic Greek: the number of lexical replacements is 39 (all of them internal, just like in Chinese), which gives a lambda value of ≈ 0.16, completely in line with our results for Chinese (unfortunately, no high quality wordlists for any forms of Byzantine Greek are as of now available in the GLD).
On the other hand, it is also true that comparison with another Indo-European situation, namely, Old Norse vs. Modern Icelandic, shows a different result: only 2 replacements ('eat', 'swim') over the approximately 700-800 years that separate the two stages, resulting in a lambda value of ≈ 0.025 (this result basically just repeats the observations already publicized in the well-known anti-glottochronological paper by Bergsland and Vogt, 1962). But what this shows, in my opinion, is not the simplistic «glottochronology does not work» conclusion that is drawn by many researchers, but rather that different rates of replacement may be triggered by different sociolinguistic situations; indeed, it may be argued that historically, the cases of Greek and Chinese have more in common with each other (large dialectal variety; co-existence of an archaic written language with evolving colloquial norms; active contact with neighboring languages) than either of them with Icelandic. Naturally, a full comparative analysis of these situations will only be possible after a detailed analysis of all the empirical evidence that may be gathered from other written languages across the globe (Indo-European, Semitic, Egyptian, etc.); hopefully, the present study takes a small step in the right direction.
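The dating arithmetic in point 1 and the Icelandic rate in point 5 can be reproduced with a short sketch. It assumes the simple exponential model behind the classic Swadesh formula, the single-lineage form of the revised formula as quoted in point 1, and a 100-item list for the Icelandic case; the function names are illustrative. The exact procedure behind the rates in point 2 (for instance, how probable replacements are weighted) is not spelled out in this section, so no attempt is made to reproduce those figures.

```python
import math


def swadesh_time(retained: float, rate: float = 0.14) -> float:
    """Time depth in millennia under the classic formula t = -ln(c) / lambda."""
    return -math.log(retained) / rate


def starostin_time(retained: float, rate: float = 0.05) -> float:
    """Time depth in millennia under t = sqrt(-ln(c) / (lambda * c))."""
    return math.sqrt(-math.log(retained) / (rate * retained))


def replacement_rate(retained: float, millennia: float) -> float:
    """Rate per millennium implied by the classic model: lambda = -ln(c) / t."""
    return -math.log(retained) / millennia


# Point 1: about 60% of the list retained between EOC and PTH.
t_swadesh = swadesh_time(0.6)
t_starostin = starostin_time(0.6)

# Point 5: 2 replacements in an (assumed) 100-item list over about 0.8 millennia.
icelandic_rate = replacement_rate(98 / 100, 0.8)

print(f"{t_swadesh:.2f} {t_starostin:.2f} {icelandic_rate:.3f}")
```

Under these assumptions the sketch gives roughly 3.65 and 4.13 millennia for the two formulas (the text quotes the latter as 4120 years) and a rate of about 0.025 for Icelandic.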
References
Baxter, William H., Laurent Sagart. 2014. Old Chinese: A New Reconstruction. Oxford University Press.
Bergsland, Knut, Hans Vogt. 1962. On the Validity of Glottochronology. Current Anthropology 3: 115-153.
Dobson, W. A. C. H. 1968. The Language of the Book of Songs. University of Toronto Press.
Gurevich, Isabella S. 2001. Lin-ji lu. Saint-Petersburg: Peterburgskoje vostokovedenije.
Hamed, Mahe Ben, Wang Feng. 2006. Stuck in the forest: Trees, networks and Chinese dialects. Diachronica 23(1): 29-60.
Kassian, Alexei, George Starostin, Anna Dybo, Vasily Chernov. 2010. The Swadesh wordlist: an attempt at semantic specification. Journal of Language Relationship 4: 46-89.
List, Johann-Mattis. 2015. Network perspectives on Chinese dialect history. Bulletin of Chinese Linguistics 8: 42-67.
List, Johann-Mattis. 2016. Beyond cognacy: Historical relations between words and their implication for phylogenetic reconstruction. Journal of Language Evolution 1(2): 119-136.
Pulleyblank, Edwin G. 1995. Outline of Classical Chinese Grammar. Vancouver: UBC Press.
Sawer, Michael. 1969. Studies in Middle Chinese Grammar: the language of the early yeu luh. PhD Thesis, Australian National University.
Schuessler, Axel. 1987. A Dictionary of Early Zhou Chinese. Honolulu: University of Hawaii Press.
Starostin, George. 2010. Preliminary lexicostatistics as a basis for language classification: a new approach. Journal of Language Relationship 3: 79-117.
Starostin, George. 2013a. Lexicostatistics as a basis for language classification: increasing the pros, reducing the cons. In: H. Fangerau, H. Geisler, Th. Halling, W. Martin (eds.). Classification and Evolution in Biology, Linguistics and the History of Science: Concepts – Methods – Visualization: 125-146. Stuttgart: Franz Steiner Verlag.
Starostin, George. 2013b. K probleme dvux sobak v klassicheskom kitajskom jazyke: canis comestibilis vs. canis venaticus? In: N. P. Grintser et al. (eds.). Institutionis Conditori: Ilje Sergeevichu Smirnovu. Orientalia et Classica, vol. L: 253-267. Moscow: RSUH Publishers.
Starostin, George (ed.) 2011-2019. The Global Lexicostatistical Database. Moscow: Russian State University for the Humanities, & Santa Fe: Santa Fe Institute. Available online at: http://starling.rinet.ru/new100.
Starostin, Sergei A. 1989. Rekonstruktsiia drevnekitaiskoi fonologicheskoi sistemy [Reconstruction of the Old Chinese phonological system]. Moscow: Nauka. (In Russian).
Starostin, Sergei. 2000. Comparative-historical linguistics and lexicostatistics. In: Colin Renfrew, April McMahon, Larry Trask (eds.). Time Depth in Historical Linguistics: 223-259. McDonald Institute for Archaeological Research, Oxford Publishing Press.
Sturgeon, Donald (ed.). 2019. Chinese Text Project. Available online at: https://ctext.org.
Swadesh, Morris. 1952. Lexico-Statistic Dating of Prehistoric Ethnic Contacts: With Special Reference to North American Indians and Eskimos. Proceedings of the American Philosophical Society 96(4): 452-463.
Wu, Jianshe. 2011. The evolution of basic color terms in Chinese. Journal of Chinese linguistics 39(1): 76-122.
G. S. Starostin. Chinese basic lexicon from a diachronic perspective and its significance for lexicostatistics and glottochronology
The paper compares the relative rates of replacement of basic vocabulary (represented by the standard 100-item Swadesh list) over the history of the Chinese language, from Early Old Chinese (represented by such texts as The Book of Songs) to Classic Old Chinese, Late Middle Chinese (represented by the language of The Record of Linji), and Modern Chinese. The first part of the paper sets out the methodology of wordlist compilation step by step; the second is devoted to a detailed discussion of all the lexical replacements that were found. The concluding part shows that, on average, the rate of decay of the list varies only insignificantly from one period to the next, and that the results are generally compatible with the classic «Swadesh constant» (0.14 replacements per millennium); moreover, a correlation is found with some other similar situations, e.g. the history of the Greek language, although in some cases (Icelandic) no such correlation is observed. It is hoped that further studies of this kind on the lexical evolution of languages with long written histories will make it possible to place these results into a broader and more significant context.
Keywords: history of the Chinese language, Old Chinese language, Middle Chinese language, lexicostatistics, glottochronology, basic vocabulary.