Sunday, December 11, 2016

ConLing - Why are so many Chinese logograms pronounced the same?

Anyone who's had any exposure to written Chinese has noticed how many characters have exactly the same pronunciation, or has at least heard the commonly thrown-around factoid that Chinese is full of homophones. Whether or not modern spoken Chinese languages are actually so unusually full of homophones is debatable, but it's certainly true that a huge number of Chinese characters are homophonic. For instance, the MDBG dictionary lists over a hundred characters with the pronunciation yì [î]. So why on earth is Chinese writing carrying so much information that has nothing to do with encoding the spoken language? It turns out that it can all be traced back to factors emerging from the history and nature of the writing system itself.

Evolution of a pictographic character meaning "sun"

There's an interesting misconception I encounter a lot among speakers of Mandarin - that historic writers like Laozi and Li Bai pronounced their characters identically to modern Beijing Mandarin, with all other dialects and variations descended from this timeless, standard pronunciation. Of course, this is patently false.

Characters inscribed on a Shang dynasty oracle bone, a very early example of Chinese logograms

If they are considered to begin with oracle bone inscriptions, Chinese logograms have been in use for 3000 years - a very long time for variations and changes to accrue. Greek, a language with a similar timescale of continued literary tradition, has changed immensely from the time of Homer to the present; no language could go that long without changing. Furthermore, we have plenty of evidence of the changes that have occurred. Many Chinese characters are formed by punning, combining a character of similar pronunciation with a radical to hint at the meaning, and many of those puns only make sense with pre-modern pronunciation. Historical linguists using the comparative method with this type of evidence have been able to reconstruct pronunciation of older forms of Chinese.

The Qieyun is a 7th century "rime dictionary" which has immensely helped modern scholarship reconstruct Middle Chinese

Classical Chinese is famously concise. If Old or Middle Chinese is read aloud according to the pronunciation of Mandarin or any other modern dialect, it's incomprehensible: older forms of the language very often use single characters where modern words have to use two or three. (This is why I dispute the notion that Chinese languages are riddled with homophones - multisyllabic words rarely have homophones, even if the characters composing them do.) If older forms of the language had had as many homophonic characters as we have now, those one-character words would have made the entire spoken language hopelessly ambiguous.

An excerpt from the Shijing, a beautiful Zhou dynasty collection of poems, which is utterly incomprehensible if read aloud in modern pronunciation

As it turns out, reconstructed character pronunciations are longer and more complex in Old Chinese than in its modern descendants. This is standard behavior for languages: Spanish hoy and Old French hui (as in aujourd'hui) are both descended from the wordier Latin phrase hōc diē. Spanish and French have the benefit of using alphabets, so they can shorten their spelling of a word as its pronunciation is worn down. With Chinese logograms, though, there has been no such way to shorten words.

Let's look at an example. In Old Chinese, speakers could just say 今 [krəm] to mean today - no other word sounded alike, so the meaning was perfectly clear. But as time wore on, Mandarin speakers began to pronounce 今 (today), 金 (gold), and 巾 (cloth) all alike as jīn [t͡ɕín]; lest their listener get confused, they started combining 今 (today) + 天 (day) for clarification, in a move surprisingly similar to what modern French speakers have come up with (au jour d'hui = on the day of today). Thus the current situation is born - 今 has at least a dozen homophonic characters, and modern Mandarin now uses two characters for "today" where Old Chinese used one.
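For the programmatically inclined, here is a rough sketch of the same point in Python. The tiny lexicon is hand-typed for illustration (it is not pulled from MDBG or any other dictionary), and the names LEXICON, group_by_reading, and homophone_sets are my own inventions. It groups words by their Mandarin reading and shows that the three single characters collide while the two-character words built from them do not.

```python
from collections import defaultdict

# Hand-typed toy lexicon (an assumption for illustration, not dictionary data):
# each written word is mapped to its Mandarin pinyin reading.
LEXICON = {
    "今": "jīn",        # today (as a bare character)
    "金": "jīn",        # gold
    "巾": "jīn",        # cloth
    "今天": "jīntiān",  # today
    "金子": "jīnzi",    # gold
    "毛巾": "máojīn",   # towel
}


def group_by_reading(lexicon):
    """Return a dict mapping each reading to the words pronounced that way."""
    groups = defaultdict(list)
    for word, reading in lexicon.items():
        groups[reading].append(word)
    return dict(groups)


def homophone_sets(lexicon):
    """Keep only the readings that are shared by more than one word."""
    return {
        reading: words
        for reading, words in group_by_reading(lexicon).items()
        if len(words) > 1
    }


# Split the lexicon into one-character words and longer compounds.
singles = {w: r for w, r in LEXICON.items() if len(w) == 1}
compounds = {w: r for w, r in LEXICON.items() if len(w) > 1}

print("Single characters sharing a reading:", homophone_sets(singles))
print("Compounds sharing a reading:", homophone_sets(compounds))
```

With only six entries this proves nothing statistically, of course; it just makes the mechanism concrete. Adding a second syllable gives each word a distinct reading, even though every one of them still contains a character read jīn.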

As a last point, it's worth noting that really comprehensive Chinese character dictionaries don't just contain characters used in modern Mandarin - they are full of old, unused characters like 銰, a character in the MDBG database with no known definition, too uncommon to appear on the largest character frequency database I know of. So when I say that 100+ characters are pronounced [î], that's including plenty of characters that are ancient scribal abbreviations or one-off inventions.

So in short, yes - many characters are pronounced alike, but they didn't start their lives that way, and Chinese speakers have since created thousands of compound words to compensate. Chinese isn't really so crazy after all - a unique writing system and three millennia of literacy have just provided ample opportunity for funny myths to arise.

Further reading:
Ancient Scripts
Oracle bone script
Baxter-Sagart reconstruction of Old Chinese
Classical Chinese
Middle Chinese
MDBG entries for yi4
Character frequency list
