Venedica, transliteration of Cyrillic

Unified transliteration system of the Cyrillic script for all Slavic languages

Features

This transliteration system:

doesn't seek to follow the patterns typical of the English orthography;
is aligned with the Latin-scripted Slavic languages, such as Czech, Slovak, Slovene, and to some extent Polish;
suits the Cyrillic alphabets of (in alphabetical order): Belarusian, Bulgarian, Macedonian, Montenegrin, Russian, Serbian, and Ukrainian.

Less like English

Following the patterns of English orthography results in inconsistencies. A notable example is the use of the Latin letter y to transliterate й, ы, -ий, and -ый within a single transliteration system (×Altay, ×Rybinsk, ×Vasily, ×Maly).

Also, English orthography has no established means of marking soft consonants, so softness in transliterations of the Slavic languages is conveyed inconsistently or ambiguously.

The table below shows several existing approaches to conveying soft consonants. In each of them, some distinct Cyrillic sequences collapse into the same Latin form (for example, sya for both съя and ся in ×Lat3). These duplicates erase a difference that is apparent to a speaker of the Slavic languages, which makes the English-like transliterations suboptimal.

Cyr	×Lat1	×Lat2	×Lat3	×Lat4
сья	sya	sya	s'ya	s'ya
съя	sya	sya	sya	s'ya
ся	sya	sya	sya	sya
сь	s	s'	s'	s'
с	s	s	s	s
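
The duplicates can be counted mechanically. The sketch below is only an illustration: it copies the five rows of the table into a dictionary (the names TABLE and collisions are made up here) and lists, for each scheme, the Latin forms that stand for more than one Cyrillic sequence.

```python
# Rows of the table above: Cyrillic sequence -> renderings in the four schemes.
TABLE = {
    "сья": ("sya", "sya", "s'ya", "s'ya"),
    "съя": ("sya", "sya", "sya", "s'ya"),
    "ся": ("sya", "sya", "sya", "sya"),
    "сь": ("s", "s'", "s'", "s'"),
    "с": ("s", "s", "s", "s"),
}


def collisions(column):
    """Group Cyrillic sequences that share one Latin rendering in a scheme."""
    groups = {}
    for cyr, renderings in TABLE.items():
        groups.setdefault(renderings[column], []).append(cyr)
    # Keep only renderings that stand for more than one Cyrillic sequence.
    return {lat: cyrs for lat, cyrs in groups.items() if len(cyrs) > 1}


for i in range(4):
    print(f"Lat{i + 1}:", collisions(i))
```

Each of the four schemes yields at least one such group; ×Lat1, for example, maps сья, съя, and ся all to sya.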

Furthermore, a transliteration that mimics English orthography but relies on apostrophes and consonant clusters like zh, shch, and kh still looks deeply foreign to an English speaker, which defeats the purpose of making the system look familiar to speakers of English.

More like Czech and Polish

Compared to the Cyrillic-based Slavic languages, the Latin-scripted Slavic languages (such as Czech and Polish) have a similar phonetic system. With their well-established patterns for representing most phonetic features typical of the Slavic languages, these languages are the most suitable basis for a transliteration system of Slavic Cyrillic.

Also, writing in all Slavic languages in a similar and nearly mutually intelligible way should contribute to better understanding between speakers of these languages.

Transliteration rules

As shown below, most Cyrillic letters have a simple one-to-one transliteration, with a few exceptions still following clear long-established patterns.

Czech sibilants

Cyr	Lat	IPA^[a]
ц	c^[b]	[t͡s]
с	s	[s]
з	z	[z]
ч	č	[t͡ʃ, t͡ɕ]
ш	š	[ʃ]
ж	ž	[ʒ]

č, š, ž are the common representations of the sounds corresponding to ч, ш, ж. These letters have been long used in Czech, Slovak, and Slovene.

The diacritic mark, caron, in these characters indicates their affinity with the unmarked consonants, which can be seen for instance in Russian word forms like улица — уличный (c/č), писать — пишу (s/š), возить — вожу (z/ž).

This also means that, in many contexts, typing with a keyboard lacking the accented characters shouldn't render a text completely incomprehensible.
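
As a rough illustration of both points, the sketch below maps the six letters of the table with Python's built-in str.translate and then strips the carons the way a keyboard without accented characters would. The names SIBILANTS and strip_marks are made up for this example, and only lower-case letters are covered.

```python
import unicodedata

# Sibilant rows of the table above (lower case only).
SIBILANTS = str.maketrans({
    "ц": "c",
    "с": "s",
    "з": "z",
    "ч": "\u010d",  # č
    "ш": "\u0161",  # š
    "ж": "\u017e",  # ž
})


def strip_marks(text):
    """Drop combining marks, as if typed on a keyboard without accents."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


for cyr in "цсзчшж":
    lat = cyr.translate(SIBILANTS)
    print(cyr, lat, strip_marks(lat))
```

Stripping the caron leaves č, š, ž as c, s, z, that is, as the related unmarked consonants rather than as unrelated letters.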

Polish soft acute

Cyr	Lat	IPA^[a]
ь	◌́		goes as an acute over the preceding consonant (◌́), except č, š, ŝ, ž and except Old Slavic
			omitted after č, š, ŝ, ž, except Old Slavic
	ě	[ĕ, ĭ]	in Old Slavic (see also jer and jeŕ)
љ	ĺ	[ʎ, lʲ]
	li		before vowels, except i
	l		before i
њ	ń	[ɲ]
	ni		before vowels, except i
	n		before i
ћ	ć	[tɕ]
	ci		before vowels, except i
	c		before i
с́	ś	[ɕ]
	si		before vowels, except i
	s		before i
з́	ź	[ʑ]
	zi		before vowels, except i
	z		before i
ѓ	d́^[1]	[dʲ]
	di		before vowels, except i
	d		before i
ќ	ć^[1]	[c, tɕ]
	ci		before vowels, except i
	c		before i

^[1] Although graphically based on г which is represented by Latin g, the Cyrillic letter ѓ is transliterated as d́ based on its phonetic value, which makes it less cryptic and more recognizable to speakers of other Slavic languages. For a similar reason, Cyrillic ќ is transliterated as ć.

Mac. раѓање → radianie ~ Bulg. raždane, Serb. rod́ženie, Rus. roždenije;
Mac. ноќ → noć ~ Serb. noć, Bulg. noŝ, Rus. noč.

A consonant is marked as palatalized with the acute accent unless it is followed by a vowel. Before a vowel, palatalization is marked by inserting i after the consonant instead. If the following vowel is itself i, the palatalizing i is redundant and is dropped. This is similar to the pattern in Polish (ć turns into ci). The same pattern applies to other consonants to render their palatalized counterparts: n → ń, m → ḿ, r → ŕ, etc. (as in Polish Poznań).
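
The three cases can be sketched as a small function. This is a minimal illustration, not a reference implementation: the function name soften, the vowel inventory VOWELS, and the decision to work on a single Latin consonant plus the letter that follows it are all assumptions made here. Long í is treated like i, in line with the example letní further below, and č, š, ŝ, ž are returned unchanged because the tables never mark them as soft.

```python
import unicodedata

ACUTE = "\u0301"  # combining acute accent
# Assumed vowel inventory of the Latin output: a e i o u y ě ǒ ê í ý.
VOWELS = set("aeiouy\u011b\u01d2\u00ea\u00ed\u00fd")
# č, š, ŝ, ž never take a softness marker in this system.
HUSHERS = set("\u010d\u0161\u015d\u017e")


def soften(consonant, following=""):
    """Render a palatalized consonant given the Latin letter that follows it."""
    if consonant.lower() in HUSHERS:
        return consonant
    nxt = following[:1].lower()
    if nxt in ("i", "\u00ed"):
        return consonant  # the following i already marks the softness
    if nxt in VOWELS:
        return consonant + "i"  # palatalizing i before any other vowel
    # No vowel follows: put an acute over the consonant (precomposed if one exists).
    return unicodedata.normalize("NFC", consonant + ACUTE)


print(soften("n"), soften("n", "a") + "a", soften("n", "i") + "i")
```

Normalizing to NFC yields single code points for letters such as ń, ḿ, ŕ, while d́ stays a base letter followed by a combining acute because Unicode has no precomposed form for it.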

Iotated vowels

Cyr	Lat	IPA^[a]
е	je	[je]	word-initially and after vowels
е	e	[ʲe]	after consonants
є	je	[je]	word-initially and after vowels
є	ie	[ʲe]	after consonants
ѣ	jě	[je, ji, ije]	word-initially and after vowels
ѣ	iě	[ʲe, ʲi, ʲije]	after consonants
ё	jo	[jo]	word-initially and after vowels
	io	[ʲo]	after consonants, except č, š, ŝ, ž
	o	[o]	after č, š, ŝ, ž
ю	ju	[ju]	word-initially and after vowels
	iu	[ʲu]	after consonants, except č, š, ŝ, ž
	u	[u]	after č, š, ŝ, ž
я	ja	[ja]	word-initially and after vowels
	ia	[ʲa]	after consonants, except č, š, ŝ, ž
	a	[a]	after č, š, ŝ, ž
ї	ji	[ji]

The sounds corresponding to the characters я, е, ё, ю are represented as j + vowel in the Slavic languages (ja, je, jo, ju, as in the name Jan) when the [j] sound is actually present: word-initially, after a vowel, after the soft sign ь or the hard sign ъ, and after the Cyrillic apostrophe ' (in Belarusian and Ukrainian). In all these cases, j is at the beginning of a syllable.

After palatalized (softened) consonants these characters no longer contain the [j] sound and, for this reason, they are represented with ia, iu, io (as in Polish), where the leading i marks the preceding consonant as softened. (Examples in Russian: Jaroslavĺ, Riazań.)

However, Cyrillic е after consonants can stand for both [ʲe] (as in Rus. тесто [tʲestə]) and [e] (as in Rus. тест [test]), which is why ie and e are merged into the single representation e after consonants. At the beginning of a syllable, Cyrillic е is still represented as je.
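
The choice between j + vowel, i + vowel, and the bare vowel depends only on the preceding Cyrillic character, which the following sketch tries to capture for я, ю, ё, and е. The function name iotated and the character sets are assumptions made for this example; є and ѣ, which keep the i after consonants, are left out.

```python
# Vowel part of each iotated letter.
BASE = {"я": "a", "ю": "u", "ё": "o", "\u0435": "e"}  # the last key is Cyrillic е
# Letters transliterated as č, š, ŝ, ž.
HUSHERS = set("чшщж")
# Contexts where [j] is pronounced: after vowels, ь, ъ and the apostrophe.
J_CONTEXT = set("аеёиоуыэюяіїєѣьъ'\u2019")


def iotated(letter, previous=""):
    """Transliterate я, ю, ё or е given the preceding Cyrillic character."""
    vowel = BASE[letter]
    prev = previous.lower()
    if prev == "" or prev in J_CONTEXT:
        return "j" + vowel  # word-initially, after a vowel, ь, ъ or the apostrophe
    if letter == "\u0435" or prev in HUSHERS:
        return vowel  # е after any consonant; я, ю, ё after č, š, ŝ, ž
    return "i" + vowel  # the leading i marks the preceding consonant as soft


print(iotated("я"), iotated("я", "р"), iotated("ю", "ч"))
```

With these rules the я of Ярославль comes out as ja and the я of Рязань as ia, matching the two Russian examples above.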

Cyrillic ѣ (jat́) (non-cursive form: ѣ) is transliterated as iě due to the probable historical pronunciation being close to [ie], and its cursive form resembling a ligature of іь, with the Old Slavic vowel ь represented as ě (see jer and jeŕ) (ѣ → іь → iě).

i, j, y

Cyr	Lat	IPA^[a]
і	i	[i]
і	í	[i(ː), ij]	long i, for [i] before or after a vowel and for ій or іј
и	i	[i]	except Ukrainian
	í	[i(ː), ij]	except Ukrainian: long i, for [i] before or after a vowel and for ий or иј
	y	[ɨ]	in Ukrainian
	ý	[ɨ(ː), ɨj]	in Ukrainian: long y, for ий
ы	y	[ɨ]
ы	ý	[ɨ(ː), ɨj]	long y, for ый
й	j	[j]
й	i	[j]	short i, for [j] after a vowel and not followed by a vowel
ј	j	[j]
ј	i	[j]	short i, for [j] after a vowel and not followed by a vowel
ї	ji	[ji]

The characters í and ý stand for the long vowels in the Czech and Slovak orthographies: dobrý den, Letní stadion, Průmyslový palác.

In the transliteration, i and y are short: y represents [ɨ] (as in Polish and Old Czech), while i can be:

[i] not followed or preceded by a vowel (as in Minsk);
[j] after a vowel (as in Altai);
the palatalization (or softening) marker after a consonant before a vowel (as in Riazań).

By contrast, í and ý are long. They represent:

[ij] (Rus., Bulg. ий; Bel., Ukr. ій; Serb., Mac. иј) and [ɨj] (Rus., Bel. ый; Ukr. ий) respectively:
- Rus. добрый, Ukr. добрий → translit. dobrý, same as Czech dobrý;
- Rus. летний → translit. letní, same as Czech letní.
[i] and [ɨ] respectively next to other vowels (which is less frequently encountered in the Slavic languages, as in Troíck, in contrast to troika).

The use of the accented í character next to other vowels is also akin to the use of the diaeresis in some other languages to distinguish two standalone sounds from a diphthong represented by the same two characters, like in French naïf, Noël, Citroën, Moët.

Graphically, í and ý can be regarded as merged digraphs of ij and yj, where the trailing j was transformed to a handier superscript stroke.

As shown in the table above, some instances of the [j] sound ([й]) are represented by the letter j (jot) (not y), like in all Latin-scripted Slavic languages.
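
The view of í and ý as merged digraphs can be tried out on a hypothetical intermediate spelling in which ий and ый are first written out as ij and yj. The sketch below is an assumption-laden illustration, not part of the system itself: the function merge_long and the vowel list are made up here, and the lookahead that leaves j alone before a vowel is an added safeguard so that a j opening the next syllable is not swallowed.

```python
import re

# Assumed vowel letters of the Latin output: a e i o u y ě ǒ ê í ý.
VOWELS = "aeiouy\u011b\u01d2\u00ea\u00ed\u00fd"

# ij / yj not followed by a vowel, i.e. where j closes the syllable.
_LONG = re.compile(rf"([iy])j(?![{VOWELS}])", re.IGNORECASE)

_MERGED = {"i": "\u00ed", "y": "\u00fd", "I": "\u00cd", "Y": "\u00dd"}


def merge_long(text):
    """Merge the digraphs ij and yj into the long vowels í and ý."""
    return _LONG.sub(lambda m: _MERGED[m.group(1)], text)


print(merge_long("dobryj"), merge_long("letnij"))
```

Applied to dobryj and letnij, this gives the spellings dobrý and letní quoted above, while a sequence such as ija is left untouched.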

Jer and jeŕ

Cyr	Lat	IPA^[a]
ъ			omitted, except Bulgarian and Old Slavic
	ǒ	[ɤ̞, ə]	in Bulgarian
	ǒ	[ə, ŭ]	in Old Slavic
ь	◌́		goes as an acute over the preceding consonant (◌́), except č, š, ŝ, ž and except Old Slavic
			omitted after č, š, ŝ, ž, except Old Slavic
	ě	[ĕ, ĭ]	in Old Slavic

The Old Slavic ъ and ь (jer and jeŕ) are transliterated as ǒ and ě respectively (not ŭ and ĭ). These correspond to the modern full vowels о and е that descend from them:

Old Sl. сънъ → sǒnǒ ~ Modern Russian son;
Old Sl. дьнь → děně ~ Modern Russian deń.

(The fact that in the written form of the Ancient Novgorod dialect ъ/о and ь/е were often used interchangeably also reinforces this approach to transliteration.)

Similarly, the Bulgarian ъ is transliterated as ǒ (not ǎ, as in ISO 9). Apart from offering a valid representation in Bulgarian, the transliteration of ъ as ǒ results in a more recognizable spelling when compared to the other cognate languages:

Bulg. вълк → vǒlk ~ Mac., Rus. volk;
Bulg. звън → zvǒn ~ Bel., Rus., Serb. zvon;
Bulg. пълно → pǒlno ~ Mac., Rus. polno.
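
The language-dependent treatment of ъ and ь can be sketched as follows. Everything beyond the rules in the table is an assumption made for this example: the language codes "bg" and "osl", the function name translit_jers, and the tiny PLAIN table, which covers only the letters needed for the examples above. The modern soft sign outside Old Slavic is deliberately not handled, since it becomes an acute on the preceding consonant.

```python
O_CARON = "\u01d2"  # ǒ
E_CARON = "\u011b"  # ě

# A few plain letters so that the examples above can be run end to end.
PLAIN = {"в": "v", "л": "l", "к": "k", "з": "z", "н": "n",
         "п": "p", "о": "o", "с": "s", "д": "d"}


def translit_jers(word, lang):
    """Transliterate a lower-case word, resolving ъ and ь by language."""
    out = []
    for ch in word:
        if ch == "ъ":
            # Kept as ǒ in Bulgarian and Old Slavic, omitted elsewhere.
            out.append(O_CARON if lang in ("bg", "osl") else "")
        elif ch == "ь" and lang == "osl":
            out.append(E_CARON)
        else:
            out.append(PLAIN[ch])  # KeyError for letters outside this sketch
    return "".join(out)


print(translit_jers("вълк", "bg"), translit_jers("дьнь", "osl"))
```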

(See also the note on the letter ѣ.)

Balkan affricates

Cyr	Lat	IPA^[a]
ђ	d́ž	[d͡ʑ]
џ	dž	[d͡ʒ, d͡ʐ]

Other letters

Cyr	Lat	IPA^[a]
б	b	[b]
в	v	[v]
г	g	[g, ɣ, ɦ]^[1]
ґ	ġ	[g]	used when opposed to г realized as [ɣ]/[ɦ]
д	d	[d]
ѕ	dz	[d͡z]
ј	j	[j]
к	k	[k]
л	l	[l]
м	m	[m]
н	n	[n]
п	p	[p]
р	r	[r]
т	t	[t]
ф	f	[f]
х	h^[b]	[x]
щ	ŝ^[2]	[ɕ(ː), ʃt͡ʃ, ʃt]
ѳ	ḟ	[f]
а	a	[a]
і	i	[i]
о	o	[o]
у	u	[u]
ў	w	[w]
э	ê^[3]	[e]
э	e	[e]	word-initially, the circumflex can be dropped

^[1] Similarly to Dutch, [ɣ] is represented by g. Both [ɣ] and [ɦ] are regarded as allophones of [g] and therefore they are represented by the same letter (which should also contribute to the mutual intelligibility across cognate languages).

^[2] ŝ resembles š (see Czech sibilants), just as щ is close to ш, both phonetically and graphically.

^[3] The circumflex in ê helps distinguish words like Rus. mêtr (мэтр) and metr (метр), or Rus. sêr (сэр) and ser (сер). At the beginning of a word, Cyrillic е is transliterated as je (see also iotated vowels), whereas a transliterated э has no leading j, which makes the circumflex unnecessary there.
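
For the letters in this table a plain character-for-character mapping is enough, with э as the only position-dependent case. The sketch below is a rough illustration under stated assumptions: the names SIMPLE and translit_simple are made up, input is a single lower-case word, and the optional dropping of the circumflex word-initially is always applied.

```python
# One-to-one rows of the table above (lower case only).
SIMPLE = str.maketrans({
    "б": "b", "в": "v", "г": "g", "ґ": "\u0121", "д": "d", "ѕ": "dz",
    "ј": "j", "к": "k", "л": "l", "м": "m", "н": "n", "п": "p",
    "р": "r", "т": "t", "ф": "f", "х": "h", "щ": "\u015d", "ѳ": "\u1e1f",
    "а": "a", "і": "i", "о": "o", "у": "u",
    "\u045e": "w",  # ў
})


def translit_simple(word):
    """Transliterate a lower-case word made only of the letters above and э."""
    # э keeps the circumflex (ê) except word-initially, where it is dropped here.
    head = "e" if word[:1] == "э" else word[:1]
    body = word[1:].replace("э", "\u00ea")
    return (head + body).translate(SIMPLE)


print(translit_simple("мэтр"), translit_simple("это"))
```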

Notes

^[a] The IPA column shows an approximate phonetic value of the most common realization of the given character (close to standard, and stressed in case of vowels).

^[b] The letters c and h represent the [t͡s] (ц) and [x] (х) sounds respectively (not ts and kh). c stands for the [t͡s] sound in all Latin-scripted Slavic languages (as in the Polish words ulica, centralny).

The use of the ts and kh digraphs would only be reasonable in a writing system where the standalone letters of c and h are already reserved to represent other sounds (as in English and French). Some transliteration systems employing these digraphs leave the standalone c and h unused (which is odd on its own), losing the c/č affinity and being conspicuously unlike the Latin-scripted Slavic languages.

Stress mark

The use of the acute accent ◌́ as a stress mark becomes ambiguous in writing systems that employ the acute for other purposes, as this transliteration system does. The ambiguity is easily resolved by using the underline as a stress mark instead, which in fact proves more suitable for this role.

The underline as a stress mark:

is immediately perceived as an emphasis mark;
doesn't conflict with most diacritics;
can be used to put a stress on a vowel sound represented by more than one letter (Vancouver, Montreux) or on an entire stressed syllable;
suits stressed syllabic consonants (as in the Czech river name Vltava);
suits most writing systems, including consonantal abjads (like Arabic and Hebrew);
and, being part of the markup, doesn't introduce an extra character into the text.
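
One possible way to apply such markup is sketched below. Using the HTML u element is an assumption of this example (the text above only speaks of an underline), and the function name mark_stress and its index-based interface are likewise made up.

```python
import html


def mark_stress(word, start, end=None):
    """Underline word[start:end] with an HTML <u> element to mark the stress."""
    if end is None:
        end = start + 1  # a single stressed letter by default
    esc = html.escape
    return f"{esc(word[:start])}<u>{esc(word[start:end])}</u>{esc(word[end:])}"


print(mark_stress("Vltava", 1))        # stressed syllabic consonant
print(mark_stress("Vancouver", 4, 6))  # vowel sound written with two letters
```

The indices count code points, so a letter written with a combining accent (such as d́) needs its accent included in the underlined range.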

Web apps

See Russian translit app and Transliteration of Russian proper names.