newspaper column it will be appropriate to use a more liberal version.
It should be noted that some specialised English dictionaries also
separate the word-division positions into two categories: preferred
positions and less recommended positions.
There are two methods to determine the optimal division within a
sequence of consonants between two vowels:
* we can hyphenate according to the syllables in the word or
* we can hyphenate morphologically.
Hyphenation according to the syllables in the word
--------------------------------------------------
Let us look at the properties of the Bulgarian syllables. All
syllables have the following structure:
> onset - nucleus - coda
The nucleus in Bulgarian is always a vowel. Both the onset and the
coda are (possibly empty) sequences of consonants.
The Bulgarian syllables adhere to the Sonority Sequencing Principle.
According to this principle, the consonants within the onset have
rising sonority and the consonants within the coda have decreasing
sonority.
Several grammar books agree that the following sonority scale is valid
for Bulgarian:
> voiceless obstruent < voiced obstruent < sonorant consonant < vowel
According to the investigations of the author, the only exception to
this law is the letter в /v/, which is a voiced obstruent but can also
be used as a voiceless obstruent. This exception is due to a spelling
peculiarity of the Bulgarian language. Whenever the letter в /v/
seemingly violates the Sonority Sequencing Principle, in the spoken
language this letter is read as ф /f/, that is, as a voiceless
obstruent (for example, the word отвсякъде /otvsyakade/ is read as
отфсякъде /otfsyakade/).[^18]
[^18]: No Primitive Slavonic word contains the phoneme ф /f/.
Therefore, we can safely assume that in the Primitive Slavonic
language the consonant ф /f/ was a positional variant of the consonant
в /v/.
The author has found that the sonorant consonants in Bulgarian have
their own sonority scale:
> м /m/ < н /n/ < л /l/ < р /r/ < й /y/
Only a few words, such as жанр /zhanr/ and химн /himn/, violate this
scale. Such words are always loan-words and their pronunciation is
somewhat problematic for native Bulgarian speakers.
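
The two sonority scales above can be encoded as integer ranks, which makes the Sonority Sequencing Principle easy to test on a consonant cluster. The following is a minimal sketch, not the method used to build the patterns. The function names and the rank numbers are illustrative assumptions. It also assumes that "rising" and "decreasing" allow equal neighbours, and that в /v/ counts as voiceless when the next consonant is a voiceless obstruent. The digraphs дж and дз are not handled.

```python
# Illustrative encoding of the sonority scales quoted above.
VOICELESS_OBSTRUENTS = set("пткфсшхцч")
VOICED_OBSTRUENTS = set("бдгвзж")
SONORANT_RANK = {"м": 2, "н": 3, "л": 4, "р": 5, "й": 6}


def ranks(cluster):
    """Return the sonority rank of every consonant in the cluster."""
    result = []
    for i, consonant in enumerate(cluster):
        following = cluster[i + 1:i + 2]  # "" at the end of the cluster
        if consonant in VOICELESS_OBSTRUENTS:
            result.append(0)
        elif consonant == "в" and following in VOICELESS_OBSTRUENTS:
            # Assumption: в is read as /f/ before a voiceless obstruent.
            result.append(0)
        elif consonant in VOICED_OBSTRUENTS:
            result.append(1)
        else:
            result.append(SONORANT_RANK[consonant])
    return result


def is_valid_onset(cluster):
    """Sonority must not fall towards the nucleus."""
    r = ranks(cluster)
    return all(a <= b for a, b in zip(r, r[1:]))


def is_valid_coda(cluster):
    """Sonority must not rise away from the nucleus."""
    r = ranks(cluster)
    return all(a >= b for a, b in zip(r, r[1:]))


print(is_valid_onset("стр"))  # True
print(is_valid_coda("нр"))    # False: жанр violates the sonorant scale
```

Under these assumptions the codas of жанр and химн are rejected, which matches the observation that such loan-words break the scale.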
In addition to the Sonority Sequencing Principle, the consonant
clusters within the Bulgarian syllable adhere to the following
additional principles:
1. Both in the onset and in the coda, the labial and dorsal plosives
precede the coronal plosives and affricates.
2. If the onset or the coda contains two plosives or affricates, then
there are no fricatives between them. A few words with the Latin
root 'text' are exceptions: контекст /kontekst/.
3. If the onset or the coda contains two fricatives other than в /v/,
then there are no plosives or affricates between them.
4. If the onset or the coda contains two plosives or affricates, then
they both have equal sonority (both are voiced, or both are
voiceless).
5. If the onset or the coda contains two fricatives other than в /v/,
then they both have equal sonority (both are voiced, or both are
voiceless).
6. Neither the onset nor the coda may contain two labial plosives, or
two coronal plosives or affricates, or two dorsal plosives.
7. Neither the onset nor the coda may contain two equal consonants,
with the exception of в /v/ (for example втвърди /vtvardi/).[^19]
[^19]: Actually, the letter в /v/ is not a real exception because in
all such cases this letter denotes two different consonants: в /v/
and ф /f/. Only in the Russian loan-word взвод /vzvod/ do the two
letters в /v/ denote a repeated consonant в /v/.
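
Principles 1, 4 and 6 above concern only plosives and affricates, so they can be checked with a small lookup table. The sketch below is an illustration under stated assumptions: the `PLACE` table, the function name and the treatment of ц and ч as coronal are choices made here, and the digraph affricates дж and дз are ignored. Principles 2, 3, 5 and 7 are not checked.

```python
# Hypothetical classification of single-letter plosives and affricates.
PLACE = {
    "п": "labial", "б": "labial",
    "т": "coronal", "д": "coronal", "ц": "coronal", "ч": "coronal",
    "к": "dorsal", "г": "dorsal",
}
VOICED_STOPS = set("бдг")


def violated_principles(cluster):
    """Return which of principles 1, 4 and 6 the cluster violates."""
    stops = [c for c in cluster if c in PLACE]
    violated = set()
    for i, first in enumerate(stops):
        for second in stops[i + 1:]:
            # Principle 1: a coronal must not precede a labial or dorsal plosive.
            if PLACE[first] == "coronal" and PLACE[second] != "coronal":
                violated.add(1)
            # Principle 4: both must be voiced or both voiceless.
            if (first in VOICED_STOPS) != (second in VOICED_STOPS):
                violated.add(4)
            # Principle 6: no two plosives/affricates of the same place.
            if PLACE[first] == PLACE[second]:
                violated.add(6)
    return sorted(violated)


print(violated_principles("пт"))  # []
print(violated_principles("тд"))  # [4, 6]
```

A cluster that returns a non-empty list cannot sit inside a single onset or coda, so a hyphenation point must fall somewhere inside it.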
From all these properties of the Bulgarian syllable we can deduce the
following hyphenation rules:
1. In a sequence ВК where В is a consonant with higher sonority than
К, we are not permitted to hyphenate before В. Exception: when В
is в /v/ and К is a voiceless consonant.
2. In a sequence КВ where В is a consonant with higher sonority than
К, we are not permitted to hyphenate after В.
3. In a sequence KBT where K and T are plosives or affricates and B is
a fricative, we separate K from T.
4. In a sequence CKB where K is a plosive or affricate and C and B are
fricatives other than в /v/, we separate C from B.
5. If in a consonant sequence a coronal plosive or affricate Т is
followed by a labial or dorsal plosive К, then we separate Т from К.
6. If a consonant sequence contains two plosives or affricates, one
voiced and one voiceless, then we separate them.
7. If a consonant sequence contains two fricatives other than в /v/,
one voiced and one voiceless, then we separate them.
8. If a consonant sequence contains two labial plosives or two coronal
plosives or affricates or two dorsal plosives then they are
separated.
9. If a consonant sequence contains two equal consonants (not
necessarily consecutive), then they are separated.
With so many prohibitive rules, a question arises: if we apply all
these rules, aren't we going to eliminate too many hyphenation
possibilities? The answer is no. It can be demonstrated that between
any two consecutive syllables at least one separation point will be
permitted.
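
To see how the prohibitive rules still leave room for a break, the sketch below applies only rules 1 and 2 to a consonant cluster that stands between two vowels. It is an illustration, not the algorithm behind the actual patterns. The function names and the rank numbers are assumptions, rules 3 to 9 and any restrictions stated elsewhere are ignored, and the digraphs дж and дз are not handled. A break position `i` means that the first `i` consonants stay on the upper line.

```python
VOICELESS = set("пткфсшхцч")
VOICED = set("бдгвзж")
SONORANT_RANK = {"м": 2, "н": 3, "л": 4, "р": 5, "й": 6}


def sonority(consonant):
    """Illustrative rank on the sonority scale."""
    if consonant in VOICELESS:
        return 0
    if consonant in VOICED:
        return 1
    return SONORANT_RANK[consonant]


def permitted_breaks(cluster):
    """Break positions in the cluster that survive rules 1 and 2."""
    allowed = set(range(len(cluster) + 1))
    for j in range(len(cluster) - 1):
        left, right = cluster[j], cluster[j + 1]
        if sonority(left) > sonority(right):
            # Rule 1: no break before the more sonorous consonant,
            # unless it is в followed by a voiceless consonant.
            if not (left == "в" and right in VOICELESS):
                allowed.discard(j)
        elif sonority(left) < sonority(right):
            # Rule 2: no break after the more sonorous consonant.
            allowed.discard(j + 2)
    return sorted(allowed)


def hyphenations(before, cluster, after):
    """Render every permitted break for before + cluster + after."""
    return [before + cluster[:i] + "-" + cluster[i:] + after
            for i in permitted_breaks(cluster)]


print(hyphenations("се", "стр", "а"))  # ['се-стра', 'сес-тра', 'сест-ра']
```

For the cluster рт, rule 1 removes only the break before р, so positions after р remain available.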
Hyphenation according to the morphology
---------------------------------------
Between 1983 and 2012 the official orthographic rules of the
Bulgarian language forbade morphologically based hyphenation. After
2012 such hyphenation is permitted (but not obligatory).
Morphologically based hyphenation is most desirable in compound words.
Divisions such as авток-луб /avtok-lub/ and вакуу-мапарат
/vakuu-maparat/ are extremely irritating even if they are formally
correct. Unfortunately, we do not have a vocabulary of Bulgarian
compound words that would permit us to produce rules for automated
hyphenation. Therefore, the current Bulgarian hyphenation patterns do
not attempt to apply morphological hyphenation to such words.
Second in importance (but far more significant in terms of numbers) is
the case of word prefixes. While the reader's eyes are still at the
start of the word, the word is still unknown to the reader. At this
point, it is very important not to mislead the reader's expectations.
For example, a reader who sees над- /nad-/ at the end of the line will
expect the prefix над- /nad-/ with semantics 'attain more than'. This
expectation is defeated if над- is not really a prefix, but a
misleading (though formally correct) hyphenation of the word надремя
/nadremya/ 'have dozed enough', where the real prefix is not над-
/nad-/ but на- /na-/ with semantics 'achieve a state after
accumulation'. Such hyphenation distracts the reader and makes
reading more difficult.
Third in importance is the case of word suffixes. With respect to the
hyphenation rules, we can divide the suffixes into three categories:
1. Suffixes starting with a vowel, for example -ар /-ar/. It is not
appropriate to follow the morphology with such suffixes because
this would contradict the whole hyphenation tradition of the
Bulgarian language. For example, крав-ар /krav-ar/ is unwarranted.
2. Suffixes starting with one consonant, for example -ка /-ka/.
Usually with such suffixes the syllable boundary in the word
coincides with the morpheme boundary, so no special care is
needed, for example кравар-ка /kravar-ka/. The exceptions are
rare, for example: обек-тната /obek-tnata/ instead of обект-ната
/obekt-nata/.
3. Suffixes starting with more than one consonant (-ски /-ski/, -ство
/-stvo/). It is possible to use morphological hyphenation rules
with such suffixes.
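
The three suffix categories above depend only on the first one or two letters of the suffix, so they can be told apart mechanically. The following is a minimal sketch. The function name and the vowel set are assumptions made for illustration, and the letter ь is not considered.

```python
# Bulgarian vowel letters (assumed set for this illustration).
VOWELS = set("аъоуеиюя")


def suffix_category(suffix):
    """Return 1, 2 or 3 following the numbered categories above."""
    letters = suffix.lstrip("-")
    if letters[0] in VOWELS:
        return 1  # starts with a vowel: do not follow the morphology
    if len(letters) > 1 and letters[1] not in VOWELS:
        return 3  # starts with several consonants: morphology may apply
    return 2      # starts with one consonant: boundaries usually coincide


print(suffix_category("-ар"))   # 1
print(suffix_category("-ка"))   # 2
print(suffix_category("-ски"))  # 3
```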
Even if it is possible to use morphological hyphenation with the