TeX-Hyphen-Pattern


lib/TeX/Hyphen/Pattern/Bg.pm

newspaper column it will be appropriate to use a more liberal version.
It should be noted that some specialised English dictionaries also
separate the word-division positions into two categories – preferred
positions and less recommended positions.

There are two methods to determine the optimal division within a
sequence of consonants between two vowels:

* we can hyphenate according to the syllables in the word or
* we can hyphenate morphologically.

Hyphenation according to the syllables in the word
--------------------------------------------------

Let us look at the properties of Bulgarian syllables.  All syllables
have the following structure:

> onset - nucleus - coda

The nucleus in Bulgarian is always a vowel.  Both the onset and the
coda are (possibly empty) sequences of consonants.
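The structure above can be illustrated with a minimal Python sketch that splits one written syllable into its three parts, the final consonant sequence being the coda.  The function name `split_syllable` is hypothetical, and the vowel set is an assumption: the letters ю and я are treated as vowels because they are written as single letters.

```python
# Assumption: these letters act as syllable nuclei in written Bulgarian.
VOWELS = set("аеиоуъюя")


def split_syllable(syllable):
    """Split a single written syllable into (onset, nucleus, coda)."""
    positions = [i for i, ch in enumerate(syllable.lower()) if ch in VOWELS]
    if len(positions) != 1:
        # A syllable has exactly one nucleus, and the nucleus is a vowel.
        raise ValueError("expected exactly one vowel: %r" % syllable)
    i = positions[0]
    # Everything before the vowel is the onset, everything after is the coda.
    return syllable[:i], syllable[i], syllable[i + 1:]


print(split_syllable("стран"))  # ('стр', 'а', 'н')
```

Both the onset and the final part may come back as empty strings, which matches the "possibly empty" wording above.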

Bulgarian syllables adhere to the Sonority Sequencing Principle.
According to this principle, the consonants within the onset have
rising sonority and the consonants within the coda have decreasing
sonority.

Several grammar books agree that the following sonority scale is valid
for Bulgarian:

> voiceless obstruent < voiced obstruent < sonorant consonant < vowel

According to the author's investigations, the only exception to this
law involves the letter в /v/, which denotes a voiced obstruent but
can also stand for a voiceless one.  The exception is a spelling
peculiarity of Bulgarian: whenever the letter в /v/ seemingly
violates the Sonority Sequencing Principle, it is pronounced ф /f/,
that is, as a voiceless obstruent (for example, the word отвсякъде
/otvsyakade/ is pronounced отфсякъде /otfsyakade/).[^18]

[^18]: No Primitive Slavonic word contains the phoneme ф /f/.
Therefore, we can safely assume that in the Primitive Slavonic
language the consonant ф /f/ was a positional variant of the consonant
в /v/.

The author has found that the sonorant consonants in Bulgarian have
their own sonority scale:

> м /m/ < н /n/ < л /l/ < р /r/ < й /y/

Only a few words such as жанр /zhanr/ and химн /himn/ violate this
scale.  Such words are always loan-words, and native Bulgarian
speakers find them somewhat difficult to pronounce.
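The two scales can be combined into one ranking and checked mechanically.  The following Python sketch is only an illustration: the letter sets and the names `sonority`, `onset_ok` and `coda_ok` are assumptions, neighbours of equal sonority (such as с and т) are accepted, and the special behaviour of в /v/ is not modelled.

```python
# Assumed letter classes; щ, ь and the digraphs дж, дз are not treated specially.
VOWELS = set("аеиоуъюя")
VOICELESS = set("пткфсшхцчщ")
VOICED = set("бдгвзж")
SONORANTS = "мнлрй"  # ordered by the sonorant sub-scale


def sonority(letter):
    """Return a comparable rank: (main class, position among sonorants)."""
    letter = letter.lower()
    if letter in VOWELS:
        return (3, 0)
    if letter in SONORANTS:
        return (2, SONORANTS.index(letter))
    if letter in VOICED:
        return (1, 0)
    if letter in VOICELESS:
        return (0, 0)
    raise ValueError("not a ranked letter: %r" % letter)


def onset_ok(cluster):
    """Sonority must not fall inside an onset."""
    ranks = [sonority(c) for c in cluster]
    return all(a <= b for a, b in zip(ranks, ranks[1:]))


def coda_ok(cluster):
    """Sonority must not rise inside a final consonant cluster."""
    ranks = [sonority(c) for c in cluster]
    return all(a >= b for a, b in zip(ranks, ranks[1:]))


print(onset_ok("стр"))  # True
print(coda_ok("нр"))    # False, as in жанр
print(coda_ok("мн"))    # False, as in химн
```

The last two calls reproduce the loan-word exceptions mentioned above: in both words the final cluster rises in sonority instead of falling.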

Besides the Sonority Sequencing Principle, the consonant clusters
within the Bulgarian syllable obey the following principles:

1. Both in the onset and in the coda, the labial and dorsal plosives
   precede the coronal plosives and affricates.
2. If the onset or the coda contains two plosives or affricates, then
   there are no fricatives between them.  A few words with the Latin
   root 'text' are exceptions, for example контекст /kontekst/.
3. If the onset or the coda contains two fricatives other than в /v/,
   then there are no plosives or affricates between them.
4. If the onset or the coda contains two plosives or affricates, then
   they both have equal sonority (both are voiced, or both are
   voiceless).
5. If the onset or the coda contains two fricatives other than в /v/,
   then they both have equal sonority (both are voiced, or both are
   voiceless).
6. Neither the onset nor the coda may contain two labial plosives, two
   coronal plosives or affricates, or two dorsal plosives.
7. Neither the onset nor the coda may contain two equal consonants,
   with the exception of в /v/ (for example втвърди /vtvardi/).[^19]

[^19]: Actually, the letter в /v/ is not a real exception because in
all such cases this letter denotes two different consonants: в /v/
and ф /f/.  Only in the Russian loan-word взвод /vzvod/ do the two
letters в /v/ denote the same consonant в /v/ twice.
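Some of these principles are easy to test on a written cluster.  The sketch below covers only principles 1, 4 and 5; the name `violated_principles` and the letter classes are assumptions made for illustration, and the digraphs дж, дз and the letter щ are not handled.

```python
# Assumed classification of single letters.
LABIAL_DORSAL_PLOSIVES = set("пбкг")
CORONAL_STOPS = set("тдцч")   # coronal plosives and affricates
STOPS = LABIAL_DORSAL_PLOSIVES | CORONAL_STOPS
FRICATIVES = set("фсзшжх")    # в /v/ is deliberately left out
VOICED = set("бдгзж")


def violated_principles(cluster):
    """Return the numbers of the principles (1, 4, 5) the cluster breaks."""
    cluster = cluster.lower()
    found = set()
    stops = [c for c in cluster if c in STOPS]
    fricatives = [c for c in cluster if c in FRICATIVES]

    # Principle 1: labial and dorsal plosives precede coronal ones.
    seen_coronal = False
    for c in stops:
        if c in CORONAL_STOPS:
            seen_coronal = True
        elif seen_coronal:
            found.add(1)

    # Principle 4: all plosives and affricates agree in voicing.
    if len({c in VOICED for c in stops}) > 1:
        found.add(4)

    # Principle 5: all fricatives other than в agree in voicing.
    if len({c in VOICED for c in fricatives}) > 1:
        found.add(5)

    return sorted(found)


print(violated_principles("кт"))  # []
print(violated_principles("дк"))  # [1, 4]
```

A cluster that breaks a principle cannot belong to a single onset or a single coda, which is why each principle yields a separation rule.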

From all these properties of the Bulgarian syllable we can deduce the
following hyphenation rules:

1. In a sequence MK where M is a consonant with higher sonority than
   K, we are not permitted to hyphenate before M.  Exception: when M
   is в /v/ and K is a voiceless consonant.
2. In a sequence KM where M is a consonant with higher sonority than
   K, we are not permitted to hyphenate after M.
3. In a sequence KBT where K and T are plosives or affricates and B is
   a fricative, we separate K from T.
4. In a sequence CKB where K is a plosive or affricate and C and B are
   fricatives other than в /v/, we separate C from B.
5. If in a consonant sequence a coronal plosive or affricate T is
   followed by a labial or dorsal plosive K, then we separate T from K.
6. If a consonant sequence contains two plosives or affricates, one
   voiced and one voiceless, then we separate them.
7. If a consonant sequence contains two fricatives other than в /v/,
   one voiced and one voiceless, then we separate them.
8. If a consonant sequence contains two labial plosives, two coronal
   plosives or affricates, or two dorsal plosives, then we separate
   them.
9. If a consonant sequence contains two equal consonants (not
   necessarily consecutive), then we separate them.

With so many prohibitive rules, a question arises: if we apply all
these rules, aren't we going to eliminate too many hyphenation
possibilities?  The answer is no.  It can be demonstrated that between
any two consecutive syllables at least one separation point will be
permitted.
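To make the first two rules concrete, here is a minimal Python sketch that lists the break positions they leave open inside a consonant cluster standing between two vowels.  It is an illustration under stated assumptions, not the procedure used to generate the patterns: the name `allowed_breaks` and the letter classes are hypothetical, only adjacent pairs are compared, and rules 3 to 9 (as well as any rule stated elsewhere) are ignored.

```python
# Assumed letter classes; щ and the digraphs дж, дз are not treated specially.
VOICELESS = set("пткфсшхцчщ")
VOICED = set("бдгвзж")
SONORANTS = "мнлрй"  # ordered by the sonorant sub-scale


def sonority(letter):
    """Rank a consonant: voiceless < voiced < sonorants (м < н < л < р < й)."""
    if letter in SONORANTS:
        return (2, SONORANTS.index(letter))
    if letter in VOICED:
        return (1, 0)
    if letter in VOICELESS:
        return (0, 0)
    raise ValueError("not a ranked consonant: %r" % letter)


def allowed_breaks(cluster):
    """Positions p such that cluster[:p] | cluster[p:] passes rules 1 and 2."""
    cluster = cluster.lower()
    forbidden = set()
    for i in range(len(cluster) - 1):
        left, right = cluster[i], cluster[i + 1]
        if sonority(left) > sonority(right):
            # Rule 1: falling pair MK, no break before M,
            # unless M is в and K is voiceless.
            if not (left == "в" and right in VOICELESS):
                forbidden.add(i)
        elif sonority(left) < sonority(right):
            # Rule 2: rising pair KM, no break after M.
            forbidden.add(i + 2)
    return [p for p in range(len(cluster) + 1) if p not in forbidden]


print(allowed_breaks("стр"))  # [0, 1, 2]
print(allowed_breaks("рт"))   # [1, 2]
```

Position 0 means the whole cluster moves to the next line and position `len(cluster)` means it all stays on the current line.  For the cluster стр only the break after р is ruled out, so several positions remain open.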


Hyphenation according to the morphology
---------------------------------------

Between 1983 and 2012 the official orthographic rules of the
Bulgarian language forbade morphologically based hyphenation.  After
2012 such hyphenation is permitted (but not obligatory).

Morphologically based hyphenation is most desirable in compound
words.  Divisions such as авток-луб /avtok-lub/ and вакуу-мапарат
/vakuu-maparat/ are extremely irritating even though they are formally
correct.  Unfortunately, we do not have a dictionary of Bulgarian
compound words that would let us produce rules for automated
hyphenation.  Therefore, the current Bulgarian hyphenation patterns do
not attempt to apply morphological hyphenation to such words.

Second in importance (but far more significant in terms of numbers)
are the word prefixes.  While the reader's eyes are still at the
start of the word, the word is not yet known.  At this point it is
very important not to mislead the reader's expectations.  For
example, a reader who sees над- /nad-/ at the end of a line will
expect the prefix над- /nad-/ with the meaning 'attain more than'.
That expectation is disappointed when над- /nad-/ is not a prefix at
all but a misleading (though formally correct) division of the word
надремя /nadremya/ 'have dozed enough', where the real prefix is
на- /na-/ with the meaning 'achieve a state after accumulation'.
Such hyphenation distracts the reader and makes reading more
difficult.

Third in importance are the word suffixes.  With respect to the
hyphenation rules we can divide the suffixes into three categories:

1. Suffixes starting with a vowel, for example -ар /-ar/.  It is not
   appropriate to follow the morphology with such suffixes because
   this would contradict the whole hyphenation tradition of the
   Bulgarian language.  For example, крав-ар /krav-ar/ is unwarranted.
2. Suffixes starting with one consonant, for example -ка /-ka/.
   With such suffixes the syllable boundary usually coincides with the
   morpheme boundary, so no special care is needed, for example
   кравар-ка /kravar-ka/.  The exceptions are rare, for example
   обек-тната /obek-tnata/ instead of обект-ната /obekt-nata/.
3. Suffixes starting with more than one consonant (-ски /-ski/, -ство
   /-stvo/).  It is possible to use morphological hyphenation rules
   with such suffixes.
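The three categories depend only on how many consonants the suffix starts with, so they can be told apart mechanically.  A minimal Python sketch, in which the name `suffix_category` and the vowel set are assumptions (ю and я are counted as vowels because they are single letters):

```python
# Assumption: these letters count as vowels in written Bulgarian.
VOWELS = set("аеиоуъюя")


def suffix_category(suffix):
    """Return 1, 2 or 3 according to the number of leading consonants."""
    letters = suffix.lstrip("-").lower()
    leading = 0
    for ch in letters:
        if ch in VOWELS:
            break
        leading += 1
    if leading == 0:
        return 1  # starts with a vowel
    if leading == 1:
        return 2  # starts with one consonant
    return 3      # starts with more than one consonant


for s in ("-ар", "-ка", "-ски", "-ство"):
    print(s, suffix_category(s))
```

Only suffixes in the third category are candidates for morphological hyphenation according to the list above.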

Even if it is possible to use morphological hyphenation with the


