Unifoundry.com

Hangul (Hangeul) Fonts


Hangul (Hangeul)

Hangul is the national script of Korea. The Hangul script was developed by King Sejong the Great over three years, from 1443 to 1446. He introduced his writing system to the nation in the autumn of 1446 with the publication of Hunmin Jeong-eum Haerye, or Explanations and Examples of Correct Sounds for the Instruction of the People (shown at left). He developed Hangul so that the common man could read and write, amid much opposition from the scholars of the day. The introduction to this work described his purpose:

"Because the national language sounds different from that of China, it [the spoken language] doesn't match the [Chinese] accents. Therefore, when the ignorant want to communicate, many of them cannot achieve their intentions. Because I am saddened by this, I have newly made 28 letters. It is my intention that everybody learn the letters easily so that they can conveniently use them everyday." (Source: Wikipedia)

Hangul is written as syllabic blocks. Each block consists of up to three parts (called jamo) written in order:

[Figure: the word "hangeul" written in Hangul]
  1. choseong (Leading Consonants or Syllable-initial Characters)
  2. jungseong (Vowels or Middle Syllable-peak Characters)
  3. jongseong (Trailing Consonants or Syllable-final Characters)

The example above shows the word "hangeul" (Hangul) adapted from an image on Wikipedia.org. The first syllable is "han"; its leading consonant is "h", followed by middle vowel "a", followed by trailing consonant "n". The second syllable is "geul"; its leading consonant is "g/k", followed by middle vowel "eu", followed by trailing consonant "l". The common Romanized spelling of this word is "Hangul".

Notice that vowels, semivowels ("y" and "w"), and diphthongs (blue in the above example) can appear to the right of an initial consonant or underneath an initial consonant. Some are written both below and to the right of the initial consonant.

Hangul appears in several Unicode ranges:

Handling the juxtaposition of initial and final consonants in relation to the middle vowels and diphthongs within a syllable is the key to fine Hangul font design. This is even more challenging in a bitmapped screen font. The Hangul Syllables range allows for pre-composed syllable glyphs, which improves Hangul rendering even with a bitmapped font stored in Unix BDF format.

The standard set of Hangul fonts used with the X Window System on Unix systems has historically been the Johab font set. These fonts are primarily used with the xterm-based "Hanterm" program.

X11 Johab Encoded Fonts

A set of free Unix BDF Johab fonts was developed for displaying Hangul under X11. They are primarily used with Jaekyung "Jake" Song's Hanterm terminal emulator. Jungshik Shin then wrote a Perl script, johab2ucs2.pl, to convert the Hangul set of Johab Encoded Fonts into Roman Czyborra's .hex format.

I made a couple of bug fixes to the original Perl script and added a bunch of comments to help figure out how it worked. Here's a link to the modified script:

johab2ucs2

I then used this script to convert the four Hangul Johab Encoded fonts into Unicode Hangul Syllables in the U+AC00..U+D7A3 range in .hex format. Then I used Roman's hex2bdf script to convert those .hex files into BDF files. Here are the gzipped versions of the resulting BDF files, with all syllables in the Hangul Syllables range:

The original unifont.hex file seems to have used iyagi16 for its syllables. Roman mentioned wanting to switch to a thin stroke font someday. I've changed the Hangul Syllables in the default unifont.hex file to the thin stroke johabg16 glyphs. They seem easier to read on a screen.

I haven't been able to find any documentation in English describing the format of Johab Encoded Hangul fonts, so I decided to take a stab at documenting the format myself. Feel free to let me know of any errors.

The following description is based upon Jungshik Shin's Perl script to convert Johab fonts into the Unicode Hangul Syllables range (U+AC00..U+D7A3) in the GNU Unifont .hex format. Index arrays within the script are described in context. Arrays from the Perl script appear in boldface for easy identification.

The Johab fonts that contain Hangul store multiple versions of each Leading, Middle, or Trailing glyph so an entire syllable block will look its best. The Hangul Johab Encoded fonts contain 529 code points arranged as follows:

The first 19 Leading letters, each with 10 variations, comprise the basic modern set of initial Hangul consonants. These are followed by 12 archaic letters, each with 10 variations. The variation to use depends on the middle letter that follows; the variation numbers are stored in the @lconMap1 and @lconMap2 arrays. In general, the variations are as follows (letters in red are not part of the standard modern set and are my best guesses):

The first 21 Middle letters, each with 3 or 4 variations, comprise the basic modern set of Hangul vowels, semivowels ("y" and "w"), and diphthongs. These are followed by 8 archaic middle letters, each with 3 variations.

Each of the middle letters has an associated @vowType (Vowel Type) value; usually this is 0, but for letters with a horizontal long stroke on the bottom of the middle region (O, WA, WAE, OE, EU, YI), the value is 1.

Of the archaic diphthongs, the @vowType value for YU-YEO, YU-YE, and YU-I should therefore be set to 0, and the value for YO-YA, YO-YAE, YO-I, araea, and araea-i should be set to 1.

The three or four variations for each vowel are used as follows:

The first 27 Trailing letters, each with 4 variations, comprise the basic modern set of Hangul trailing letters. These are followed by 4 archaic Trailing letters, each with 4 variations.

The Trailing letters each have an entry in the @tconType array. These entries indicate which variation of a Middle Letter should be selected. The values are: 0 for the filler, 2 for "N", and 1 for all others. None of the archaic Trailing letters have an open top like "N", so they should also all have a value of 1.

If the @vowType value of a vowel is 1, then a further check is made. In such a case, if the leading consonant is "G" or "K" (and therefore has an open bottom), the first or third form is used (depending on whether or not there is a final consonant, respectively). Otherwise, the second or fourth form is used (depending on whether or not there is a final consonant, respectively).

This is why some Middle letters have four types: if @vowType is 1 for the Middle letter, then the corresponding vowel is shaped similarly to "O" and must have four forms. Then the first and third forms will be used when the vowel appears after an initial "G" or "K", so the short vertical stem(s) on the long horizontal stroke reach higher and possibly overlap with the bottom of the "G" or "K".
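The selection rule described above can be sketched in a few lines of Python. This is only my reading of the description, not code taken from the Perl script. It assumes that the four-form test is the Middle letter's @vowType value of 1 (the property defined earlier for vowels with a long horizontal stroke), that the first and second forms are the ones used without a final consonant, and that for all other vowels the @tconType value is used directly as the variation number. The function and parameter names are hypothetical.

```python
def middle_variation(vow_type, leading_is_g_or_k, tcon_type):
    """Return a 0-based variation number for a Middle letter (illustrative only).

    vow_type          -- the letter's @vowType value (1 for vowels shaped like "O")
    leading_is_g_or_k -- True if the Leading consonant is "G" or "K"
    tcon_type         -- the Trailing letter's @tconType value
                         (0 = filler, 2 = "N", 1 = all others)
    """
    has_trailing = tcon_type != 0  # the filler means no final consonant
    if vow_type == 1:
        # Assumption: forms 1 and 3 follow "G"/"K", forms 2 and 4 follow
        # any other Leading consonant; forms 3 and 4 are used with a final.
        base = 0 if leading_is_g_or_k else 1
        return base + (2 if has_trailing else 0)
    # Assumption: three-form vowels take their variation from @tconType.
    return tcon_type
```

For example, middle_variation(1, True, 0) picks the first form for an "O"-shaped vowel after "G" with no final consonant, and middle_variation(0, False, 2) picks the third form of an ordinary vowel before a final "N".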

Of the four variations, the first three appear identical in the font files, so the choice among values 0, 1, and 2 might not be critical. The left edge of the fourth variation is always shifted to the left at least one pixel (usually two pixels), and the right edge is shifted left from zero to two pixels. The values in red below are my guesses of which variations suit the archaic Terminal (final) consonants. If anyone knows better, please let me know.

Unicode Hangul Jamo (U+1100–U+11FF)

The Hangul Jamo Unicode Block contains the possible Leading, Middle, and Trailing parts of a Hangul block in the following ranges:

Unicode Hangul Syllables (U+AC00–U+D7A3)

The Unicode Hangul Syllables block of 11,172 complex glyphs forms the main part of the Hangul code points in Unicode. This block contains every possible combination of the 18 Leading Consonants or 1 Leading Filler (19 possible initial letters), 21 Middle Vowels and Diphthongs, and 27 Trailing Consonants (or no Trailing Consonant).

If a Korean word is pronounced as beginning with a vowel, it is written with choseong ieung (NG) in the Leading Consonant position. The "NG" is silent in the initial position of a Hangul syllable.
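A quick way to see the silent initial "NG" is to decompose a vowel-initial syllable with Python's standard unicodedata module. The helper name jamo_names is hypothetical; the normalization form "NFD" and the character names come from the Unicode Character Database that ships with Python.

```python
import unicodedata

def jamo_names(syllable):
    """Decompose one precomposed Hangul syllable and name each jamo."""
    jamo = unicodedata.normalize("NFD", syllable)  # canonical decomposition
    return [unicodedata.name(ch) for ch in jamo]

# U+C544 is the syllable pronounced "a"; it still carries a Leading ieung.
print(jamo_names("\uc544"))
# ['HANGUL CHOSEONG IEUNG', 'HANGUL JUNGSEONG A']
```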

There are 19 × 21 = 399 possible combinations of Leading Consonants and Middle Vowels and Diphthongs. Each of these can appear as a pair, or with any of the 27 Trailing Consonants, for a total of 28 possible combinations each. Thus 19 Leading glyphs, 21 Middle glyphs, and 27 Trailing glyphs yield 399 × 28 = 11,172 possible glyphs in total.
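The same counting gives the position of any syllable within the block. The sketch below uses the standard Unicode arithmetic for precomposed Hangul syllables; the function and constant names are my own, and the indices are 0-based positions in Unicode order, with Trailing index 0 meaning no Trailing Consonant.

```python
S_BASE = 0xAC00                          # first code point in Hangul Syllables
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28   # T_COUNT includes "no trailing"

def compose(l_index, v_index, t_index=0):
    """Build a syllable from 0-based Leading, Middle, and Trailing indices."""
    if not (0 <= l_index < L_COUNT and 0 <= v_index < V_COUNT
            and 0 <= t_index < T_COUNT):
        raise ValueError("jamo index out of range")
    return chr(S_BASE + (l_index * V_COUNT + v_index) * T_COUNT + t_index)

def decompose(syllable):
    """Split a syllable into (Leading, Middle, Trailing) indices."""
    s_index = ord(syllable) - S_BASE
    if not 0 <= s_index < L_COUNT * V_COUNT * T_COUNT:
        raise ValueError("not a precomposed Hangul syllable")
    l_index, rest = divmod(s_index, V_COUNT * T_COUNT)
    v_index, t_index = divmod(rest, T_COUNT)
    return l_index, v_index, t_index

# "han" is Leading 18 (H), Middle 0 (A), Trailing 4 (N): U+D55C.
print(hex(ord(compose(18, 0, 4))))       # 0xd55c
# The last combination lands on the last code point of the range, U+D7A3.
print(hex(ord(compose(18, 20, 27))))     # 0xd7a3
```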

The glyphs in the GNU Unifont Hangul Syllables range were initially derived from a Johab Encoded font that used thick strokes, iyagi16. This has been replaced in the unifont.hex files by glyphs derived from a Johab Encoded font that uses thin strokes, johabg16.

Roman Czyborra, the creator of the GNU Unifont, had mentioned on his website's list of things to do that he wanted to change the Hangul glyphs to thin strokes. The thin strokes are more legible on a screen.