Hangul (Hangeul)
Hangul is the national script of Korea.
The Hangul script was developed by King Sejong the Great in three years, from
1443–1446. He introduced his writing system to the nation in the autumn of
1446 with the publication of Hunmin Jeong-eum Haerye, or
Explanations and Examples of Correct Sounds for the Instruction of the People
(shown at left).
He developed Hangul so that the common man could read and write, amid much opposition
from the scholars of the day. The introduction to this work described his purpose:
"Because the national language sounds different from that of China, it [the spoken language] doesn't match the [Chinese] accents. Therefore, when the ignorant want to communicate, many of them cannot achieve their intentions. Because I am saddened by this, I have newly made 28 letters. It is my intention that everybody learn the letters easily so that they can conveniently use them every day." [Source: Wikipedia]
Hangul is written in syllabic blocks [see illustration below; source: Wikipedia]. Each block consists of up to three parts (called jamo), written in order: a leading consonant, a middle vowel (or diphthong), and an optional final consonant.
The example above shows the word "hangeul" (Hangul), adapted from an image on Wikipedia.org. The first syllable is "han": its leading consonant is "h", followed by the middle vowel "a" and the final consonant "n". The second syllable is "geul": its leading consonant is "g/k", followed by the middle vowel "eu" and the final consonant "l". The common romanized spelling of this word is "Hangul".
Notice that vowels, semivowels ("y" and "w"), and diphthongs (blue in the above example) can appear to the right of an initial consonant or underneath it. Some are written both below and to the right of the initial consonant.
Hangul appears in several Unicode ranges:
- U+1100..U+11FF: Hangul Jamo (all initial combinations (jamo), all middle combinations, and all terminal combinations, both modern and ancient)
- U+3130..U+318F: Hangul Compatibility Jamo (spacing and non-conjoining glyphs for compatibility with KS X 1001:1998)
- U+A960..U+A97F: Hangul Jamo Extended-A (added in Unicode 5.2)
- U+AC00..U+D7A3: Hangul Syllables (complete collection of common modern syllables)
- U+D7B0..U+D7FF: Hangul Jamo Extended-B (added in Unicode 5.2)
- U+FFA0..U+FFDF: Hangul Compatibility Jamo (half-width forms)
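As a quick illustration of the ranges above, the following Python sketch reports which listed range a character falls in. The table simply copies the list, the name `hangul_range` is hypothetical, and the check is purely numeric: it does not test whether each code point is actually assigned.

```python
from typing import Optional

# (first, last, name) triples copied from the list of Hangul ranges above.
HANGUL_RANGES = [
    (0x1100, 0x11FF, "Hangul Jamo"),
    (0x3130, 0x318F, "Hangul Compatibility Jamo"),
    (0xA960, 0xA97F, "Hangul Jamo Extended-A"),
    (0xAC00, 0xD7A3, "Hangul Syllables"),
    (0xD7B0, 0xD7FF, "Hangul Jamo Extended-B"),
    (0xFFA0, 0xFFDF, "Hangul Compatibility Jamo (half-width)"),
]


def hangul_range(ch: str) -> Optional[str]:
    """Return the name of the listed Hangul range containing ch, or None."""
    cp = ord(ch)
    for first, last, name in HANGUL_RANGES:
        if first <= cp <= last:
            return name
    return None


if __name__ == "__main__":
    # U+D55C (syllable HAN), U+1100 (choseong kiyeok), U+3131 (compatibility kiyeok), "A"
    for cp in (0xD55C, 0x1100, 0x3131, 0x41):
        print(f"U+{cp:04X}: {hangul_range(chr(cp))}")
```

Run as a script, this prints "Hangul Syllables", "Hangul Jamo", "Hangul Compatibility Jamo", and "None" for the four sample code points.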
Handling the placement of initial and final consonants relative to the middle vowels and diphthongs within a syllable is the key to fine Hangul font design, and it is even more challenging in a bitmapped screen font. Because the Hangul Syllables range provides a precomposed glyph for every modern syllable, even a bitmapped font stored in Unix BDF format can render Hangul well.
The standard set of Hangul fonts used with the X Window System on Unix systems has historically been the Johab font set. These fonts are primarily used with the xterm-based "Hanterm" program. The Johab glyphs originally in Unifont turned out to be licensed only for use in Hanterm, so they were removed from Unifont and replaced with alternate glyphs. More on this below.
X11 Johab Encoded Fonts
For an (extremely!) detailed discussion of Hangul font encodings, see the supplementary page Generating Hangul Syllables. In it, I describe various Korean font encodings in more detail than anything else I've found in English. The main focus is describing the steps I followed to draw a new set of Hangul Syllables glyphs (U+AC00..U+D7A3) to address non-GPL licensing issues with older fonts. I wrote notes on the process for my own sake while developing new glyphs because I could not find all the information anywhere in English. I hope it will provide a reference for others working with old or new Hangul fonts and encodings.
Long before my involvement with Hangul fonts, a set of free Unix BDF Johab fonts was developed for displaying Hangul under X11. They are primarily used with Jaekyung "Jake" Song's Hanterm terminal emulator. Jungshik Shin then wrote a Perl script, johab2ucs2.pl, to convert the Hangul set of Johab Encoded Fonts into Roman Czyborra's .hex format.
I made a couple of bug fixes to the original Perl script and added a number of comments to help figure out how it worked.
I then used this script to convert the four Hangul Johab Encoded fonts into Unicode Hangul Syllables in the U+AC00..U+D7A3 range in .hex format. Then I used Roman's hex2bdf script to convert those .hex files into BDF files. Here are the gzipped versions of the resulting BDF files, with all syllables in the Hangul Syllables range:
- uni-iyagi16.bdf.gz (thick strokes with large glyphs)
- uni-johabg16.bdf.gz (thin strokes)
- uni-johabm16.bdf.gz (thick strokes with smaller glyphs)
- uni-johabp16.bdf.gz (thick strokes with smaller glyphs, more stylized)
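The .hex format used in these conversions stores one glyph per line: a hexadecimal code point, a colon, and the bitmap as hex digits, 32 digits for an 8×16 glyph or 64 digits for a 16×16 glyph, one row after another from top to bottom. The sketch below is a hypothetical `parse_hex_line` helper, not part of Roman's or Jungshik's tools; it decodes such a line into rows of pixels.

```python
from typing import List, Tuple


def parse_hex_line(line: str) -> Tuple[int, List[str]]:
    """Decode one .hex line into (code point, 16 rows of '#'/'.' pixels)."""
    cp_text, bitmap = line.strip().split(":")
    code_point = int(cp_text, 16)
    # 32 hex digits = 8 pixels wide, 64 hex digits = 16 pixels wide; always 16 rows.
    if len(bitmap) not in (32, 64):
        raise ValueError(f"unexpected bitmap length: {len(bitmap)}")
    digits_per_row = len(bitmap) // 16
    width = digits_per_row * 4  # each hex digit holds 4 pixels
    rows = []
    for r in range(16):
        chunk = bitmap[r * digits_per_row:(r + 1) * digits_per_row]
        # Zero-padded binary; the most significant bit is the leftmost pixel.
        bits = format(int(chunk, 16), f"0{width}b")
        rows.append(bits.replace("0", ".").replace("1", "#"))
    return code_point, rows


if __name__ == "__main__":
    # A made-up 16-pixel-wide test pattern (not a real Unifont glyph):
    # one pixel at each end of the bottom row, at a private-use code point.
    cp, rows = parse_hex_line("E000:" + "0000" * 15 + "8001")
    print(f"U+{cp:04X}")
    print("\n".join(rows))
```

The Hangul Syllables glyphs are all 16 pixels wide, so every line in those files carries 64 hex digits.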
The original unifont.hex file appears to have used iyagi16 for its syllables. Roman mentioned wanting to switch to a thin stroke font someday. For the Unifont 5.1 release, I changed the Hangul Syllables in the default unifont.hex file to the thin stroke johabg16 glyphs. They seem easier to read on a screen.
Then it was brought to my attention that these four Hanterm fonts, although free of charge, were only licensed for use with Hanterm. Thus, they cannot be used as part of Unifont or any other font.
Because johabg16 could not be licensed under the GPL (even though it is a free font), I then created a new set of Hangul syllables from scratch. This took several years to complete, and is described in the above-mentioned page Generating Hangul Syllables.
The following description is based upon Jungshik Shin's Perl script to convert Johab fonts into the Unicode Hangul Syllables range (U+AC00..U+D7A3) in the GNU Unifont .hex format. Index arrays within the script are described in context. Arrays from the Perl script appear in boldface for easy identification.
The Johab fonts that contain Hangul store multiple versions of each Leading, Middle, or Final glyph so an entire syllable block will look its best. The Hangul Johab Encoded fonts contain 529 code points arranged as follows:
- 0: Filler (blank)
- 1–310: Leading Glyphs (31 basic forms with 10 variations each; 310 total)
- 311–404: Middle Glyphs (29 basic forms with 3 or 4 variations each; 94 total)
- 405–528: Final Glyphs (31 basic forms with 4 variations each; 124 total)
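Because the four parts occupy fixed, contiguous index ranges, a glyph index can be classified with a simple range lookup. This Python sketch (the name `johab_part` is hypothetical, not taken from the Perl script) returns the part and the offset within it. It deliberately makes no assumption about how the variations are ordered inside each part.

```python
from typing import Tuple

# (part, first index, last index) for the 529-glyph Hangul Johab font.
JOHAB_PARTS = [
    ("filler", 0, 0),
    ("leading", 1, 310),   # 31 basic forms x 10 variations
    ("middle", 311, 404),  # 29 basic forms x 3 or 4 variations
    ("final", 405, 528),   # 31 basic forms x 4 variations
]

# The part sizes must add up to the 529 code points in the font.
assert sum(last - first + 1 for _, first, last in JOHAB_PARTS) == 529


def johab_part(index: int) -> Tuple[str, int]:
    """Return (part name, offset within that part) for a Johab glyph index."""
    for name, first, last in JOHAB_PARTS:
        if first <= index <= last:
            return name, index - first
    raise ValueError(f"glyph index {index} is outside 0..528")
```

For example, `johab_part(311)` returns `("middle", 0)`, the first Middle glyph.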
The first 19 Leading letters, each with 10 variations, comprise the basic modern set of initial Hangul consonants. These are followed by 12 archaic letters, each with 10 variations. The variation numbers, which depend upon the middle letter that follows, are stored in the @lconMap1 and @lconMap2 arrays. In general, the variations are as follows (letters in red, such as araea, YO-YA, and YU-YEO, are not part of the standard modern set, and their placements are my best guesses):
- Initial Consonant Forms for use with No Final Consonant
  - 0: Middle letter is on right side only (A, AE, YA, YAE, EO, E, YEO, YE)
  - 1: Middle letter is underneath, with lower horizontal stroke (O, YO, I, araea)
  - 2: Middle letter is underneath, with upper horizontal stroke (U, YU)
  - 3: Middle letter is combination of 0 and 1 (WA, WAE, OE, YI, YO-YA, YO-YAE, YO-I, araea-i)
  - 4: Middle letter is combination of 0 and 2 (WEO, WE, WI, YU-YEO, YU-YE, YU-I)
- Initial Consonant Forms for use with Final Consonant
  - 5: Middle letter is on right side only (A, AE, YA, YAE, EO, E, YEO, YE)
  - 6: Middle letter is underneath, with lower horizontal stroke (O, YO, I, araea)
  - 7: Middle letter is underneath, with upper horizontal stroke (U, YU)
  - 8: Middle letter is combination of 5 and 6 (WA, WAE, OE, YI, YO-YA, YO-YAE, YO-I, araea-i)
  - 9: Middle letter is combination of 5 and 7 (WEO, WE, WI, YU-YEO, YU-YE, YU-I)
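The grouping above amounts to a lookup from the middle letter to one of five Leading-glyph shapes. The following Python sketch encodes it as a dictionary. It is not a transcription of @lconMap1 or @lconMap2; the names are hypothetical, and it assumes that the five forms for use with a Final Consonant are numbered 5 to 9, after the five forms without one. Middle letters not named in the list raise `KeyError`.

```python
# Shape groups from the list above; the group number (0-4) selects the Leading form.
_GROUPS = [
    ["A", "AE", "YA", "YAE", "EO", "E", "YEO", "YE"],                 # right side only
    ["O", "YO", "I", "araea"],                                        # underneath, lower stroke
    ["U", "YU"],                                                      # underneath, upper stroke
    ["WA", "WAE", "OE", "YI", "YO-YA", "YO-YAE", "YO-I", "araea-i"],  # right side + lower stroke
    ["WEO", "WE", "WI", "YU-YEO", "YU-YE", "YU-I"],                   # right side + upper stroke
]
LEADING_SHAPE = {letter: shape for shape, letters in enumerate(_GROUPS) for letter in letters}


def leading_variation(middle: str, has_final: bool) -> int:
    """Pick one of the 10 Leading-glyph variations for a middle letter (hypothetical helper)."""
    # ASSUMPTION: variations 0-4 are for no Final Consonant, 5-9 for a Final Consonant.
    return LEADING_SHAPE[middle] + (5 if has_final else 0)
```

Under that assumption, the leading "h" of "han" (middle letter A, final N) would use variation 5.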
The first 21 Middle letters, each with 3 or 4 variations, comprise the basic modern set of Hangul vowels, semivowels ("y" and "w"), and diphthongs. These are followed by 8 archaic middle letters, each with 3 variations.
Each of the middle letters has an associated @vowType (Vowel Type) value. Usually this is 0, but for letters with a long horizontal stroke along the bottom of the middle region (O, WA, WAE, OE, EU, YI), the value is 1.
Of the archaic diphthongs, the @vowType property for YU-YEO, YU-YE, and YU-I should therefore be set to 0, and that for YO-YA, YO-YAE, YO-I, araea, and araea-i should be set to 1.
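Expressed as a lookup, the @vowType rule reduces to the set of middle letters whose value is 1. A minimal Python sketch follows; the names are hypothetical, and the letter spellings follow this page rather than the Perl script's array layout.

```python
# Middle letters with a long horizontal stroke along the bottom (@vowType = 1).
VOW_TYPE_ONE = frozenset({
    "O", "WA", "WAE", "OE", "EU", "YI",              # modern
    "YO-YA", "YO-YAE", "YO-I", "araea", "araea-i",  # archaic
})


def vow_type(middle: str) -> int:
    """Return the @vowType value (0 or 1) for a middle letter."""
    return 1 if middle in VOW_TYPE_ONE else 0
```

Every letter outside the set, including the archaic YU-YEO, YU-YE, and YU-I, gets 0.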
The three or four variations for each vowel are used as follows:
- Middle Letter Forms for use with No Final Consonant
  - 0: Filler only
  - 1: All other Middle letters except "N"
- Middle Letter Forms for use with Final Consonant
  - 2: Middle letter is "N", or the condition for case 3 below is not met
  - 3: Middle letter has a long horizontal stroke on the bottom underneath the Leading letter, and the Leading letter is not "G" or "K"
The first 27 Final letters, each with 4 variations, comprise the basic modern set of Hangul Final letters. These are followed by 4 archaic Final letters, each with 4 variations.
The Final letters each have an entry in the @tconType array. These entries indicate which variation of a Middle Letter should be selected. The values are: 0 for the filler, 2 for "N", and 1 for all others. None of the archaic Final letters have an open top like "N", so they should also all have a value of 1.
If the variation of a vowel indicated in @tconType is 1, then a further check is made. In that case, if the leading consonant is "G" or "K" (and therefore has an open bottom), the first form is used when there is no final consonant and the third form when there is one. Otherwise, the second or fourth form is used, again depending on whether there is a final consonant.
This is why some Middle letters have four forms: if @vowType is 1 for the Middle letter, the vowel is shaped similarly to "O" and must have four forms. The first and third forms are used when the vowel appears after an initial "G" or "K", so that the short vertical stem(s) on the long horizontal stroke reach higher and possibly overlap with the bottom of the "G" or "K".
Of the four variations of each Final letter, the first three look identical in the font files, so selecting a value of 0, 1, or 2 might not be critical. The left edge of the fourth variation is always shifted left by at least one pixel (usually two), and the right edge is shifted left by zero to two pixels. The values in red below are my guesses at which variations suit the archaic Terminal (final) consonants. If anyone knows better, please let me know.
- Terminal (Final) Consonant Forms
  - 0: Middle letter is on right and narrow, shaped like "A" (A, YA, WA, WAE, OE, YI)
  - 1: Middle letter is on right and narrow, shaped like "EO", possibly with a long horizontal stroke (EO, YEO, WEO, WE, WI, I)
  - 2: Middle letter is on right and wider, shaped like "E" (AE, YAE, E, YE)
  - 3: Middle letter has a long horizontal stroke but no long vertical stroke (O, YO, U, YU, EU)
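The four Final-consonant forms can likewise be chosen with a lookup keyed on the middle letter. This Python sketch covers exactly the letters in the list above, which are all 21 modern middle letters. The names are hypothetical, and archaic middle letters, which the list does not assign, raise `KeyError`.

```python
# Middle-letter groups from the list above; the group number (0-3) selects the Final form.
_FINAL_GROUPS = [
    ["A", "YA", "WA", "WAE", "OE", "YI"],   # narrow, shaped like "A"
    ["EO", "YEO", "WEO", "WE", "WI", "I"],  # narrow, shaped like "EO"
    ["AE", "YAE", "E", "YE"],               # wider, shaped like "E"
    ["O", "YO", "U", "YU", "EU"],           # horizontal stroke, no long vertical stroke
]
FINAL_SHAPE = {letter: shape for shape, letters in enumerate(_FINAL_GROUPS) for letter in letters}


def final_variation(middle: str) -> int:
    """Return which of the four Final-consonant variations (0-3) follows this middle letter."""
    return FINAL_SHAPE[middle]
```

For "hangeul", the final "n" after A would use variation 0 and the final "l" after EU would use variation 3.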
Unicode Hangul Jamo (U+1100..U+11FF)
The Hangul Jamo Unicode Block contains the possible Leading, Middle, and Final parts of a Hangul block in the following ranges:
- U+1100..U+1112: the basic 18 initial consonants plus 1 silent consonant (NG)
- U+1113..U+115E: other initial complex and ancient consonants
- U+115F: choseong (initial) filler
- U+1160: jungseong (middle) filler
- U+1161..U+1175: the basic 21 vowels and diphthongs
- U+1176..U+11A7: other and ancient vowel combinations
- U+11A8..U+11C2: the basic 27 Final consonant combinations
- U+11C3..U+11FF: other Final consonant combinations
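Since the role of each conjoining jamo follows directly from these sub-ranges, a code point can be classified with a table lookup. The Python sketch below (hypothetical names) covers only the U+1100..U+11FF block listed here, not the Extended-A and Extended-B blocks.

```python
from typing import Optional

# (first, last, role) sub-ranges of the Hangul Jamo block, from the list above.
JAMO_ROLES = [
    (0x1100, 0x1112, "modern leading consonant"),  # 19, including the silent NG
    (0x1113, 0x115E, "other leading consonant"),
    (0x115F, 0x115F, "leading (choseong) filler"),
    (0x1160, 0x1160, "middle (jungseong) filler"),
    (0x1161, 0x1175, "modern middle vowel"),       # 21
    (0x1176, 0x11A7, "other middle vowel"),
    (0x11A8, 0x11C2, "modern final consonant"),    # 27
    (0x11C3, 0x11FF, "other final consonant"),
]


def jamo_role(cp: int) -> Optional[str]:
    """Return the role of a Hangul Jamo code point, or None outside U+1100..U+11FF."""
    for first, last, role in JAMO_ROLES:
        if first <= cp <= last:
            return role
    return None
```

The modern sub-ranges hold 19, 21, and 27 code points, which are the counts used to build the Hangul Syllables block.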
Unicode Hangul Syllables (U+AC00..U+D7A3)
The Unicode Hangul Syllables block of 11,172 complex glyphs forms the main part of the Hangul code points in Unicode. This block contains every possible combination of the 18 Leading Consonants plus the silent Leading "NG" (19 possible initial letters), 21 Middle Vowels and Diphthongs, and 27 Final Consonants (or no Final Consonant).
If a Korean word is pronounced as beginning with a vowel, it is written with choseong ieung (NG) in the Leading Consonant position. The "NG" is silent in the initial position of a Hangul syllable.
There are 19 × 21 = 399 possible combinations of Leading Consonants and Middle Vowels and Diphthongs. Each of these can appear alone or with any of the 27 Final Consonants, for a total of 28 possible forms each. Thus there are 399 × 28 = 11,172 possible glyphs from 19 Leading glyphs, 21 Middle glyphs, and 27 Final glyphs (or no Final glyph).
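The same counting also determines the layout of the block. The Unicode Standard defines it arithmetically: a syllable's code point is U+AC00 + (L × 21 + V) × 28 + T, where L (0 to 18), V (0 to 20), and T (0 to 27, with 0 meaning no Final Consonant) index the conjoining jamo U+1100 + L, U+1161 + V, and U+11A7 + T. The Python sketch below (hypothetical function names) composes and decomposes syllables this way.

```python
# Constants of the Hangul syllable arithmetic in the Unicode Standard.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28  # T_COUNT includes "no final" (T = 0)
S_COUNT = L_COUNT * V_COUNT * T_COUNT   # 19 * 21 * 28 = 11,172


def compose(lead: int, vowel: int, tail: int = 0) -> str:
    """Build a precomposed syllable from jamo indices (tail = 0 means no final)."""
    if not (0 <= lead < L_COUNT and 0 <= vowel < V_COUNT and 0 <= tail < T_COUNT):
        raise ValueError("jamo index out of range")
    return chr(S_BASE + (lead * V_COUNT + vowel) * T_COUNT + tail)


def decompose(syllable: str) -> str:
    """Split a precomposed syllable into conjoining jamo (U+1100..U+11FF)."""
    s = ord(syllable) - S_BASE
    if not 0 <= s < S_COUNT:
        raise ValueError("not a Hangul Syllables code point")
    lead, rest = divmod(s, V_COUNT * T_COUNT)
    vowel, tail = divmod(rest, T_COUNT)
    jamo = chr(L_BASE + lead) + chr(V_BASE + vowel)
    if tail:  # tail = 0 means the syllable has no final consonant
        jamo += chr(T_BASE + tail)
    return jamo


if __name__ == "__main__":
    han = compose(18, 0, 4)     # leading H + middle A + final N
    print(f"U+{ord(han):04X}")  # U+D55C
    print([f"U+{ord(j):04X}" for j in decompose(han)])
```

Decomposing the two syllables of "hangeul" (U+D55C and U+AE00) yields H + A + N and G + EU + L, matching the example at the top of this page.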
For a thorough discussion of the creation of the Hangul Syllables block in Unifont versions after 5.1, see Generating Hangul Syllables.