Download the Utilities
You can download the latest utilities package here, updated on 7 September 2008:
- unifoundry-5.1.20080907.tar.gz Gzipped Unix Tarball
The version number is taken from version 5.1 of the Unicode Standard and the date on which the package was assembled.
The above version has modified combining diacritical marks in the TrueType version for spacing conforming to the Unicode Standard. The Makefiles were also modified for compatibility with FreeBSD Unix. Finally, I renamed the "combining.txt" file to "combining.dat".
I Wonder as I Wander
The GNU Unifont covers the Unicode Basic Multilingual Plane (BMP). When I first looked at it in late 2007, the GNU Unifont was missing roughly 17,500 glyphs from the Unicode 5.0 BMP. Since then, the addition of Qianqian Fang's Unibit CJK glyphs and my own additions have provided complete coverage of the Unicode 5.1 BMP, including the more than 1,000 glyphs that Unicode 5.1 added to the BMP. See the Unifont Glyphs page for more details on the font's latest status.
I was traveling and didn't have access to a Linux system, but wanted to work with the GNU Unifont. I did have the cygwin package installed on my Windows laptop but, alas, did not have Perl installed to use the Unifont creator's Perl scripts. What to do?
They say that when all you have is a hammer, everything looks like a nail. Well, all I had was a C compiler!
Necessity is the mother of invention, so I decided to write a C version of the Perl script that converts .hex files into ASCII and back for easy font editing in a text editor. I do not include that software for download here, because what I did next was far better.
I completed that software in short order, but then realized that if the glyphs were represented as bitmaps, they could be edited in a graphics editor. As a result, characters could be edited and viewed with the same aspect ratio that they'd have on final display. What better way to edit them than at the correct aspect ratio from the beginning?
The resulting software prints a full 32-bit Unicode value (leading zeroes and all) on each page, even though the GNU Unifont only supports Plane 0, the BMP, and even though Unicode itself only specifies values up to U+10FFFF. The upper two bytes are printed in the upper left-hand corner, as "U+nnnn". Characters on a page are arranged in a 16 by 16 grid (256 characters per page). Notches in the grid denote vertical and horizontal centers, and vertical and horizontal boundaries for 8 pixel wide and 16 pixel wide characters.
For example, the distance from the left-most notch in a grid square to the right-most notch in a grid square is 16 pixels, and the distance from the top-most notch in a grid square to the bottom-most notch in a grid square is also 16 pixels. The grid lines themselves are on a 32 by 32 pixel grid, providing some whitespace for clarity. Any graphics editor providing 400 times or 800 times magnification should suffice for easy editing of these bitmaps. I made about half of my pixel edits at 400x magnification, and the other half at 800x magnification.
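The grid geometry described above can be sketched in C. This is only an illustration under stated assumptions: the names CELL_SIZE, GLYPH_SIZE, and glyph_origin are hypothetical, the position of the first cell is passed in by the caller because the text does not give it, and the 16 by 16 glyph area is assumed to sit centered in its 32 by 32 cell.

```c
/* Sketch only: names and the centering offset are assumptions,
   not taken from the unihex2bmp source. */
#define CELL_SIZE  32   /* grid lines fall on a 32 by 32 pixel grid */
#define GLYPH_SIZE 16   /* notch-to-notch distance, in pixels       */

/* Compute the top-left pixel of the glyph area for the cell at
   (row, col), where row and col each run from 0 to 15.
   grid_x0 and grid_y0 locate the top-left corner of cell (0, 0). */
void glyph_origin(int row, int col, int grid_x0, int grid_y0,
                  int *x, int *y)
{
    int margin = (CELL_SIZE - GLYPH_SIZE) / 2;  /* 8 pixels of whitespace */

    *x = grid_x0 + col * CELL_SIZE + margin;
    *y = grid_y0 + row * CELL_SIZE + margin;
}
```

Under these assumptions, each glyph area starts 8 pixels in from the corner of its cell, and neighboring glyph areas are 32 pixels apart in both directions.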
I chose the single-pixel wide grid border format to be compatible with Font Lab's commercial BitFonter program. Yes, that's right, I use commercial font software sometimes. BitFonter will read a table of glyphs automatically (with a little hinting on your part). BitFonter will also read a .bdf file, which is generated by one of the Unifont creator's Perl scripts. However, as previously mentioned, I didn't happen to have Perl installed under cygwin on my laptop.
unihex2bmp output can be found at the end
of the Unicode Tutorial page on this website.
I began with the Wireless Bitmap file format (.wbmp) because it was the simplest graphics format I could find: a rectangular monochrome bitmap. It doesn't get any simpler than that. Once that was working, I added header processing for the Microsoft Windows Bitmap (.bmp) format. That allows editing in a wider range of graphics editors.
In a Wireless Bitmap file, a white pixel is always represented by a "1" bit, and a black pixel is always represented by a "0" bit. That is also the default Windows Bitmap encoding produced by Microsoft Paint (which I used along with my programs under cygwin), so that is the encoding that I used for pixels: white is a "1" bit, and black is a "0" bit.
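Because the bitmap files store white as a "1" bit, a converter has to reconcile that with the glyph data. Here is a minimal sketch, assuming that a "1" bit in a .hex glyph row marks a black (foreground) pixel. That is the usual Unifont .hex convention, but it is not stated in the text above. The function name flip_polarity is hypothetical.

```c
/* Convert one byte of glyph data (assumed: 1 = black pixel) into
   one byte of bitmap data (1 = white pixel), or back again.
   The operation is its own inverse. */
unsigned char flip_polarity(unsigned char bits)
{
    return (unsigned char)(~bits & 0xFF);
}
```

Applying the function twice returns the original byte, so the same helper would serve in both conversion directions.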
Some sample results appear at the bottom of the Unicode Tutorial web page on this site.
After adding that functionality, I decided to add one more option: allowing the matrix to be transposed ("flipped", going from top to bottom, then left to right, rather than from left to right, then top to bottom) to match the glyph ordering in the Unicode Standard itself. (Every other system I've seen, including the commercial font editing tools from Font Lab, arranges characters the other way, from left to right, then top to bottom.) I realized that would allow easy comparison with the Unicode code charts to facilitate adding new glyphs.
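The two orderings amount to swapping which nybble of the low byte selects the row and which selects the column. The sketch below shows that mapping. The function name code_to_cell is hypothetical, and this is not the code from the utilities themselves.

```c
/* Map the low byte of a code point (0x00 to 0xFF) to a grid cell.
   flip == 0: left to right, then top to bottom (default order).
   flip != 0: top to bottom, then left to right (Unicode chart order). */
void code_to_cell(unsigned low_byte, int flip, int *row, int *col)
{
    int high_nibble = (int)((low_byte >> 4) & 0xF);
    int low_nibble  = (int)(low_byte & 0xF);

    if (flip) {
        *col = high_nibble;
        *row = low_nibble;
    } else {
        *row = high_nibble;
        *col = low_nibble;
    }
}
```

For example, a low byte of 0x4A lands in row 4, column 10 in the default order, and in row 10, column 4 when flipped.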
The two main utilities, unihex2bmp and unibmp2hex,
convert GNU Unifont .hex files to and from
Windows Bitmap (.bmp) and Wireless Bitmap (.wbmp) files.
These two utilities use the Windows Bitmap format to
allow glyph editing with the Microsoft Paint accessory bundled with
Windows. I was on the road with my laptop when I wrote them,
and wanted software that would let me easily edit the GNU Unifont on
my Windows laptop.
The utilities were written as a quick hack, without tons of robust error checking or other bullet-proofing. This software is written in C, and should compile and run on just about anything that has a C compiler.
Various programs in the Unifoundry.com GNU Unifont Utility Package are Copyright © 1998–2008 Roman Czyborra, Paul Hardy, and Luis González Miranda. The enclosed fonts are Copyright © 1998–2008 Roman Czyborra, Paul Hardy, Qianqian Fang and the Wen Quan Yi Volunteers, Rich Felker, et al. For more details on the history, see the README file in the tarball.
The Unifoundry.com GNU Unifont Utility Package is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License (GPL) as published by the Free Software Foundation, either version 2 of the License, or (at your option) any later version.
Fonts in the package based on the Wen Quan Yi fonts are distributed under the terms of the GNU GPL version 2, with the exception that embedding the fonts in a document does not in itself bind that document to the terms of the GNU GPL.
The Unifoundry.com GNU Unifont Utility Package is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with the Unifoundry.com GNU Unifont Utility Package. If not, see http://www.gnu.org/licenses/.
I (Paul Hardy) wrote four main utility programs in this package:
unihex2bmp— converts one 256 code point page of a .hex Unifont file into a bitmapped 16 by 16 grid.
unibmp2hex— converts one of the above bitmaps back into .hex format.
unipagecount— counts the number of code points that have representation in a .hex Unifont file.
unidup— searches for duplicate code point entries in a sorted .hex Unifont file.
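The reason unidup wants sorted input is that duplicates in a sorted file are always adjacent, so one pass comparing neighbors finds them. A minimal sketch of that idea follows. The function name count_duplicates and the array-based interface are illustrative assumptions, not the actual unidup code, which reads a .hex file.

```c
#include <stddef.h>

/* Count entries that repeat the code point of the entry just before
   them. Because the input is sorted, duplicates are always adjacent,
   so a single pass comparing neighbors is enough. */
size_t count_duplicates(const unsigned long *code_points, size_t n)
{
    size_t dups = 0;
    size_t i;

    for (i = 1; i < n; i++) {
        if (code_points[i] == code_points[i - 1])
            dups++;
    }
    return dups;
}
```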
The unihex2bmp and unibmp2hex programs accept the following options:
-i  Specify the input file. The default is stdin. For example, "-iunifont.hex" specifies the input file as "unifont.hex".
-o  Specify the output file. The default is stdout. For example, "-omyoutput.bmp" specifies the output file as "myoutput.bmp". Warning: there's no check to see if an output file exists; these utilities will clobber an existing output file.
-p  Specify a "page", or block of 256 code points, to convert. "Page" is my term, because that's what prints on a bitmap graphics page; it isn't a standard Unicode term. For example, -p83 specifies the range U+8300 through U+83FF. If you don't specify a page with unibmp2hex, it figures out the page by reading the row and column labels in the bitmap file. The default page is 0.
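As the examples show, an option's value is attached directly to the option letter, and the page number is hexadecimal. The following sketch shows one way to parse such a -p option and turn the page into a code point range. The function names are hypothetical, and this is not the parsing code used by the utilities.

```c
#include <stdlib.h>
#include <string.h>

/* Parse an option of the form "-pNN" (hexadecimal page number attached
   directly to the option letter). Returns 1 and stores the page on
   success, 0 if the argument is not a well-formed -p option. */
int parse_page_option(const char *arg, unsigned long *page)
{
    char *end;

    if (strncmp(arg, "-p", 2) != 0 || arg[2] == '\0')
        return 0;
    *page = strtoul(arg + 2, &end, 16);
    return *end == '\0';
}

/* First and last code points of a 256 code point page. */
unsigned long page_first(unsigned long page) { return page << 8; }
unsigned long page_last(unsigned long page)  { return (page << 8) | 0xFF; }
```

With the argument "-p83", the page is 0x83, and the range runs from 0x8300 through 0x83FF, matching the example above.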
unihex2bmp accepts the following options:
- Create a Wireless Bitmap graphics file instead of the default Windows Bitmap file.
- "Flip" (transpose) the grid to match the structure of the Unicode standard. This prints code points top to bottom, then left to right. The default order is left to right, then top to bottom.
unibmp2hex will figure out if a bitmap
is flipped (transposed) or not, and whether it is in Wireless Bitmap
or Microsoft Bitmap format. It reads the last column (or top row if
flipped) of numbers to the
left of the grid as the format for all hex digits, then compares
the other row and column headers to determine the "page", unless the
page is specified with the
-p command line option.
unibmp2hex outputs characters in the BMP in standard
Unifont .hex format. If a character is above the BMP, it outputs hex codes
preceded by an eight digit hexadecimal number rather than a four digit
hexadecimal number, with everything else being the same.
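The choice between a four digit and an eight digit prefix can be sketched as follows. The function name format_hex_entry is hypothetical. The colon between the code point and the bitmap data follows the usual Unifont .hex layout, which is an assumption here because the text above does not spell it out.

```c
#include <stdio.h>

/* Write one .hex entry into buf. Code points in the BMP get a four
   digit prefix; anything above U+FFFF gets an eight digit prefix.
   Returns the value of snprintf (the length of the full entry). */
int format_hex_entry(char *buf, size_t size,
                     unsigned long code_point, const char *bitmap_hex)
{
    if (code_point <= 0xFFFFUL)
        return snprintf(buf, size, "%04lX:%s\n", code_point, bitmap_hex);
    return snprintf(buf, size, "%08lX:%s\n", code_point, bitmap_hex);
}
```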
unibmp2hex only understands one height, 16 pixels;
it only understands two widths, 8 or 16 pixels. When reading the center
of each 32 by 32 pixel grid, it detects whether or not the second half
of the center 16 by 16 pixel grid is blank. If it is, then it outputs
the .hex character as a 16 row by 8 column hex code. If there is even
one black pixel in the second half of the 16 by 16 grid, it outputs the
.hex character as a 16 row by 16 column hex code.
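The width test described above might look like the sketch below. It assumes the 16 by 16 glyph area has already been pulled out of the bitmap into sixteen 16-bit rows, with the most significant bit as the leftmost pixel and a "1" bit meaning black, and it takes "second half" to mean the right-hand eight columns. The function name glyph_width and that representation are illustrative assumptions, not the layout used inside unibmp2hex.

```c
/* rows[i] holds row i of the 16 by 16 glyph area, most significant bit
   leftmost, with 1 meaning a black pixel (assumed representation).
   Returns 8 if the right-hand 8 columns are blank, otherwise 16. */
int glyph_width(const unsigned short rows[16])
{
    int i;

    for (i = 0; i < 16; i++) {
        if (rows[i] & 0x00FF)
            return 16;   /* at least one black pixel in the right half */
    }
    return 8;
}
```

One consequence of this rule is that a 16 pixel wide glyph whose right half happens to be entirely blank would come back as an 8 pixel wide glyph.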
Caveat Emptor. These programs were written very, very quickly over a few evenings as a hack. It wouldn't surprise me if they have bugs, but they seem to work perfectly. In addition, these programs don't contain much in the way of error checking. If you do feed these programs bogus values or anything similar, expect the unexpected.
GNU Unifont Status
I wanted to determine how many characters were still needed to complete
the GNU Unifont.
First, I assembled as complete a glyph collection as I could. I began
with Roman Czyborra's original
unifont.hex file, then
applied all of the updates on his website. Then I applied all of the updates
from the Debian distribution. Finally, I added the Tibetan glyphs that
Rich Felker posted to debian.org in 2006. (The "unifont" package was orphaned
in Debian in 2006, and Rich's contribution hadn't been added.)
One remaining difficulty in this calculation is that the unassigned characters, Specials, Noncharacters, etc. weren't noted in any special way. So I went through and added glyphs for them that look like gray boxes, based upon the Unicode 5.0 Standard. Those filler glyphs are available for download in the Unicode Glyphs section of this website. I then inserted these "blank" glyphs into my local copy of the unifont.hex file.
Finally, to see how many characters there were in each 256 character
area, I wrote the
unipagecount program to read the
new .hex file.
The unipagecount utility prints the high-order
nybble as row headers (in the left-most column) and the low-order nybble
as column headers (in the first row). Values range from 0 for a 256
character area with no entries to 100 (hex) for a 256 character area with
all entries present.
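The counting itself is simple: the page of a code point is the code point shifted right by eight bits. Here is a minimal sketch of the tally and of the table layout just described. The function names are hypothetical, each .hex line is assumed to begin with a hexadecimal code point followed by a colon, and this is not the unipagecount source.

```c
#include <stdio.h>
#include <stdlib.h>

/* Tally one line of a .hex file. counts[] has one slot per 256 code
   point page of the BMP. Returns 1 if the line was counted, 0 if it
   was skipped (no leading code point, no colon, or above the BMP). */
int tally_line(const char *line, unsigned counts[256])
{
    char *end;
    unsigned long code_point = strtoul(line, &end, 16);

    if (end == line || *end != ':' || code_point > 0xFFFFUL)
        return 0;
    counts[code_point >> 8]++;
    return 1;
}

/* Print the 16 by 16 table: high nybble of the page number down the
   side, low nybble across the top, counts in hexadecimal. */
void print_counts(const unsigned counts[256])
{
    int row, col;

    printf("  ");
    for (col = 0; col < 16; col++)
        printf(" %3X", (unsigned)col);
    printf("\n");
    for (row = 0; row < 16; row++) {
        printf("%X ", (unsigned)row);
        for (col = 0; col < 16; col++)
            printf(" %3X", counts[row * 16 + col]);
        printf("\n");
    }
}
```

A caller would zero the counts array, pass each input line to tally_line, and then call print_counts once at the end.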
Here are the results on the 2008-01-28 unifont.hex file:
    0   1   2   3   4   5   6   7   8   9   A   B   C   D   E   F
0 100 100 100 100 100 100 100 100 100 100  5F  68  5A  62 100 100
1  B2  53 100 100 100 100 100  3D  65  2B 100  87 100 100 100 100
2 100  DF  F2  93  EB  EE  BC  BB 100   0   0  E1 100 100  73 100
3  EC  B5  53  64 100 100 100 100 100 100 100 100 100 100 100 100
4 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100
5 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100
6 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100
7 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100
8 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100
9 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100
A   1   0   0   0  3C 100 100 100  9C 100 100 100 100 100 100 100
B 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100
C 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100
D 100 100 100 100 100 100 100 100   0   0   0   0   0   0   0   0
E 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100
F 100 100 100 100 100 100 100 100 100 100  89  FF  30  3E  E1  C7
You can see that the first ten blocks of 256 characters (U+0000 through U+09FF, in the upper left-hand corner) are complete: all 100 (hexadecimal), or 256 (decimal), characters in each block have glyphs.
Some 256 character blocks don't have any assignments. The range U+D800 through U+DFFF is reserved for surrogate pairs. The range U+E000 through U+F8FF is reserved for private use.
The Unifont Glyphs page shows a color-coded view of font coverage.
This was made with the
unipagecount program with the
-l option, to produce HTML output with links.
Any box that is light green is 100 percent complete. Any box that
is red or near-red has no or hardly any glyphs complete. Yellow
and orange are intermediate, with orange cells having less coverage
than yellow cells.
The combining character dashed circle in the existing
unifont.hex file has this pattern:
You can see this, for example, in the Combining Diacritical Marks block
at U+0300 through U+036F. Not all combining circles follow this
pattern precisely. For that reason, I wrote
uniunmask for version 1.02. This program reads
a second .hex file,
masks.hex, and XORs it with the main .hex file for
any matching code points. This allows combining circle marks to appear
in a master file, but be easily removed for display.
GNU Unifont and True Type
Luis Alejandro González Miranda has created a utility to convert the GNU Unifont into a True Type font by using fontforge. His website is http://www.lgm.cl/trabajos/unifont/index.en.html.
Roman Czyborra's GNU Unifont Utilities
Since Roman's website is currently down, here's a gzipped tar file of his Perl scripts (bdfimplode.pl, hex2bdf.pl, hexdraw.pl, and hexmerge.pl). I'll post more information later.
Aux Armes, Netizens!
A call to arms, or "Where do I sign up?"
Roman Czyborra had asked that additions be emailed in .hex format to (anti-spam version of address) unifont at his domain czyborra.com. Current news on the GNU Unifont was available at http://czyborra.com/unifont/.
However, his website is currently down. You can send any updates to unifoundry at this domain, unifoundry.com, and I'll add them to my master copy for the next release. Thanks!
If you have any questions, please email unifoundry at this domain name (not spelled out because of spammers).