Beta Code to/from UTF-8 Unicode Conversion Package

The original Unibetacode package contained standalone programs to convert between the Beta Code encoding method for polytonic Greek and UTF-8 Unicode. Version 2.0 added the libunibetacode library, with functions for converting Greek, Coptic, and Hebrew Beta Code strings to and from UTF-8 strings.

The standalone programs are designed to accept textual input from the Beta Code specification as implemented by the Thesaurus Linguae Graecae (TLG) Project at the University of California, Irvine, and also by the Perseus Project of Tufts University. The programs handle only character encodings, not the formatting codes (superscripts, font size changes, etc.) of the full TLG Beta Code specification.

The focus of Beta Code is an ASCII encoding of classical Greek, and thus the default encoding of the standalone programs is ASCII Beta Code for conversion to Greek UTF-8 Unicode. This package can be of use to those wishing to type polytonic Greek who are already good typists of ASCII characters, as well as to those wishing to convert documents in the TLG, Perseus Project, or similar corpus from Beta Code to Unicode.

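The three standalone programs in this package are unibetaprep, beta2uni, and uni2beta. The unibetaprep program converts the special numeric codes of the TLG Beta Code specification to this package's Unicode extension to Beta Code, beta2uni converts Beta Code to UTF-8 Unicode, and uni2beta converts UTF-8 Unicode back to Beta Code.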

The libunibetacode library contains these top-level functions:

To use this library once installed on your system, simply compile a C program with the -lunibetacode flag. There are no additional header files to include. For example, the test program ublibcheck.c located in the test directory in the source distribution can be compiled as follows:

cc ublibcheck.c -o ublibcheck -lunibetacode

The program ublibcheck.c converts Greek, Coptic, and Hebrew Beta Code strings into UTF-8 Unicode, and then converts them back to Beta Code, verifying the round-trip conversion. In addition to being a test program, it thus provides a practical example of how to use the library functions.

The source package contains Beta Code examples in the "examples" directory, which are also used to test the standalone programs once installed on a system. Three short Greek Beta Code examples appear below.

Example — Interactive Terminal Input

The commands can be run interactively at a terminal. In the example below, the first line after invoking beta2uni at a terminal shell prompt ("$") is the user's typed Beta Code input and the second line is the generated polytonic Greek, which is output directly to the terminal. End the input by typing Control-D on a line by itself, followed by the Enter key. The result can be copied and pasted into a document.

     $ beta2uni
     *(o bi/os braxu/s h( de\ te/xnh makrh/
     Ὁ βίος βραχύς ἡ δὲ τέχνη μακρή

      Life [is] short and art long. —Hippocrates

The input ASCII Beta Code letters are lowercase in this example, following the convention of the Perseus Project. Had uppercase letters been used instead (as is the convention of the TLG Project), the Greek output would have been identical.

An asterisk ('*') denotes an uppercase Greek letter; if present, it always appears first, before the letter and any associated breathing marks and accents. Breathing marks and then accents precede an associated uppercase letter but follow after an associated lowercase letter, reflecting the pre-Unicode typesetting conventions for printed Greek. An iota subscript ('|') associated with a long vowel always appears last after the vowel, as seen in the next example, whether the associated vowel is uppercase or lowercase.
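The ordering rules above can be sketched in C as a small parser for a single letter group. This is only an illustration of the ordering, not code from the package: the struct beta_letter type and the parse_beta_letter function are hypothetical names, and the set of mark characters is limited to those that appear in the examples on this page.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical type for illustration; not part of libunibetacode. */
struct beta_letter {
    int  upper;     /* 1 if the group began with '*' */
    char letter;    /* ASCII Beta Code base letter, lowercased */
    char marks[8];  /* breathing, accent, iota subscript, in input order */
};

/* Mark characters used in the examples on this page:
   ')' '(' breathings, '/' '\' '=' accents, '|' iota subscript. */
static int is_mark(char c)
{
    return c != '\0' && strchr(")(/\\=|", c) != NULL;
}

/* Parse one letter group at the start of s.
   Returns the number of characters consumed, or 0 if s does not
   start with a letter group. */
size_t parse_beta_letter(const char *s, struct beta_letter *out)
{
    size_t i = 0, n = 0;

    out->upper = 0;
    out->letter = '\0';
    out->marks[0] = '\0';

    if (s[i] == '*') {
        /* Uppercase: '*' first, then breathing and accent, then the letter. */
        out->upper = 1;
        i++;
        while (is_mark(s[i]) && n < sizeof out->marks - 1)
            out->marks[n++] = s[i++];
    }

    if (!isalpha((unsigned char)s[i]))
        return 0;
    out->letter = (char)tolower((unsigned char)s[i++]);

    /* Lowercase marks, and the iota subscript in either case,
       follow the letter. */
    while (is_mark(s[i]) && n < sizeof out->marks - 1)
        out->marks[n++] = s[i++];
    out->marks[n] = '\0';

    return i;
}
```

With this sketch, the group "*(o" yields an uppercase 'o' with the mark "(", while "h=|" yields a lowercase 'h' with the marks "=|".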

Example — Genesis 1:1

A '&' switches to Latin mode and a '$' switches back to the default Greek mode.

Beta Code Encoding Input to beta2uni:

     &Koine Greek (Septuagint):$
     *)en a)rxh=| e)poi/hsen o( *qeo\s to\n ou)rano\n kai\ th\n gh=n.

UTF-8 Unicode Output from beta2uni:

     Koine Greek (Septuagint):
     Ἐν ἀρχῇ ἐποίησεν ὁ Θεὸς τὸν οὐρανὸν καὶ τὴν γῆν.

Note the context-dependent conversion of the ASCII letter 's' in the Beta Code input to small medial (middle) sigma or small final sigma in the UTF-8 Unicode output.
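One simple way to make that choice is to look at the character that follows the 's'. The C sketch below assumes that a sigma is final whenever the next character is not an ASCII letter; the rules that beta2uni actually applies may be more detailed. The function name sigma_codepoint is hypothetical and is not part of libunibetacode.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper for illustration; not part of libunibetacode.
   Given a Beta Code string s and the index i of an 's' in it, return
   the Unicode code point for the sigma form to use.
   Assumption: the sigma is final when the next character is not an
   ASCII letter (space, punctuation, or end of string).
   Precondition: s[i] is not the terminating NUL. */
uint32_t sigma_codepoint(const char *s, size_t i)
{
    unsigned char next = (unsigned char)s[i + 1];

    return isalpha(next) ? 0x03C3u   /* GREEK SMALL LETTER SIGMA */
                         : 0x03C2u;  /* GREEK SMALL LETTER FINAL SIGMA */
}
```

Applied to the Genesis input, the 's' in "e)poi/hsen" is followed by 'e' and becomes a medial sigma, while the 's' in "*qeo\s" is followed by a space and becomes a final sigma.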

Example — Byzantine Musical Symbols

The TLG Beta Code specification includes special numeric codes; as one example, the codes in the range "#2000" through "#2245" map to the Byzantine Musical Symbols in the Unicode Supplementary Multilingual Plane. The unibetaprep program converts those codes to the Unicode extension to Beta Code that is unique to this Unibetacode package. Because Unicode has become the standard for exchange of textual information, there is no corresponding program to convert a file back to the special numeric codes of the TLG Beta Code specification, and there are no plans to create one.

TLG-specific Special Numeric Code Input to unibetaprep:

     #2070 &(U+1D046)$ *)/ison
     #2071 &(U+1D047)$ *)oli/gon
     #2078 &(U+1D04E)$ *kenth/mata
     #2073 &(U+1D049)$ *petasth/
     #2081 &(U+1D051)$ *)apo/strofos

Unibetacode's Unicode Extension to Beta Code; Output from unibetaprep, Input to beta2uni:

     {\u1D046} &(U+1D046)$ *)/ison
     {\u1D047} &(U+1D047)$ *)oli/gon
     {\u1D04E} &(U+1D04E)$ *kenth/mata
     {\u1D049} &(U+1D049)$ *petasth/
     {\u1D051} &(U+1D051)$ *)apo/strofos
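In the five sample lines above, each TLG code "#20nn" corresponds to the code point U+1D000 plus the decimal value nn (for example, 70 decimal is 46 hexadecimal, so "#2070" becomes U+1D046). The C sketch below applies that offset to produce the escape text. It is an illustration only: the function name byzantine_escape is hypothetical, and treating the offset as valid across the whole "#2000" through "#2245" range is an assumption drawn from these five samples, not a description of how unibetaprep is implemented.

```c
#include <stdio.h>

/* Hypothetical helper for illustration; not part of the package.
   Write the "{\uXXXXX}" escape for a TLG Byzantine musical symbol
   code into buf. Returns 0 on success, -1 if the code is outside
   2000..2245 or buf is too small.
   Assumption: code point = U+1D000 + (tlg_code - 2000), which
   matches the five sample lines on this page. */
int byzantine_escape(int tlg_code, char *buf, size_t buflen)
{
    unsigned long cp;
    int n;

    if (tlg_code < 2000 || tlg_code > 2245)
        return -1;

    cp = 0x1D000UL + (unsigned long)(tlg_code - 2000);
    n = snprintf(buf, buflen, "{\\u%lX}", cp);

    return (n < 0 || (size_t)n >= buflen) ? -1 : 0;
}
```

For example, calling byzantine_escape(2081, buf, sizeof buf) with a 16-byte buffer leaves the text {\u1D051} in buf.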

Beta Code Conversion to UTF-8 Unicode; Output from beta2uni:

     𝁆 (U+1D046) Ἴσον
     𝁇 (U+1D047) Ὀλίγον
     𝁎 (U+1D04E) Κεντήματα
     𝁉 (U+1D049) Πεταστή
     𝁑 (U+1D051) Ἀπόστροφος
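The last step in this example, turning a code point such as U+1D046 into UTF-8 bytes, follows the standard UTF-8 encoding rules. The C sketch below shows those rules in isolation; the function name utf8_encode is hypothetical and is not taken from libunibetacode, whose internals may be organized differently.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper for illustration; not part of libunibetacode.
   Encode one Unicode code point as UTF-8 into out.
   Returns the number of bytes written (1 to 4), or 0 if cp is a
   surrogate or lies above U+10FFFF. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {                       /* 7 bits: one byte */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                      /* 11 bits: two bytes */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                    /* 16 bits: three bytes */
        if (cp >= 0xD800 && cp <= 0xDFFF)  /* surrogates are not valid */
            return 0;
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                  /* 21 bits: four bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

Because the Byzantine Musical Symbols lie above U+FFFF, each one takes four bytes in UTF-8; U+1D046 is encoded as the bytes F0 9D 81 86.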

Reference Documents

Unix-style man (manual) pages are listed below. The unibetacode man page describes the Beta Code format that these utilities implement, so read it first. Examples in the "examples" directory in the source code demonstrate practical use.

More information is available in the following documents:

The source tarball and GnuPG signature file can be downloaded at these links:

Installing Unibetacode

Note: Unibetacode version 2.0 used the libtool package to build static and shared libraries. Version 2.0.1 no longer uses libtool, and only builds a static library: libunibetacode.a. If you want to install the shared library on your computer, you can install the earlier version 2.0 file.

Compiling the package requires a C compiler and the Unix make utility. To compile and install the Unibetacode package on a system with a Unix-style command line interface (GNU/Linux, BSD, Mac OS X, Cygwin, etc.), type these commands in a terminal window:

     make check
     make install
     make clean

The "make check" command runs five tests using sample text files in the "examples" directory, and then runs one test program that exercises the functions in the libunibetacode library.

The "make install" command might need to be run as "sudo make install" on your system.

By default, the standalone programs (unibetaprep, beta2uni, and uni2beta) will be installed in "/usr/local/bin", the libunibetacode library will be installed in "/usr/local/lib", and the man (manual) pages will be installed in the appropriate subdirectories under "/usr/local/share/man".

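In version 2.0, the "make install" command installs both the static and the shared versions of the libunibetacode library if the target system supports shared libraries. Version 2.0.1 builds and installs only the static library, libunibetacode.a.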


The software on this site, unless otherwise noted, is released under the terms of the GNU General Public License (GNU GPL) version 2.0, or (at your option) a later version.
