The original Unibetacode package contained standalone programs to convert between the Beta Code encoding method for polytonic Greek and UTF-8 Unicode. Version 2.0 added the libunibetacode library, with functions for converting Greek, Coptic, and Hebrew Beta Code strings to and from UTF-8 strings.
The standalone programs are designed to accept textual input from the Beta Code specification as implemented by the Thesaurus Linguae Graecae (TLG) Project at the University of California, Irvine, and also by the Perseus Project of Tufts University. The programs only handle character encodings, not the formatting codes (superscripts, font size changes, etc.) of the full TLG Beta Code specification.
The focus of Beta Code is an ASCII encoding of classical Greek, and thus the default encoding of the standalone programs is ASCII Beta Code for conversion to Greek UTF-8 Unicode. This package can be of use to those wishing to type polytonic Greek who are already good typists of ASCII characters, as well as to those wishing to convert documents in the TLG, Perseus Project, or similar corpus from Beta Code to Unicode.
The three standalone programs in this package are:
- unibetaprep: Performs preparatory transformation of special numbered code sequences in the full TLG Beta Code specification (for example the Unicode Byzantine Musical Symbols range), converting them to ASCII-specified Unicode code points. These are written as the sequence "\u" followed by the hexadecimal digits corresponding to the Unicode code point. Such Unicode code points can be enclosed in curly brackets ("{…}"). This full Unicode capability is an extension to Beta Code that favors Unicode over the original TLG encoding.
- beta2uni: Converts a Beta Code-encoded file to UTF-8. This program can handle the ordinary ASCII Beta Code encoding of polytonic Greek characters. It also handles Bohairic Coptic letters with the jinma (grave) accent, and Hebrew letters. Unicode character support extends this capability of the Beta Code specification to provide complete Unicode coverage with "\u…" sequences.
- uni2beta: Converts a UTF-8 document that contains polytonic Greek, Bohairic Coptic, and Hebrew letters to Beta Code.
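The "\u" notation described above can be illustrated with a small parser. This is only a sketch under stated assumptions: the function name parse_u_escape is hypothetical, it is not taken from the Unibetacode sources, and the limit of six hexadecimal digits is a choice made here rather than something the package documents.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of Unibetacode): parse one "\u" escape
 * at the start of s.  Returns the code point, or -1 if s does not begin
 * with "\u" followed by 1 to 6 hexadecimal digits, or if the value is
 * above U+10FFFF.  On success, *end (if not NULL) points just past the
 * last digit consumed.
 */
long parse_u_escape(const char *s, const char **end)
{
    long codept = 0;
    int ndigits = 0;

    if (s[0] != '\\' || s[1] != 'u')
        return -1;
    s += 2;

    /* Greedily accumulate hexadecimal digits. */
    while (ndigits < 6 && isxdigit((unsigned char)*s)) {
        int c = tolower((unsigned char)*s);
        codept = codept * 16 + (isdigit(c) ? c - '0' : c - 'a' + 10);
        s++;
        ndigits++;
    }

    if (ndigits == 0 || codept > 0x10FFFFL)
        return -1;
    if (end != NULL)
        *end = s;
    return codept;
}
```

Called on the text "\u1D046}", the function returns 0x1D046 and leaves *end pointing at the closing bracket. Because the sketch reads digits greedily, an escape written without curly brackets and followed directly by a letter in the range a to f would be misread.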
The libunibetacode library contains these top-level functions:
- Conversion from Beta Code to UTF-8:
  - ub_beta2greek converts Greek Beta Code input strings to UTF-8 output strings.
  - ub_beta2coptic converts Coptic Beta Code input strings to UTF-8 output strings.
  - ub_beta2hebrew converts Hebrew Beta Code input strings to UTF-8 output strings.
  - ub_codept2utf8 converts a single Unicode code point input to a UTF-8 output string.
- Conversion from UTF-8 to Beta Code:
  - ub_greek2beta converts Greek UTF-8 input strings to Greek Beta Code output strings.
  - ub_coptic2beta converts Coptic UTF-8 input strings to Coptic Beta Code output strings.
  - ub_hebrew2beta converts Hebrew UTF-8 input strings to Hebrew Beta Code output strings.
  - ub_utf82codept converts a single UTF-8-encoded character input to a Unicode code point.
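To make the code point conversions concrete, here is a minimal, independent sketch of standard UTF-8 encoding. It is not the library's ub_codept2utf8 and does not claim to match its signature or behavior; the name codept_to_utf8 and the five-byte output buffer are assumptions made for this illustration.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not the library function): encode one Unicode
 * code point as UTF-8 into out, which must hold at least 5 bytes.
 * Returns the number of bytes written before the terminating NUL,
 * or 0 for surrogates and values above U+10FFFF.
 */
size_t codept_to_utf8(unsigned long cp, char out[5])
{
    size_t n;

    if (cp < 0x80UL) {                       /* 1 byte: 0xxxxxxx */
        out[0] = (char)cp;
        n = 1;
    } else if (cp < 0x800UL) {               /* 2 bytes: 110xxxxx 10xxxxxx */
        out[0] = (char)(0xC0 | (cp >> 6));
        out[1] = (char)(0x80 | (cp & 0x3F));
        n = 2;
    } else if (cp < 0x10000UL) {             /* 3 bytes */
        if (cp >= 0xD800UL && cp <= 0xDFFFUL) {
            out[0] = '\0';                   /* surrogates are not encodable */
            return 0;
        }
        out[0] = (char)(0xE0 | (cp >> 12));
        out[1] = (char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (char)(0x80 | (cp & 0x3F));
        n = 3;
    } else if (cp <= 0x10FFFFUL) {           /* 4 bytes */
        out[0] = (char)(0xF0 | (cp >> 18));
        out[1] = (char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (char)(0x80 | (cp & 0x3F));
        n = 4;
    } else {
        out[0] = '\0';
        return 0;
    }
    out[n] = '\0';
    return n;
}
```

For example, U+03A9 (Greek capital omega) encodes as the two bytes CE A9, and U+1D046 from the Byzantine Musical Symbols block encodes as the four bytes F0 9D 81 86.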
To use this library once it is installed on your system, compile a C program and link it with the -lunibetacode flag. There are no additional header files to include. For example, the test program ublibcheck.c, located in the test directory in the source distribution, can be compiled as follows:
cc ublibcheck.c -o ublibcheck -lunibetacode
The program ublibcheck.c
converts Greek, Coptic,
and Hebrew Beta Code strings into UTF-8 Unicode, and then
converts them back to Beta Code, verifying the round-trip
conversion. In addition to being a test program, it thus
provides a practical example of how to use the library functions.
The source package contains Beta Code examples in the "examples" directory, which are also used to test the standalone programs once installed on a system.
As a brief introduction to Beta Code, three short Greek Beta Code examples appear below. Examples in Hebrew and Coptic then follow.
Example — Interactive Terminal Input
The commands can be run interactively at a terminal. In the example below, the first line after invoking beta2uni at a terminal shell prompt ("$") is the user's typed Beta Code input and the second line is the generated polytonic Greek, which is output directly to the terminal after pressing the Enter or Return key. End the input by typing Control-D on a line by itself, followed by the Enter or Return key. The result can be copied and pasted into a document:
$ beta2uni
*gnw=qi seauto/n
Γνῶθι σεαυτόν
^D
$
Know thyself —First Delphic maxim
The input ASCII Beta Code letters are lowercase in this example, following the convention of the Perseus Project. Had uppercase letters been used instead (as is the convention of the TLG Project), the Greek output would have been identical.
An asterisk ('*') denotes an uppercase Greek letter; if present, it always appears first, before the letter and any associated breathing mark and accents.
Breathing marks and then accents precede an associated uppercase letter but follow after an associated lowercase letter, reflecting the pre-Unicode typesetting conventions for printed Greek. Here is a longer example, showing more polytonic encoding combinations:
$ beta2uni
*(o bi/os braxu/s h( de\ te/xnh makrh/
Ὁ βίος βραχύς ἡ δὲ τέχνη μακρή
^D
$
Life [is] short and art long. —Hippocrates
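The ordering rule for marks around a letter can be sketched as a small scanner. This is an illustration only, not code from the package: the name beta_letter_span is hypothetical, only the breathing and accent characters that appear in the two examples above are recognized, and the sketch does not check that breathing marks come before accents.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Breathings ")" and "(", and accents "/", "\" and "=". */
static int is_breathing_or_accent(char c)
{
    return c != '\0' && strchr("()/\\=", c) != NULL;
}

/*
 * Hypothetical helper (not part of Unibetacode): measure one Beta Code
 * letter together with its marks, starting at s, and store the base
 * ASCII letter in *base.  Returns the number of characters consumed,
 * or 0 if s does not start a letter.
 */
size_t beta_letter_span(const char *s, char *base)
{
    size_t i = 0;

    if (s[0] == '*') {
        /* Uppercase: asterisk, then marks, then the letter. */
        i = 1;
        while (is_breathing_or_accent(s[i]))
            i++;
        if (!isalpha((unsigned char)s[i]))
            return 0;
        *base = s[i++];
    } else {
        /* Lowercase: the letter, then marks. */
        if (!isalpha((unsigned char)s[i]))
            return 0;
        *base = s[i++];
        while (is_breathing_or_accent(s[i]))
            i++;
    }
    return i;
}
```

Applied to "*(o bi/os", the scanner consumes the three characters "*(o" with base letter 'o'; applied to "h( de\", it consumes "h(" with base letter 'h'.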
Example — Genesis 1:1
An ampersand ('&') switches to Latin mode and a dollar sign ('$') switches back to the default Greek mode. An iota subscript associated with a long vowel, entered as a vertical bar ('|'), always appears last after the vowel and any associated accents. This rule holds whether the vowel is uppercase or lowercase. The next example demonstrates use of this symbol.
Beta Code Encoding Input to beta2uni:
&Koine Greek (Septuagint):$ *)en a)rxh=| e)poi/hsen o( *qeo\s to\n ou)rano\n kai\ th\n gh=n.
UTF-8 Unicode Output from beta2uni:
Koine Greek (Septuagint): Ἐν ἀρχῇ ἐποίησεν ὁ Θεὸς τὸν οὐρανὸν καὶ τὴν γῆν.
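The mode switching in this example can be pictured as a two-state scan over the input. The sketch below is an assumption-laden illustration, not the package's implementation: copy_latin_text is a hypothetical name, and the function does not interpret digits that follow an ampersand, such as the "&300" and "&100" prefixes in the multilingual example later on this page.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not part of Unibetacode): copy only the text
 * that appears in Latin mode, between '&' and '$', into out.
 * Returns the number of characters copied, not counting the NUL.
 */
size_t copy_latin_text(const char *in, char *out, size_t outsize)
{
    int latin = 0;      /* the default mode is Greek */
    size_t n = 0;

    if (outsize == 0)
        return 0;
    for (; *in != '\0'; in++) {
        if (*in == '&')
            latin = 1;                  /* switch to Latin mode */
        else if (*in == '$')
            latin = 0;                  /* switch back to Greek mode */
        else if (latin && n + 1 < outsize)
            out[n++] = *in;
    }
    out[n] = '\0';
    return n;
}
```

Given the input line above, the function copies "Koine Greek (Septuagint):" and skips the Greek Beta Code that follows the dollar sign.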
Note the context-dependent conversion of the ASCII letter 's' in the Beta Code input to small medial (middle) sigma or small final sigma in the UTF-8 Unicode output.
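One simple way to picture the sigma rule is a lookahead on the next input character. The sketch below is a simplification and not the rule the package actually implements: sigma_is_final is a hypothetical name, an 's' is treated as final whenever the next character is not an ASCII letter, and uppercase sigma (which has no final form) is not handled.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of Unibetacode): return 1 if the
 * Beta Code sigma at text[i] would become a final sigma under the
 * simplified rule "not followed by an ASCII letter", otherwise 0.
 * Returns 0 if text[i] is not 's' or 'S'.
 */
int sigma_is_final(const char *text, size_t i)
{
    unsigned char next;

    if (text[i] != 's' && text[i] != 'S')
        return 0;
    next = (unsigned char)text[i + 1];
    /* End of string, spaces, and punctuation all end the word. */
    return !isalpha(next);
}
```

In "qeo\s to\n" the 's' is followed by a space and is treated as final; in "e)poi/hsen" the 's' is followed by 'e' and is treated as medial.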
Example — Byzantine Musical Symbols
The TLG Beta Code specification includes special numeric codes in the range "#2000" through "#2245" that map to Unicode Supplementary Multilingual Plane Byzantine Musical Symbols, as just one example of its special numeric codes. The unibetaprep program converts those codes to the Unicode extension to Beta Code that is unique to this Unibetacode package. Because Unicode has become the standard for exchange of textual information, there is no corresponding program to convert a file back to using the special numeric codes of the TLG Beta Code specification, and there are no plans to create such a program.
TLG-specific Special Numeric Code Input to unibetaprep:
#2070 &(U+1D046)$ *)/ison
#2071 &(U+1D047)$ *)oli/gon
#2078 &(U+1D04E)$ *kenth/mata
#2073 &(U+1D049)$ *petasth/
#2081 &(U+1D051)$ *)apo/strofos
Unibetacode's Unicode Extension to Beta Code; Output from unibetaprep, Input to beta2uni:
{\u1D046} &(U+1D046)$ *)/ison
{\u1D047} &(U+1D047)$ *)oli/gon
{\u1D04E} &(U+1D04E)$ *kenth/mata
{\u1D049} &(U+1D049)$ *petasth/
{\u1D051} &(U+1D051)$ *)apo/strofos
Beta Code Conversion to UTF-8 Unicode; Output from beta2uni:
𝁆 (U+1D046) Ἴσον
𝁇 (U+1D047) Ὀλίγον
𝁎 (U+1D04E) Κεντήματα
𝁉 (U+1D049) Πεταστή
𝁑 (U+1D051) Ἀπόστροφος
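The five code pairs in this example are all consistent with a simple offset: the code point equals U+1D000 plus the TLG number minus 2000 (for instance, 2070 minus 2000 is 70, which is hexadecimal 46). The sketch below applies that offset to the whole "#2000" through "#2245" range, which is an assumption inferred from the examples rather than a statement of how unibetaprep works internally; both function names are hypothetical.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper (not part of Unibetacode): map a TLG special
 * numeric code in 2000..2245 to a code point, assuming the linear
 * offset seen in the examples.  Returns -1 outside that range.
 */
long tlg_musical_to_codept(int tlg_number)
{
    if (tlg_number < 2000 || tlg_number > 2245)
        return -1;
    return 0x1D000L + (tlg_number - 2000);
}

/*
 * Hypothetical helper: write a code point in the bracketed "\u" form,
 * for example "{\u1D046}".  Returns what snprintf returns.
 */
int format_u_escape(long codept, char *out, size_t outsize)
{
    return snprintf(out, outsize, "{\\u%04lX}", (unsigned long)codept);
}
```

With these helpers, the number 2070 maps to 0x1D046, which format_u_escape writes as "{\u1D046}", matching the first line of the unibetaprep output above.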
Example — Genesis 1:1 — Multilingual
The following example shows Genesis 1:1 in several languages.
The Beta Code input and the UTF-8 output shown below are contained in the files "examples/genesis.beta" and "examples/genesis.utf8" in the software package, respectively.
Beta Code Encoding Input to beta2uni:
&Genesis 1:1$
&Koine Greek (Septuagint):$
*)en a)rxh=| e)poi/hsen o( *qeo\s to\n ou)rano\n kai\ th\n gh=n.
&Hebrew, Letters Only (Standard Beta Code):$
&300brAsyt brA Alhym2 At hsm1ym2 vAt hArT2$
&Hebrew, Full Orthography (Unicode Extension to Beta Code):$
&300b{\u05B0\u05BC}r{\u05B5}As{\u05B4\u05C1\u0596}yt b{\u05B8\u05BC}r{\u05B8\u05B3}A A{\u05B1}l{\u05B9}h{\u05B4\u0591}ym2$
&300A{\u05B5\u05A5}t h{\u05B7}s{\u05B8\u05BC\u05C1}m1{\u05B7\u0596}y{\u05B4}m2 v{\u05B0}A{\u05B5\u05A5}t h{\u05B8}A{\u05B8\u05BD}r{\u05B6}T2{\u05C3}$
&Bohairic Coptic:$
&100*kEN OUARXH A\ F\NOUt QAMIO\ N\TFE NEM PHAOI$
UTF-8 Unicode Output from beta2uni:
Genesis 1:1
Koine Greek (Septuagint):
Ἐν ἀρχῇ ἐποίησεν ὁ Θεὸς τὸν οὐρανὸν καὶ τὴν γῆν.
Hebrew, Letters Only (Standard Beta Code):
בראשית ברא אלהים את השמים ואת הארץ
Hebrew, Full Orthography (Unicode Extension to Beta Code):
בְּרֵאשִׁ֖ית בָּרֳָא אֱלֹהִ֑ים
אֵ֥ת הַשָּׁמַ֖יִם וְאֵ֥ת הָאָֽרֶץ׃
Bohairic Coptic:
Ϧⲉⲛ ⲟⲩⲁⲣⲭⲏ ⲁ̀ ⲫ̀ⲛⲟⲩϯ ⲑⲁⲙⲓⲟ̀ ⲛ̀ⲧⲫⲉ ⲛⲉⲙ ⲡⲏⲁⲟⲓ
Reference Documents
Unix-style man (manual) pages are listed below. The unibetacode man page describes the Beta Code format that these utilities implement, so read it first. Examples in the "examples" directory in the source code demonstrate practical use.
More information is available in the following documents:
- unibetacode: Reference for Beta Code that this package implements; read this first.
- unibetaprep: Prepares a file containing special TLG numeric sequences by converting them to Beta Code with an extension for Unicode code points.
- beta2uni: Beta Code to UTF-8 Unicode conversion program.
- uni2beta: UTF-8 Unicode to Beta Code conversion program.
- libunibetacode: Library of functions for converting between Beta Code and UTF-8 Unicode.
Unibetacode Download
The source tarball and GnuPG signature file can be downloaded at these links:
Unibetacode Installation
Compiling the package requires a C compiler and the Unix make utility. To compile and install the Unibetacode package on a system with a Unix-style command line interface (GNU/Linux, BSD, macOS, Cygwin, etc.), type these commands in a terminal window:
./configure
make
make check
make install
make clean
The "make check" command will run six tests using sample text files that are in the "examples" directory, and then will run one test program that exercises the functions in the libunibetacode library. The "make install" command might need to be run as "sudo make install" on your system.
By default, the standalone programs (unibetaprep, beta2uni, and uni2beta) will be installed in "/usr/local/bin", the libunibetacode library will be installed in "/usr/local/lib", and the man (Unix manual) pages will be installed in the appropriate subdirectories under "/usr/local/share/man".
The latest version of Unibetacode uses the GNU libtool package to build static and shared versions of the libunibetacode library, which was first introduced in Unibetacode 2.0. Both library files are built together during the "make" operation. If the target system supports it, the "make install" command will install both the static and the shared versions of the library.
Technical Note:
The source code for textual pattern matching is written using the flex lexical analyzer generator. This source code is in the *.l files in the src/progsrc directory. The initial build process converts these files to C sources in the same directory for the distribution tarball, so that the package will build with just a C compiler and the make utility whether or not flex (or a similar Unix/POSIX lex program) is installed on the build system.
License
The software on this site, unless otherwise noted, is released under the terms of the GNU General Public License (GNU GPL) version 2.0, or (at your option) a later version.