Manuscript Comparator and the Open Scriptures Platform
(Originally posted on my personal blog.)
For the past several weeks all of my free time has gone into building the first application for Open Scriptures. For many months I had been designing the database, and in December I finally got it to the point where it could store all of the necessary information so that application development could begin. The first application is the Manuscript Comparator. It demonstrates what is possible when the semantic units of individual texts are linked together: the interrelationships between semantic units are stored in a database and can be queried.
The database is constructed as follows. Manuscripts available on the Web today are each imported into the database individually, and every word (token) of a manuscript is stored separately with its own unique identifier. After all of the individual manuscripts have been imported, they are merged into a single unified manuscript. The merging algorithm normalizes the text for comparison by removing case distinctions, diacritics, and punctuation; the unified manuscript stored in the database is composed of these normalized words. The result of the merge is a unified manuscript which contains every variant attested by the contributing manuscripts. Furthermore, each token in an individual manuscript is linked back to its corresponding word in the unified manuscript. Thus every manuscript is linked to every other manuscript through a common point, the unified manuscript.
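To make the import-and-merge idea concrete, here is a minimal sketch in Python. It is not the actual Open Scriptures merging algorithm: the function names, the in-memory data structures, and the use of difflib for alignment are all assumptions made for illustration, and the sample words are made up.

```python
import unicodedata
from difflib import SequenceMatcher


def normalize(word):
    """Strip diacritics and punctuation, then fold case (illustrative only)."""
    decomposed = unicodedata.normalize("NFD", word)
    # Drop combining marks (categories M*) and punctuation (categories P*).
    kept = [ch for ch in decomposed
            if not unicodedata.category(ch).startswith(("M", "P"))]
    return "".join(kept).casefold()


def merge(manuscripts):
    """Merge {name: [raw words]} into a unified word list.

    Returns (unified, links). `unified` is an ordered list of
    (unified_id, normalized_word); `links[name][i]` is the unified_id
    that token i of manuscript `name` points to.
    """
    unified, links, next_id = [], {}, 0
    for name, words in manuscripts.items():
        norm = [normalize(w) for w in words]
        # Align this manuscript against the unified text built so far.
        matcher = SequenceMatcher(None, [w for _, w in unified], norm,
                                  autojunk=False)
        merged, ids = [], []
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            # Words already in the unified text are always kept.
            merged.extend(unified[i1:i2])
            if tag == "equal":
                ids.extend(uid for uid, _ in unified[i1:i2])
            else:
                # Words not seen at this position become new unified words.
                for w in norm[j1:j2]:
                    merged.append((next_id, w))
                    ids.append(next_id)
                    next_id += 1
        unified, links[name] = merged, ids
    return unified, links


# Made-up sample data: two "manuscripts" differing in one word.
example = {
    "A": ["In", "the", "beginning", "was", "the", "Word"],
    "B": ["In", "the", "beginning", "was", "a", "Word."],
}
unified, links = merge(example)
```

In this sketch the unified text ends up containing both "the" and "a" at the point where the two samples disagree, and each manuscript's tokens carry the identifier of the unified word they correspond to. Identifiers are kept stable when new words are inserted, so links created by earlier imports remain valid.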
With the database of interlinked manuscripts constructed, the Manuscript Comparator obtains the differences among manuscripts by querying the database for the requested manuscripts and joining them to each other and to the unified manuscript. The results are presented in either a parallel (side-by-side) or unified view, with words highlighted according to whether they are “inserted” or “deleted”. (Read the introduction for more information regarding the user interface.) The unified view will serve as the foundation for an upcoming tool which will allow contributors to link the semantic units between manuscripts and translations (see an old prototype), and thereby to link translations to each other through their common links to the unified manuscript. With such semantic links in place, a Translation Comparator application will be possible, one which compares not the forms of the words in the translations (as is easily done today) but the manuscript sources on which the translations are based. For example, comparing the English King James version with the Spanish Reina Valera version would result in very few differences (if any), since they both rely on the Textus Receptus. Additionally, with the semantic links in place, it will also be possible to compute the degree to which any translation relies on one manuscript over another.
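The join behind the Manuscript Comparator's unified view might look roughly like the following sketch, which uses Python's built-in sqlite3 module. The table and column names (`unified_token`, `token`, `unified_id`, and so on) are hypothetical and are not the real Open Scriptures schema; the rows are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one table for the unified manuscript, one for the
# tokens of every imported manuscript, each linked to a unified word.
conn.executescript("""
CREATE TABLE unified_token (id INTEGER PRIMARY KEY, position INTEGER, word TEXT);
CREATE TABLE token (manuscript TEXT, position INTEGER, word TEXT,
                    unified_id INTEGER REFERENCES unified_token(id));
""")
conn.executemany("INSERT INTO unified_token VALUES (?, ?, ?)", [
    (0, 0, "in"), (1, 1, "the"), (2, 2, "beginning"), (3, 3, "was"),
    (4, 4, "the"), (6, 5, "a"), (5, 6, "word"), (7, 7, "amen"),
])
conn.executemany("INSERT INTO token VALUES (?, ?, ?, ?)", [
    ("A", 0, "In", 0), ("A", 1, "the", 1), ("A", 2, "beginning", 2),
    ("A", 3, "was", 3), ("A", 4, "the", 4), ("A", 5, "Word", 5),
    ("B", 0, "In", 0), ("B", 1, "the", 1), ("B", 2, "beginning", 2),
    ("B", 3, "was", 3), ("B", 4, "a", 6), ("B", 5, "Word.", 5),
    ("C", 0, "Amen", 7),
])

# Join both requested manuscripts to the unified manuscript, keeping only
# unified words that at least one of the two attests.
QUERY = """
SELECT u.word, a.word, b.word
FROM unified_token AS u
LEFT JOIN token AS a ON a.unified_id = u.id AND a.manuscript = ?
LEFT JOIN token AS b ON b.unified_id = u.id AND b.manuscript = ?
WHERE a.unified_id IS NOT NULL OR b.unified_id IS NOT NULL
ORDER BY u.position
"""


def compare(conn, base, other):
    """Return [(unified_word, status)] for a unified view of two manuscripts."""
    view = []
    for unified_word, base_word, other_word in conn.execute(QUERY, (base, other)):
        if base_word is not None and other_word is not None:
            status = "same"
        elif base_word is not None:
            status = "deleted"   # present in base, absent from other
        else:
            status = "inserted"  # absent from base, present in other
        view.append((unified_word, status))
    return view


view = compare(conn, "A", "B")
```

Because every token points at a unified word, comparing any pair of manuscripts needs no pairwise alignment at query time: two joins against the unified table are enough, and words attested only by other manuscripts (the "C" row here) simply drop out.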
The applications possible with this data are really exciting. Open Scriptures aims not only to be a “comprehensive open-source Web repository for integrated scriptural data,” but also “a general application framework for building internationalized social applications of scripture” which present data “in a translation-neutral and internationalized manner so as to be accessible to the community no matter what language they speak or version they prefer.” Inspiration for this framework comes from the Facebook Platform, which provides an API enabling web developers to create applications powered by Facebook’s social network data. What if we had a similar platform and framework which enabled web developers to easily build applications powered by interlinked scriptural data? What if these applications were hosted in the cloud, as with Google App Engine? These ideas about a scriptural web application platform have really been exciting me, but they haven’t started cooking yet. The ingredients are only just now being gathered… please join me!
One Comment
This is really cool; keep up the great work!