Weston Ruter | Open Scriptures

Redeeming the Ill-fated Re:Greek Project: a Call for Participation

Update 3: Looking for a website that provides the same core functionality of Zhubert.com? Check out John Dyer’s excellent Reader’s Version of Greek and Hebrew Bible. It’s free!

Update 2: See post on the group regarding how to respond to copyright issues.

Update 1: zhubert.com has been shut down due to the discovery that the MorphGNT has infringed on the German Bible Society’s copyright of the UBS4 edition of the Greek New Testament. Please do not flame zhubert.com or the German Bible Society or anyone else in the comments here. Each is doing what they feel is right given the situation. We are working on a solution to the copyright issue.

Zack Hubert (former Pastor of Technology at Mars Hill Church, now Vice President of Church Community Networks at Zondervan after they acquired The City) spoke (MP3) last year at BibleTech about the history of his free NT Greek web app zhubert.com. After years of development he came to a point in life where he had no time to maintain the project, let alone take it to another level. He shared his desire to further the project and his first plan to do so:

(22:32) What can I do to keep this project continuing since I only had 15 minutes a week to go in and either approve a lexicon entry or make a little bug fix. So I came up with a plan. The plan was the Biblios foundation. (I say past-tense for good reason.) So I started off down the road of creating a non-profit organization. And the idea was that it could accept donations and that I could have this self-supporting organization where I’d draw no income but I could use it to contract out developers… you know find other guys that were right out of college or in seminary or what not and give them $15 an hour or some ridiculously small amount to add on the features and functionality that I think really would have taken zhubert to another level… but I’m not a business man, and yeah… it failed. Quite simply, looking back on it I think people just wanted the software for free. […] (24:25) So development of that just took a break. I was kinda disappointed honestly because I really wanted to see this project go, but there seemed like very few people who were getting behind to help me, so I took a mental break from the project.

He then shared his next plan, to transition his privately-developed closed-source project into an open-source community-driven one called the “Resurgence Greek Project” or simply “Re:Greek”. In an interview (via Wayback Machine since original post inaccessible) he shared:

My vision for Re:Greek is that it would be an Open Source project that could capture the imagination of software developers all around the world. I want to see dozens of developers and designers submitting improvements, new features, localizing Re:Greek into their local language, and innovating on this shared platform. I’ll release more in the near future about how this will work, but it will be a model similar to Linux or Ruby on Rails if you are familiar with those projects.

In another post (also via Wayback Machine since original inaccessible) he was “happy to announce that we have made Re:Greek an Open Source Initiative!” Unfortunately, as you’ll learn if you listen to his BibleTech talk (see above), the initial excitement was extinguished and the project transition failed. People who were used to the old zhubert.com were upset with the new Re:Greek project and just wanted the old site back. There weren’t enough developers and supporters to sustain the Re:Greek open-source project, and so it ceased. I don’t know all of the details, but the project was so thoroughly discontinued that Zack’s announcement blog posts have been removed and the ReGreek Google Group has been deleted. Zack seems to have washed his hands of the Re:Greek project and has thereafter focused all his efforts on The City (which is turning out to be a great success). Such a sad turn of events for such a brilliant idea!

At BibleTech this year, I am presenting the Open Scriptures project as “Picking Up the Mantle of the Re:Greek – Open Source Initiative”. We need to give Zack’s idea a second chance, but we need to learn from his experience. Open Scriptures will also fail if it does not have the collaboration and co-ownership of committed developers and the sponsorship of supporting organizations. However, if the Mozilla Foundation can make Firefox such a success, if the Linux community can persevere to make a solid open-source OS, and if Wikipedia is able to become the global first source for encyclopedia information, then surely there are people of faith—who believe their scriptures are the very words of God—who can make Open Scriptures a reality.

Will you join me?

Join the conversation at the Open Scriptures Google Group and spread the word.

The Tagged Tanakh and Semantic Linking

Update: Please also read James Tauber’s parallel Thoughts on GNT-NET Parallel Glossing Project.

A few days ago I posted to the group regarding the Tagged Tanakh project which, like Open Scriptures, is being presented at BibleTech at the end of this month. I had read their abstract a few months ago and was really interested, but only last week was their project unveiled. A month previously they teased:

Obviously, once the term “Web 2.0” was coined, Web 3.0 couldn’t be far behind. If Web 2.0 is about bringing individuals together via the Internet, then Web 3.0 is about bringing various sources of information together. At this point in time, no one has figured out a popular application of web 3.0 tools, but that doesn’t mean organizations aren’t trying. […]

The Tagged Tanakh is our effort to bring Torah into the Linked Data ecosystem that is emerging around us. Ensuring that our content is accessible and conforms to standardized and open structures is of paramount importance.

Reading this made me realize that Open Scriptures is a “Web 3.0” project and is a Semantic Web initiative. Furthermore, YAVNET pointed to the article “Torah 2.0: Old-Line Publisher Brings Biblical Commentary Into Online World”:

The Jewish Publication Society, a 120-year-old organization devoted to publishing ancient and modern texts on Jewish subjects, has begun work on a project to publish the Jewish Bible, or Tanakh, as an electronic, online text, integrating the original Hebrew with JPS’s English translation and selected commentaries. But the most radical part of the project is an ambitious plan to make the text of the Tanakh into an open platform for users of all stripes to collectively erect their own structure of commentary, debate and interpretation, all linked to the text itself. The publishers hope that the project will radically democratize the ancient process of Talmudic disputation by bringing it into cyberspace.

This is a really exciting project! Notice how similar it sounds to Open Scriptures. The fundamental concept in common is Linked Data. It sounds like they have a plan for linking commentaries and user tags, but that they don’t yet know how they’re going to link their English translation with the Hebrew text, as the article continues (emphasis added):

Though JPS has a fairly clear concept of how it wants the project to turn out, there are major obstacles to navigate. JPS must figure out how to link across the original Hebrew and the English translation, how to screen users to ensure that their comments are appropriate, how to integrate existing software tools, and how to create an architecture flexible enough to incorporate new and unexpected functions that may not yet even exist.

I’m convinced that the best way to link translations with their source texts is via the granular interlinking of the smallest corresponding semantic units. If the individual semantic units (e.g. words) in two texts are all individually addressable (by being stored individually in a database), then a data structure can be constructed wherein a cluster of semantic units in one text can be linked to the smallest equivalent semantic unit cluster in another text. This data structure has been fleshed out in the Open Scriptures database schema, and the Manuscript Comparator application is powered by manuscripts that have gone through this semantic linking process (though it is a simple example, since all the manuscripts are in one language, Greek).

The concept underlying this semantic linking is the dynamic equivalence method in translation. To quote Wikipedia: “The dynamic (also known as functional equivalence) attempts to convey the thought expressed in a source text (if necessary, at the expense of literalness, original word order, the source text’s grammatical voice, etc.)”. When finding equivalences between two texts in different languages, there are many cases where a single word in one language does not correspond to a single word in the other. For example:

English: I like to study the Bible.
Spanish: Me gusta estudiar la Biblia.

The Spanish way of expressing “to like” is the idiom “to be pleased by”, so the equivalent of “I like” is “Me gusta” (a two-for-two correspondence). Likewise, to a lesser degree, “to study” and “estudiar” are both infinitives, but Spanish forms the infinitive with a suffix whereas English uses a particle, so two English words correspond to one Spanish word. These sentences serve as a simple example. Consider also very free translations like The Message, which may be so paraphrased that individual words and phrases cannot be linked at all, only whole sentences and paragraphs. The more free a translation, the longer the smallest common units of meaning; the more literal (word-for-word) a translation, the shorter the smallest common units of meaning.

Therefore, for there to be a data structure that represents the semantic equivalences between these two sentences, there must be a many-to-many relationship that links individual semantic units for each smallest-unit of corresponding meaning in two texts. In the Open Scriptures database schema, the following entities are employed (see SQL and SchemaGraph):

token is an individual semantic unit (a word),
token_group represents a correspondence of meaning across texts,
token_group_cluster groups tokens from one text and associates them with a token_group,
token_group_cluster_token associates a token with a token_group_cluster (a bit verbose, I know).
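To make the four entities concrete, here is a minimal Python sketch that models them in memory and links the English and Spanish sentences above. It is not the actual Open Scriptures schema: the field names, the `tokenize` and `equivalents` helpers, and the token ids are all assumptions made for illustration, and the `token_group_cluster_token` join table is collapsed into a plain list of token ids.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Token:
    """One `token` row: an individual semantic unit (a word)."""
    id: int
    text: str   # hypothetical identifier of the text the word belongs to
    value: str


@dataclass
class TokenGroupCluster:
    """One `token_group_cluster` row: tokens from ONE text within a group."""
    text: str
    # Stands in for the `token_group_cluster_token` join rows.
    token_ids: List[int] = field(default_factory=list)


@dataclass
class TokenGroup:
    """One `token_group` row: a correspondence of meaning across texts."""
    id: int
    clusters: List[TokenGroupCluster] = field(default_factory=list)


def tokenize(text: str, sentence: str, first_id: int) -> List[Token]:
    """Split a sentence on whitespace and number the words consecutively."""
    words = sentence.rstrip(".").split()
    return [Token(first_id + i, text, w) for i, w in enumerate(words)]


def equivalents(token_id: int, groups: List[TokenGroup], target_text: str) -> List[int]:
    """Return the ids of tokens in `target_text` linked to `token_id`."""
    found: List[int] = []
    for group in groups:
        if any(token_id in c.token_ids for c in group.clusters):
            for c in group.clusters:
                if c.text == target_text:
                    found.extend(c.token_ids)
    return found


english = tokenize("en", "I like to study the Bible.", 1)      # ids 1..6
spanish = tokenize("es", "Me gusta estudiar la Biblia.", 101)  # ids 101..105

groups = [
    # "I like" <-> "Me gusta" (two-for-two)
    TokenGroup(1, [TokenGroupCluster("en", [1, 2]), TokenGroupCluster("es", [101, 102])]),
    # "to study" <-> "estudiar" (two-for-one)
    TokenGroup(2, [TokenGroupCluster("en", [3, 4]), TokenGroupCluster("es", [103])]),
    TokenGroup(3, [TokenGroupCluster("en", [5]), TokenGroupCluster("es", [104])]),
    TokenGroup(4, [TokenGroupCluster("en", [6]), TokenGroupCluster("es", [105])]),
]
```

Calling `equivalents(4, groups, "es")` looks up “study” and returns the id of “estudiar”, while `equivalents(101, groups, "en")` returns the ids of both “I” and “like”, since “Me” only has an equivalent as part of the whole cluster.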

Now, with regard to the Manuscript Comparator, a shortcut is being taken: since mono-lingual manuscripts are being linked together, the semantic links are always one-to-one (so the above schema entities aren’t required), and the links may be constructed simply with an algorithm that compares normalized strings of Greek text. However, to link texts from different languages, as the Open Scriptures and Tagged Tanakh projects are seeking to do, much more sophisticated techniques will be necessary to create the semantic links. Instead of creating complex natural language processing algorithms (as Google Translate has done), the most straightforward way of creating the semantic links (in the spirit of Web 2.0) is to utilize collective intelligence. Greek and Hebrew students from around the world could flex their language muscles by making passes over the manuscripts and constructing semantic links with their favorite translations in their native languages. (Imagine if language professors assigned such a task as homework.) Each time someone makes a semantic-linking pass, the semantic links would improve as collective intelligence causes mistakes and bad data to become statistical outliers (and thus be ignored).
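One simple way the “statistical outlier” idea could work is a support threshold: keep only the links that more than some fraction of the linking passes agree on. The Python sketch below is only an illustration of that idea, not an existing Open Scriptures algorithm; the `link` and `consensus_links` names, the 0.5 threshold, and the sample token ids are all assumptions.

```python
from collections import Counter
from typing import FrozenSet, Iterable, List, Set, Tuple

# A link pairs a cluster of source token ids with a cluster of target token ids.
Link = Tuple[FrozenSet[int], FrozenSet[int]]


def link(source_ids: Iterable[int], target_ids: Iterable[int]) -> Link:
    return (frozenset(source_ids), frozenset(target_ids))


def consensus_links(passes: List[Set[Link]], min_support: float = 0.5) -> Set[Link]:
    """Keep links proposed in more than `min_support` of all passes."""
    if not passes:
        return set()
    # Each pass is a set, so a contributor counts at most once per link.
    counts = Counter(l for one_pass in passes for l in one_pass)
    return {l for l, n in counts.items() if n / len(passes) > min_support}


# Three hypothetical contributors link the same two sentences.
passes = [
    {link([1, 2], [101, 102]), link([3, 4], [103])},
    {link([1, 2], [101, 102]), link([3, 4], [103])},
    {link([1], [101]), link([3, 4], [103])},  # a dissenting pass
]
agreed = consensus_links(passes)
```

Here the dissenting one-to-one link is supported by only one pass in three, so it is dropped, while the two links that most contributors agree on survive.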

As I’ve mentioned previously, an old semantic linker prototype is available which serves as a simple proof of concept for how collective intelligence can interlink texts.

Kyle Biersdorff and I have talked about this a bit, and he wisely pointed out that there needs to be a way of storing semantic links that indicate not just equivalence but also variance. As we’ve discussed here regarding the linking of the Masoretic Text with the LXX, there are many places where the LXX translates the Hebrew loosely; it conveys some of the same meaning but also includes additional nuances. Take, for example, Isaiah 7:14 (ESV):

Therefore the Lord himself will give you a sign. Behold, the ____ shall conceive and bear a son, and shall call his name Immanuel.

In the blank, the Hebrew word (עלמה, almah) means “young woman” or “maiden” whereas the LXX word (παρθένος, parthenos) means “virgin”. This is a case where the LXX meaning is more specific than the Hebrew. I’m sure there are instances of the reverse too, where the LXX’s meaning is more generic. (It would be great to compile a list of these semantic variance link types.) In such cases, perhaps there could be an equivalence rating from 0.0 to 1.0, where 1.0 means an exact translation and 0.0 means complete variance. For virgin/maiden, perhaps a rating of 0.75 would be appropriate. When multiple people make passes over the data and provide ratings like this, we could then average them to get a rating that is validated by collective intelligence.
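The averaging step could be as small as the following Python sketch. The `average_ratings` function, the link identifiers, and the individual ratings are hypothetical; only the 0.0 to 1.0 scale and the idea of averaging come from the proposal above.

```python
from statistics import mean
from typing import Dict, List


def average_ratings(ratings_by_link: Dict[str, List[float]]) -> Dict[str, float]:
    """Average the equivalence ratings contributors gave to each link."""
    averaged: Dict[str, float] = {}
    for link_id, ratings in ratings_by_link.items():
        if not ratings:
            continue  # nobody has rated this link yet
        if any(not 0.0 <= r <= 1.0 for r in ratings):
            raise ValueError(f"rating out of range for link {link_id!r}")
        averaged[link_id] = mean(ratings)
    return averaged


# Hypothetical ratings from three contributors for two links.
ratings = {
    "almah/parthenos": [0.75, 0.7, 0.8],
    "an exact link": [1.0, 1.0, 1.0],
}
averaged = average_ratings(ratings)
```

A plain mean is the simplest choice; a median or a trimmed mean would be less sensitive to a single careless rating, which may matter if the number of passes per link is small.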

I’m going to contact the Tagged Tanakh project and see if we can partner together and collaborate to avoid duplicate efforts.

Manuscript Comparator and the Open Scriptures Platform

(Originally posted on my personal blog.)

For the past several weeks all of my free time has gone into building the first application for Open Scriptures. For many months I had been working on designing the database and in December I finally got it to a point where it could store all of the necessary information so that application development could begin. The first application developed is the Manuscript Comparator. This application demonstrates what is possible when the semantic units of individual texts are linked together—when the interrelationships between semantic units are stored in a database and can be queried.

The database is constructed as follows: various manuscripts available on the Web today are each imported into the database individually, with each of a manuscript’s words (tokens) stored separately under a unique identifier. After all of the individual manuscripts have been imported, they are merged together into a unified manuscript. The merging algorithm normalizes the text for comparison by removing all casing, diacritics, and punctuation; the unified manuscript stored in the database is composed of these normalized words. The result of the merge is a unified manuscript which consists of every possible variant attested to by the contributing manuscripts; furthermore, all of the tokens in an individual manuscript are linked back to their corresponding words in the unified manuscript. Thus every manuscript is linked to every other manuscript by means of their links to a common point, the unified manuscript.
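The normalization step can be sketched in a few lines of Python using the standard `unicodedata` module. This is an assumption about how such a step might be written, not the actual merging code; the `normalize_token` name is hypothetical.

```python
import unicodedata


def normalize_token(word: str) -> str:
    """Strip diacritics and punctuation, then fold case, for comparison only."""
    # NFD splits each accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", word)
    # Drop combining marks (categories M*) and punctuation (categories P*).
    kept = (ch for ch in decomposed if unicodedata.category(ch)[0] not in ("M", "P"))
    # casefold() lowercases and also folds the final sigma into a regular sigma.
    return "".join(kept).casefold()


# Two spellings that differ only in casing, accents, and punctuation.
same_word = normalize_token("Λόγος,") == normalize_token("λογος")
```

Because the original tokens stay in the database untouched, the normalized form only needs to be good enough to decide whether two manuscripts attest the same word at a given point.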

With the database of interlinked manuscripts constructed, the Manuscript Comparator is able to obtain the differences among manuscripts by querying the database for the requested manuscripts and joining them to each other and the unified manuscript. The results are presented in either a parallel (side-by-side) or unified view, with words highlighted according to whether they are “inserted” or “deleted”. (Read the introduction for more information regarding the user interface.) The unified view will serve as the foundation for the upcoming tool which will allow contributors to link the semantic units between manuscripts and translations (see an old prototype), and thereby to establish links between translations via their common links to the unified manuscript. With such semantic links between translations in place, a Translation Comparator application will be possible: one which compares not the forms of the words in the translations (as is easily done today) but rather the manuscript sources on which the translations are based. For example, comparing the English King James version with the Spanish Reina Valera version would result in very few differences (if any) since they both rely on the Textus Receptus. Additionally, with the semantic links in place, it will also be possible to compute the degree to which any translation relies on one manuscript over another.
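As a rough illustration of the comparison, the Python sketch below works on in-memory lists instead of database joins. Each manuscript token carries the id of the unified-manuscript word it links to, and a word is labeled relative to a base manuscript. The `compare` function, the data layout, the reading of “inserted” and “deleted” as relative to the base, and the sample words and ids are all assumptions for illustration.

```python
from typing import List, Tuple

# A manuscript is a list of (surface form, unified token id) pairs.
Manuscript = List[Tuple[str, int]]


def compare(unified_order: List[int], base: Manuscript, other: Manuscript) -> List[Tuple[str, str]]:
    """Walk the unified manuscript and label each word relative to `base`."""
    base_by_id = {uid: form for form, uid in base}
    other_by_id = {uid: form for form, uid in other}
    rows: List[Tuple[str, str]] = []
    for uid in unified_order:
        if uid in base_by_id and uid in other_by_id:
            rows.append((base_by_id[uid], "same"))
        elif uid in other_by_id:
            rows.append((other_by_id[uid], "inserted"))  # only in `other`
        elif uid in base_by_id:
            rows.append((base_by_id[uid], "deleted"))    # only in `base`
    return rows


# Illustrative data: four unified words, two manuscripts that share one of them.
unified_order = [1, 2, 3, 4]
base = [("ο", 1), ("μονογενης", 2), ("υιος", 4)]
other = [("μονογενης", 2), ("θεος", 3)]
rows = compare(unified_order, base, other)
statuses = [status for _, status in rows]
```

The rows come out in unified-manuscript order, which is what a unified view needs; a parallel view could be built from the same rows by placing “deleted” words in one column and “inserted” words in the other.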

The applications possible with this data are really exciting. Open Scriptures aims not only to be a “comprehensive open-source Web repository for integrated scriptural data,” but also “a general application framework for building internationalized social applications of scripture” which present data “in a translation-neutral and internationalized manner so as to be accessible to the community no matter what language they speak or version they prefer.” Inspiration for this framework comes from the Facebook Platform which provides an API enabling web developers to create applications powered by Facebook’s social network data. What if we had a similar platform and framework which enabled web developers to easily build applications which are powered by interlinked scriptural data? What if these applications were hosted on the Cloud as with Google App Engine? These ideas about a scriptural web application platform have really been exciting me, but they haven’t started cooking yet. The ingredients are only just now being gathered… please join me!

Discussing Open Scriptures

(Originally posted on my personal blog.)

I’ve created a discussion group for Open Scriptures where we can collaborate on the development of the project. I’ve written a few posts relating some thoughts I’ve had about the project over the past couple days:

Please join me!

Open Scriptures at BibleTech

(Originally posted on my personal blog.)

For the past several years, I’ve been dreaming about an open source community-driven Web application for Scripture. In the past few months, things have really been kicking into high gear. At BibleTech:2009 I’m presenting the project in the talk Open Scriptures: Picking Up the Mantle of the Re:Greek – Open Source Initiative:

Open Scriptures seeks to be a comprehensive open-source Web repository for integrated scriptural data and a general application framework for building internationalized social applications of scripture. An abundance of scriptural resources are now available online—manuscripts, translations, and annotations are all being made available by students and scholars alike at an ever-increasing rate. These diverse scriptural resources, however, are isolated from each other and fragmented across the Internet. Thus mashing up the available data into new scriptural applications is not currently possible for the community at large because the resources’ interrelationships are not systematically documented. Open Scriptures aims to establish a scriptural database for interlinked textual resources such as merged manuscripts, the differences among them, and the links between their semantic units and the semantic units of their translations. With such a foundation in place, derived scriptural data like cross-references may be stored in a translation-neutral and internationalized manner so as to be accessible to the community no matter what language they speak or version they prefer.

Think of it as a Wikipedia for scriptural data. Just as Wikipedia has become the go-to place to find open encyclopedia information, Open Scriptures seeks to be the go-to place for open scriptural data. (Non-free data could also be stored, but it would be restricted to non-commercial personal use, as Wikipedia does with fair use or by obtaining special permission.)

Interested? The project needs you! I’d love for a core group of scholars and developers to come together with the shared vision of open access to scriptural data employing open standards and best practices of the Web.