Scripture and database models

At the moment, the Open Scriptures project is developing a web API for querying scriptural text and metadata. The basic task is to create “a common API for many datasets.” However, before the API can be implemented, the underlying relational database models must be established. To that end, Weston has been implementing database models for the API using Django (the development platform for Open Scriptures).

One of the most challenging aspects has been working out how to record structural information about the text: verses, chapters, title headings, and so on. There has also been a desire not to tie the database’s organization to any particular structural marker. So the base unit for storing the text is not a chapter or verse, but what is called a “token”. A token is one of the three atomic structures of a text: a word, a punctuation mark, or whitespace. Of course, there may be cases where even a basic token could be split further, but you’ve got to start somewhere.
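To make the token idea concrete, here is a minimal sketch using Python’s standard sqlite3 module rather than Django, so it runs on its own. The `token` table layout, the `tokenize` and `store_tokens` helpers, and the regular expression are hypothetical illustrations, not the Open Scriptures models, and the naive regex would need refinement for real Greek or Hebrew text.

```python
import re
import sqlite3

# Hypothetical pattern: each match is exactly one token of one of the three
# kinds named in the post. A naive split, not tuned for Greek or Hebrew.
TOKEN_PATTERN = re.compile(
    r"(?P<word>\w+)|(?P<whitespace>\s+)|(?P<punctuation>[^\w\s])"
)


def tokenize(text):
    """Split text into (kind, value) pairs that together cover every character."""
    return [(m.lastgroup, m.group()) for m in TOKEN_PATTERN.finditer(text)]


def store_tokens(conn, text):
    """Store each token as one row, ordered by its position in the text."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS token ("
        " id INTEGER PRIMARY KEY,"
        " position INTEGER NOT NULL,"
        " kind TEXT NOT NULL CHECK (kind IN ('word', 'punctuation', 'whitespace')),"
        " data TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO token (position, kind, data) VALUES (?, ?, ?)",
        [(i, kind, value) for i, (kind, value) in enumerate(tokenize(text))],
    )


conn = sqlite3.connect(":memory:")
store_tokens(conn, "In the beginning was the Word.")
rows = conn.execute("SELECT kind, data FROM token ORDER BY position").fetchall()
# rows[:3] -> [('word', 'In'), ('whitespace', ' '), ('word', 'the')]
```

Because whitespace and punctuation are stored as tokens too, joining the `data` column in position order reproduces the original text exactly, with no verse or chapter boundaries baked into the storage.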

To provide structure, Weston has written a token linkage system, where you can define a certain structure (e.g. “Verse 12”) and, using the features of a relational database, connect it to the tokens that the structure includes. There is even a feature for non-linear token linkages, if anyone finds a use for that.
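Here is a rough sketch of how such a linkage might look in plain SQL, again via Python’s sqlite3 rather than Django. The `structure` and `structure_token` tables and the `add_structure` and `text_of` helpers are hypothetical names. I have assumed that “non-linear” means a structure whose tokens are not contiguous, such as a quotation interrupted by narration; the actual feature may work differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE token (
    id INTEGER PRIMARY KEY,
    position INTEGER NOT NULL,
    data TEXT NOT NULL
);
-- Any named unit of structure: a verse, chapter, heading, quotation, ...
CREATE TABLE structure (
    id INTEGER PRIMARY KEY,
    kind TEXT NOT NULL,
    label TEXT NOT NULL
);
-- Linkage rows: one per (structure, token) pair the structure includes.
CREATE TABLE structure_token (
    structure_id INTEGER NOT NULL REFERENCES structure(id),
    token_id INTEGER NOT NULL REFERENCES token(id),
    PRIMARY KEY (structure_id, token_id)
);
""")


def add_structure(conn, kind, label, positions):
    """Create a structure and link it to the tokens at the given positions."""
    cur = conn.execute(
        "INSERT INTO structure (kind, label) VALUES (?, ?)", (kind, label)
    )
    structure_id = cur.lastrowid
    for pos in positions:
        conn.execute(
            "INSERT INTO structure_token (structure_id, token_id)"
            " SELECT ?, id FROM token WHERE position = ?",
            (structure_id, pos),
        )
    return structure_id


def text_of(conn, kind, label):
    """Reassemble a structure's text from its linked tokens, in text order."""
    rows = conn.execute(
        "SELECT t.data FROM token t"
        " JOIN structure_token st ON st.token_id = t.id"
        " JOIN structure s ON s.id = st.structure_id"
        " WHERE s.kind = ? AND s.label = ?"
        " ORDER BY t.position",
        (kind, label),
    ).fetchall()
    return "".join(data for (data,) in rows)


words = ["Come", ",", " ", "he", " ", "said", ",", " ", "and", " ", "see", "."]
conn.executemany(
    "INSERT INTO token (position, data) VALUES (?, ?)", list(enumerate(words))
)
# A contiguous structure covering every token.
add_structure(conn, "verse", "12", range(len(words)))
# A non-contiguous structure that skips the interrupting " he said,".
add_structure(conn, "quotation", "1", [0, 1, 7, 8, 9, 10])

print(text_of(conn, "verse", "12"))      # Come, he said, and see.
print(text_of(conn, "quotation", "1"))   # Come, and see
```

Since every structure is just a set of linkage rows, verses, headings, and quotations can all be layered over the same token stream without any one of them being privileged in the storage.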

Another piece of the puzzle is deciding how to express various types of metadata about the text in the database. One important type of metadata is the parsing information for Greek or Hebrew tokens. That parsing information could be provided as a simple string (e.g. “verb PAI3S”), which client applications would then have to interpret according to established conventions. This is not ideal, however, since it would seriously hamper the querying power of the API: finding every present indicative verb, for instance, would mean pattern-matching strings rather than filtering on fields. It is better to use a database model for parsings instead.

The challenge here comes in supporting multiple languages. Once again, relational database features help with this problem, since we can attach a Greek, Hebrew, or other language-specific parsing to each token’s metadata. If there is a difference of opinion on a parsing, we can even store multiple parsings for a single token.
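A sketch of what a structured parsing table buys over a string like “verb PAI3S”, once more in plain sqlite3. The `greek_parsing` table, its columns, and the `editor-a`/`editor-b` source labels are assumptions for illustration; a Hebrew parsing table would get its own language-appropriate columns. The second token, πιστεύετε, is a genuine case where readers can differ: the same form parses as present active indicative or present active imperative, second person plural.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE token (id INTEGER PRIMARY KEY, data TEXT NOT NULL);
-- Hypothetical Greek parsing model: one row per proposed parsing,
-- so a single token can carry several competing analyses.
CREATE TABLE greek_parsing (
    id INTEGER PRIMARY KEY,
    token_id INTEGER NOT NULL REFERENCES token(id),
    part_of_speech TEXT NOT NULL,
    tense TEXT,
    voice TEXT,
    mood TEXT,
    person INTEGER,
    number TEXT,
    source TEXT NOT NULL  -- who proposed this parsing (hypothetical labels)
);
""")

conn.executemany(
    "INSERT INTO token (id, data) VALUES (?, ?)",
    [(1, "λέγει"), (2, "πιστεύετε")],
)
conn.executemany(
    "INSERT INTO greek_parsing (token_id, part_of_speech, tense, voice, mood,"
    " person, number, source) VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [
        (1, "verb", "present", "active", "indicative", 3, "singular", "editor-a"),
        (2, "verb", "present", "active", "indicative", 2, "plural", "editor-a"),
        (2, "verb", "present", "active", "imperative", 2, "plural", "editor-b"),
    ],
)

# Field-level query: every token with at least one present indicative parsing.
present_indicative = [row[0] for row in conn.execute(
    "SELECT t.data FROM token t WHERE EXISTS ("
    " SELECT 1 FROM greek_parsing p WHERE p.token_id = t.id"
    " AND p.tense = 'present' AND p.mood = 'indicative')"
    " ORDER BY t.id"
)]
# present_indicative -> ['λέγει', 'πιστεύετε']

# Tokens whose proposed parsings disagree on mood.
disputed = [row[0] for row in conn.execute(
    "SELECT t.data FROM token t"
    " JOIN greek_parsing p ON p.token_id = t.id"
    " GROUP BY t.id HAVING COUNT(DISTINCT p.mood) > 1"
    " ORDER BY t.id"
)]
# disputed -> ['πιστεύετε']
```

Both queries filter on individual fields, which is exactly what a single opaque parsing string would make awkward.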

I am optimistic about the potential of this project. Once the API is nailed down, there will be many opportunities for “client” apps, built with whatever framework their developers prefer. Until then, the API has to be finalized and fleshed out with built-in methods, and the models have to be tested with real data (which first requires porting the data to the models). This is where we can use help from all sorts of people, from Python programmers to database experts to linguists and biblical scholars. It’s a good time to be interested in the scriptures and open source software.

Comments

Scripture and APIs | The Library Basement

September 6th, 2010 at 12:59 am

[…] a perfect storm of personal interest for me. Note: This post has been adapted and cross-posted on the Open Scriptures blog. Add new […]
Robert

November 1st, 2010 at 5:42 am

I’m working on a few ideas for a relational database along similar lines to what you’re talking about here, but on a much more limited scale and with a different focus. I found your reference parser and hoped I could use it, but I can’t find the code on GitHub (it’s no longer on Google Code, right?). Can you point me to it?
