Scripture and database models
At the moment the Open Scriptures project is working on developing an internet API for querying scriptural text and metadata. The basic task is to create “a common API for many datasets.” However, before the API can be implemented, the underlying relational database models must be established. To that end, Weston has been working on implementing database models for the API using Django (the development platform for Open Scriptures).
One of the most challenging aspects has been working out how to record structural information about the text: verses, chapters, title headings, etc. There has also been a desire not to rely on any particular structural marker in the database’s organization. So the base unit for storing the text is not a chapter or verse, but what is called a “token”. A token is one of the three atomic units of a text: a word, a punctuation mark, or whitespace. Of course, there may be cases where even the basic token can be split, but you’ve got to start somewhere.
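To make the token idea concrete, here is a minimal sketch in plain Python (not the project's actual Django code). The `Token` class, its field names, and the regular expression are all assumptions made for illustration; pointed Hebrew or text with combining marks would need a richer pattern than `\w`.

```python
import re
from dataclasses import dataclass

# One alternative per atomic unit: whitespace, word, or a punctuation mark.
# Every character matches exactly one alternative, so no text is lost.
TOKEN_RE = re.compile(r"(?P<whitespace>\s+)|(?P<word>\w+)|(?P<punctuation>[^\w\s])")


@dataclass(frozen=True)
class Token:
    position: int  # order of the token within the work (hypothetical field)
    kind: str      # "word", "punctuation" or "whitespace"
    data: str      # the characters of the token itself


def tokenize(text):
    """Split text into tokens without relying on verses or chapters."""
    return [
        Token(position, match.lastgroup, match.group())
        for position, match in enumerate(TOKEN_RE.finditer(text))
    ]


tokens = tokenize("In the beginning, God")
for token in tokens:
    print(token.position, token.kind, repr(token.data))
```

Because whitespace is kept as a token in its own right, joining the `data` of every token in order reproduces the original text exactly.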
To provide structure, Weston has written a token linkage system, where you can define a certain structure (e.g. “Verse 12”) and, using the features of a relational database, connect it to the tokens which should be included in that structure. There is even a feature for non-linear token linkages, if anyone finds a use for that.
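The linkage idea can be sketched with an ordinary join table. The example below uses Python's built-in `sqlite3` module rather than the project's Django models, and every table name, column name, and sample token is hypothetical; it only illustrates how a structure such as “Verse 12” can point at its tokens, including a non-linear selection.

```python
import sqlite3

SCHEMA = """
CREATE TABLE token (
    id       INTEGER PRIMARY KEY,
    position INTEGER NOT NULL UNIQUE,  -- order of the token in the work
    data     TEXT NOT NULL
);
CREATE TABLE structure (
    id    INTEGER PRIMARY KEY,
    kind  TEXT NOT NULL,               -- e.g. 'verse', 'chapter', 'heading'
    label TEXT NOT NULL UNIQUE
);
CREATE TABLE structure_token (         -- the linkage between the two
    structure_id INTEGER NOT NULL REFERENCES structure(id),
    token_id     INTEGER NOT NULL REFERENCES token(id),
    PRIMARY KEY (structure_id, token_id)
);
"""


def build_demo_db():
    """Create an in-memory database holding nine placeholder tokens."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    pieces = ["Alpha", " ", "beta", ".", " ", "Gamma", " ", "delta", "."]
    conn.executemany(
        "INSERT INTO token (position, data) VALUES (?, ?)",
        list(enumerate(pieces)),
    )
    return conn


def link(conn, kind, label, positions):
    """Define a structure and connect it to the tokens at the given positions."""
    cur = conn.execute(
        "INSERT INTO structure (kind, label) VALUES (?, ?)", (kind, label)
    )
    conn.executemany(
        "INSERT INTO structure_token (structure_id, token_id) "
        "SELECT ?, id FROM token WHERE position = ?",
        [(cur.lastrowid, position) for position in positions],
    )


def tokens_in(conn, label):
    """Return the token strings linked to a structure, in text order."""
    rows = conn.execute(
        "SELECT t.data FROM token t "
        "JOIN structure_token st ON st.token_id = t.id "
        "JOIN structure s ON s.id = st.structure_id "
        "WHERE s.label = ? ORDER BY t.position",
        (label,),
    )
    return [row[0] for row in rows]


conn = build_demo_db()
link(conn, "verse", "Verse 11", range(0, 4))
link(conn, "verse", "Verse 12", range(5, 9))
link(conn, "selection", "Scattered words", [0, 7])  # a non-linear linkage

print("".join(tokens_in(conn, "Verse 12")))
print(tokens_in(conn, "Scattered words"))
```

Since the tokens themselves carry no verse or chapter information, several overlapping structures (different versification schemes, for instance) could link to the same tokens without duplicating the text.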
Another piece of the puzzle is deciding how to express various types of metadata about the text in the database. One important type of metadata is the parsing information for Greek or Hebrew tokens. That parsing information could be provided in a simple string (e.g. “verb PAI3S”) which client applications would then have to interpret based on established conventions. This is not ideal, however, since it would seriously hamper the querying power of the API. It is best to instead use a database model for parsings. The challenge here comes in supporting multiple languages. Once again, relational database features will assist with this problem, since we can assign one Greek or Hebrew (or any other language) parsing to each token’s metadata. If there is a difference of opinion on a parsing, we can even store multiple parsings for each token.
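As a rough illustration of why structured parsings are easier to query than strings like “verb PAI3S”, here is a small plain-Python sketch. The `Parsing` class, its field names, the token ids, and the source labels are all invented for this example and are not the project's models. The form λύετε is used because it can be read as either indicative or imperative, which gives a natural case of two parsings for one token.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Parsing:
    token_id: int
    language: str                 # e.g. "grc" for Greek; Hebrew would use its own fields
    part_of_speech: str
    tense: Optional[str] = None
    voice: Optional[str] = None
    mood: Optional[str] = None
    person: Optional[int] = None
    number: Optional[str] = None
    source: str = "unspecified"   # who proposed this parsing (hypothetical field)


def find_tokens(parsings, **criteria):
    """Return ids of tokens having at least one parsing that matches every criterion."""
    return sorted({
        p.token_id
        for p in parsings
        if all(getattr(p, field) == value for field, value in criteria.items())
    })


parsings = [
    # Token 1 (λέγει): the structured form of "verb PAI3S".
    Parsing(1, "grc", "verb", "present", "active", "indicative", 3, "singular", "editor_a"),
    # Token 2 (λύετε): two competing parsings stored side by side.
    Parsing(2, "grc", "verb", "present", "active", "indicative", 2, "plural", "editor_a"),
    Parsing(2, "grc", "verb", "present", "active", "imperative", 2, "plural", "editor_b"),
]

print(find_tokens(parsings, part_of_speech="verb", mood="indicative"))
print(find_tokens(parsings, mood="imperative"))
```

With a plain string, the same queries would require each client to know and parse the abbreviation convention; with separate fields, the API can filter on any combination of them directly.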
I am optimistic about the potential of this project. Once the API is nailed down, there will be a lot of great opportunities for “client” apps, using whatever framework they wish. Until then, the API has to be finalized and garnished with built-in methods, and the models have to be tested with real data (which requires that the data be ported to the models in the first place). This is where we can use help from all sorts of people, from Python programmers to database experts to linguists and biblical scholars. It’s a good time to be interested in the scriptures and open source software.
Open Scriptures Roundup – January 1, 2010
Recently, there has been a surge of posts on Open Scriptures. If you haven’t been able to follow them all, here are some of the most exciting threads.
- The Open Scriptures code is moving to GitHub, from Google Code Project Hosting. See our page http://github.com/openscriptures.
- The Open Scriptures website is being redeveloped in Pinax. This is a major change: it will let us define our projects more clearly and will serve as the platform on which we build our applications. We are looking to abandon Google Groups. See the Openscriptures.org Redevelopment thread for more information. Special thanks to James Tauber at Eldarion for his generosity in hosting and setting up our new site! We’ll be announcing the beta site soon.
- New group activity was kicked off by the post Standardizing on a Web Infrastructure and Web Service API for Scripture, sent to Bible Technologies Group, Crossway, CrossWire, Bible.org, and others. We discussed work on a new edition of OSIS. We are actively talking with Bible.org (NET Bible) and Crossway (ESV) about ways to promote openness and interoperability.
- We’ve started some excellent dialogue with the Open Siddur project and their lead developer Efraim Feinstein.
- Reminder about preserving the right for a worker to receive his wages.
- The group decided on a common license for our projects, Creative Commons BY-SA. The bulk of the discussion may be found under WLC Lexical Tagging.
- We discussed our channels of communication in the Regular Meetings over IRC or Google Wave thread. The unofficial conclusion is that we do need to have some regular real-time meetings. However, the new Pinax site will also allow for focused collaboration that may alleviate some of the barriers we are experiencing.
- We also decided to use the name “MorphHB” for an open project collecting the morphology of the Hebrew Bible. The thread Decision about how to name the project – Morphology and the WLC has more information.
- We discussed word-level linking between texts.
- We are also now nailing down the goals for the project, and will be releasing a more refined project description soon.
- We also discussed the naming of our project. The “slug” we used on Google was “open-scriptures” but we are standardizing on “openscriptures”, as on Twitter and GitHub.
There are many more important threads that I will leave for your perusal at our Google Groups site. Over the course of the next few weeks expect to see some more explanations of what we have been discussing. Please feel free to join in the conversations!
Year | Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec |
---|---|---|---|---|---|---|---|---|---|---|---|---|
2009 | 13 | 31 | 193 | 136 | 66 | 50 | 49 | 15 | 3 | 24 | 47 | 267 |
Posts in December | Author |
---|---|
51 | Weston Ruter |
46 | DavidTroidl |
23 | Daniel Owens |
19 | sceptreofjudah |
17 | jtauber |
14 | Efraim Feinstein |
12 | Chris Little |
11 | bydesign |
11 | JAG3773 |
9 | Rob |
Open Scriptures Roundup – July 3, 2009
The exciting news this week is the dialogue that Weston has been carrying on with Dr. Bertram Salzmann from the German Bible Society. In a nutshell, we are working together to create a developer platform that will give access to the copyrighted texts that GBS maintains (such as the renowned UBS GNT) along with other openly-licensed works already available online. The conceptual outline that Dr. Salzmann has proposed keeps GBS’s texts under their umbrella by means of hosting the texts and the applications that make use of them. This is somewhat different from the original idea that Weston proposed, in which Open Scriptures would be more of a true mediator between open source developers and content providers like GBS. In any case, the applications would be made available free of charge. The exact details have yet to be figured out. Many thanks to Dr. Salzmann and GBS for their innovative, forward-thinking proposal! Please help by joining in on the conversation!
Open Scriptures Roundup – June 26, 2009
The past few weeks on Open Scriptures have seen some steady progress. Of note is that the Tregelles GNT import script is near completion. The addition of this manuscript will be much appreciated as it will bring the total number of works in the Manuscript Comparator to six. In addition to New Testament improvements, David Troidl submitted the initial upload of Strong’s Hebrew data (XML). This first step is the outcome of very hard work and also good collaboration on our Google group. Lastly, we are working on porting all of our code into a Django/Pinax-friendly format so that we can switch our site as well as our applications over to this platform. This work is moving along, and within a couple of weeks we should be on our new server using our new platform, graciously donated by James Tauber of MorphGNT and Eldarion. If anyone has experience with Django/Pinax and would be willing to help out, please contact Weston via the Google Group.
There have also been some significant steps forward in the dialog between Weston and the German Bible Society; GBS has a tentative proposal which looks to be quite promising and beneficial for both communities. Look forward to an announcement, hopefully next week.
Open Scriptures Roundup – June 5, 2009
Welcome to Open Scriptures Roundup! This is the first instalment of what will be a weekly synopsis of what’s been going on at Open Scriptures. The goal is to keep everyone who is interested updated on the status of the projects we are working on as well as provide information that can point out where help is needed.
This week we have made a lot of exciting progress. For starters, the manuscript import and build scripts have been completely ported over to Python and tested! This is a huge first step as it now allows us to work on creating applications that manipulate these manuscripts. Weston is currently working on creating a Django app of the Manuscript Comparator. If you would like to check out the code, follow the steps listed here.
Tregelles’s Greek New Testament Released! Joyfully, we can report that the project has released the texts under the Creative Commons 3.0 Protocol (that is, CC BY-NC-SA). This is very good news for Biblical Studies and especially for Open Scriptures. The initial announcement is here. The downloads may be accessed here. Weston is working on updating the import and merge scripts to include these texts.
Lastly, there has been some very productive collaboration going on regarding Strong’s data. For more information see the thread here, and another thread here.
To keep yourself abreast of all the news, join the Open Scriptures Google Group.
See ya next week!