Who Cares About Raw Data?
One of the core ideas upon which Open Scriptures is based is open access to raw data. This concept was popularized, though not originated, by Tim Berners-Lee in his TED talk, “Tim Berners-Lee on the next Web.” The recurring phrase throughout the talk is “raw data now.” Coupled with this idea is the notion of linked data, sometimes called “Web 3.0.” Here I set out to explain why these concepts matter to Open Scriptures.
So what is open access to raw data, and who really cares? To the average user of the internet, raw data is both trivial and essential. It is trivial mainly because raw data by itself is not terribly interesting or useful. However, raw data is absolutely essential because it is what drives the most popular websites in the world. The key is how the raw data is linked together.
A very good analogy is that of a research paper. When one sets out to write a detailed research paper, a first step is to collect information. Often this is a very lengthy process, involving many hours online and in the library reading articles, books, and anything else that pertains to the paper topic. A common technique for keeping track of all of this information during this stage used to be 3 × 5 index cards, but I think it is safe to say that there are computer programs that do a much better job today, e.g. Zotero. Once this information-gathering phase is finished, the writer has a formidable amount of raw data. Yet, as mentioned above, this raw data is not particularly useful. If the writer were to simply submit all of these separate pieces of information to the publisher, teacher, or newspaper, the paper would clearly be rejected. The reason: raw data needs to be linked in meaningful ways.
This is where the second part of the writing process comes into play: actually writing. The author takes all of the collected raw data and sets out to tie it together into a meaningful piece of literature. Ideally, the finished product will contain most of the raw data, but the paper will clearly demonstrate how each piece of information is related to the others and, perhaps most importantly, how each piece of information supports the writer’s thesis statement.
To the point, raw data is the essential first step in the process of presenting information in meaningful and helpful ways. Thus, even though most web users do not seem to care about raw data, in reality they care a great deal. Content providers need to put their raw data online in a way that is accessible to developers, so that developers can do their job of creating applications that make the data useful for the rest of the world.
Open Scriptures is committed to fostering the development of raw data on the internet so that developers will have access to the data that they need to create great web applications! For an example of how raw data (manuscripts) may be linked together to create helpful web applications, see our Manuscript Comparator.
This only scratches the surface. There is much more to raw data and especially to linked data than what is presented here. For more information see http://www.w3.org/DesignIssues/LinkedData.html and look forward to another post detailing linked data.
The Directory and Fostering Collaboration
I have noticed that in the Open community we lack efficient communication and project visibility. Maybe communication is just difficult to begin with, and I bet we do better than purely proprietary, top-secret enterprises. Still, I’ve made the mistake in the past of setting out in isolation to work on some exciting, worthwhile project, only to stop, look around at the community, and find others who were actively working on, or had already completed, the very thing I wanted to do! Just think how much more could be done if, instead of independently working on parallel projects, we came alongside each other to consolidate efforts. For this to happen, we have to know what everyone else is doing, and this is why we have recently launched the Open Scriptures Directory, which seeks to comprehensively list open projects involving scriptural data.
In addition to the Directory, the Google Group is also serving as a way for people to collaborate. This week there has been an active thread discussing collaboration on Strong’s Dictionary data. David Troidl expressed the general sentiments well:
The idea of collaboration is great. I’ve already mentioned to Darrell that just the moral support is heartening. It’s nice to know, after all this time working alone, that there really are others out there who share the same interest. […] I’ve been on my own all this time
I think it is essential to seek out others who are interested in the same things so that we can come together not only to build upon what each other has done, but to consolidate what we are doing as much as possible so that both duplicate efforts and fragmented data can be avoided. Not only will this lead to more productivity, but it will also serve to build community among those who share a common vision for open scriptural resources.
If you have a project that is not yet listed in the directory, please suggest the link. Please also join the Google Group!
Synopsis of Open Scriptures at BibleTech:2009
BibleTech:2009 was a success! The conference was a great opportunity to learn about cutting-edge developments at the intersection of Bible and technology, and to meet the people behind them. I’ll summarize some of the connections that relate to Open Scriptures.
I finally got to meet the people behind the Tagged Tanakh project from the Jewish Publication Society; they presented in the talk “How the Ancient Rabbis Invented Web 2.0 Before Its Time.” When I learned about their project earlier this year, I got really excited because the Tagged Tanakh’s vision is almost identical to that of Open Scriptures except for its focus on the Jewish scriptures. JT Waldman is directing the project, and we really connected both personally and professionally. We immediately began collaborating and strategizing on how we can assist each other and work together to realize the common vision.
Bible.org was represented at the conference, and they also want to work together. Bible.org has had a very open policy with regard to licensing their New English Translation (NET), and they are looking for new ways to make it even more openly accessible. Open Scriptures is one such avenue, in addition to their existing NET Bible study tool. The NET Bible is a solid translation that includes a multitude of scholarly notes which will profoundly benefit resources that are integrated into the scriptural Semantic Web of Linked Data (again, see Tim Berners-Lee’s TED Talk).
Stephen Smith presented on “The Need for a Universal Bible Annotation Format.” Stephen, previously employed by Crossway as the developer of the ESV Online, is now employed at Zondervan and is working to bring the same openness he built into ESV Online to a larger scale at Bible Gateway, currently the most popular Bible website. Stephen is also the voice behind OpenBible.info where he, obviously, promotes openness of scriptural data. In his talk Stephen described a data format that would not only make users’ data portable around the Web and among their devices, but also allow web apps to integrate scriptural data from across the Web, making possible new applications powered by this data. His talk examined the requirements for such a system and outlined possible ways to implement it. He announced that the Open Scriptures Google Group would be where collaboration on such standardization would take place (see threads one and two).
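To give a rough feel for what a portable annotation might look like, here is a minimal sketch in Python. It is not the format Stephen proposed: the `Annotation` fields, the dotted `John.3.16` reference style, and the helper function names are all assumptions made up for illustration. The only point it shows is that a note tied to a verse identifier can be exported from one app as plain JSON and read back by another without loss.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Annotation:
    """A hypothetical user annotation; every field name here is an assumption."""
    reference: str  # assumed verse identifier, e.g. "John.3.16"
    kind: str       # assumed category such as "note" or "highlight"
    body: str       # the user's text
    author: str     # whoever wrote the annotation


def export_annotations(annotations):
    """Serialize annotations to JSON text that another app could consume."""
    return json.dumps([asdict(a) for a in annotations], indent=2)


def import_annotations(text):
    """Rebuild Annotation objects from JSON text produced by export_annotations."""
    return [Annotation(**item) for item in json.loads(text)]


notes = [
    Annotation("John.3.16", "note", "Compare with 1 John 4:9.", "reader@example.org"),
]
payload = export_annotations(notes)
restored = import_annotations(payload)
print(restored == notes)
```

A real standard would also have to settle questions this sketch ignores, such as how to reference verse ranges and differing versification schemes.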
Sean Boisen of Semantic Bible, who last year presented on Bibleref, this year presented on the Bible Knowledgebase (BK). The project’s goal is to identify and mine all of the people, places, and things in the Bible and to connect them all together into Linked Data. While this is a proprietary product of Logos, there is hope that the identifiers (URIs) used will be published so that the community can standardize on a common namespace.
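To make the idea of Linked Data a little more concrete, here is a minimal sketch in Python of entities connected as subject-predicate-object triples. The URIs under example.org, the predicate names, and the `related` helper are all hypothetical and invented for illustration; they are not the identifiers used by the Bible Knowledgebase or by any published namespace.

```python
# Hypothetical namespace; not the Bible Knowledgebase's actual URIs.
BASE = "http://example.org/bible/"

# Each fact is a (subject, predicate, object) triple of URIs.
triples = [
    (BASE + "person/Abraham", BASE + "rel/fatherOf", BASE + "person/Isaac"),
    (BASE + "person/Isaac", BASE + "rel/fatherOf", BASE + "person/Jacob"),
    (BASE + "person/Abraham", BASE + "rel/livedIn", BASE + "place/Hebron"),
]


def related(subject, predicate):
    """Return every object linked from `subject` by `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]


# Follow the fatherOf link from Abraham.
print(related(BASE + "person/Abraham", BASE + "rel/fatherOf"))
```

Because every person and place is named by a URI rather than by a bare string, two independent projects that agree on the same identifiers could merge their triples into one list and query across both, which is why a shared namespace matters.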
And lastly, of course, I presented the Open Scriptures project itself, and I am thankful for how well it turned out. The multimedia from the talk is available.
Multimedia of Presentation at BibleTech:2009
The BibleTech:2009 conference went really well! I presented at 11am on Saturday. My talk is available in three formats:
The MP3 audio I recorded on my laptop during the talk; I’ll update this post with the official conference audio if it’s any better.
SlideShare didn’t convert the Keynote presentation perfectly (animations were lost, for example), so you can download the original if you would like (will not work in Microsoft PowerPoint).
Video
Slides
I’m working on a post that summarizes the important connections I made at the conference, and I hope to have that up here soon.
I’m eager to hear your comments and feedback regarding my talk.
Audio of Presentation at Multnomah University
Yesterday I presented the Open Scriptures project to Dr. Karl Kutz, professor of Bible and Biblical Languages at Multnomah University. Joining the presentation to Dr. Kutz were my wife and LeRoy Lee, who is the webmaster for Multnomah and also a member of the Open Scriptures group. The audio is available (35 mins). Enjoy!