Explaining Open Scriptures | Open Scriptures

What Good Is Linked Data?

Note: This is a conceptual overview; for a technical look, see here or here.

To follow up on my previous post concerning raw data, I thought it would be good to discuss linked data. First of all, it must be emphasized that linked data cannot exist without access to raw data. So: “raw data now,” then linked data.

The whole notion of linked data is really the idea of making data useful, really useful. At a high level, a good example of linked data is Wikipedia. In particular, take a look at this article about BSD. As I type that sentence, I realize that many do not know that BSD stands for Berkeley Software Distribution. Nor would many guess that BSD is the precursor to many flavors of operating systems, among them FreeBSD, NetBSD, Mac OS X, and DragonFlyBSD. The point I want to draw from the Wikipedia article is that it contains a plethora of information, but it also contains a plethora of links to further information. Thus, if one wanted to learn about FreeBSD, the Wikipedia BSD article already has a link to it. Further, one could read the FreeBSD page and find a nice graphical derivative, PC-BSD. Without the basic mechanism of links, these connections would be much more difficult to come by.
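The chain of links described above (BSD to FreeBSD to PC-BSD) can be sketched as a small set of linked statements. This is only an illustration, not how Wikipedia stores its links: the relation names `precursor_of` and `has_derivative` and the function `reachable` are made up for this example, while the topics themselves come from the paragraph above.

```python
from collections import deque

# Each link is a (subject, relation, object) statement.
# The relation names are hypothetical labels chosen for this sketch.
LINKS = [
    ("BSD", "precursor_of", "FreeBSD"),
    ("BSD", "precursor_of", "NetBSD"),
    ("BSD", "precursor_of", "Mac OS X"),
    ("BSD", "precursor_of", "DragonFlyBSD"),
    ("FreeBSD", "has_derivative", "PC-BSD"),
]


def reachable(start, links=LINKS):
    """Return every topic reachable from `start` by following links,
    in breadth-first order (nearest topics first)."""
    found = []
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for subject, _relation, obj in links:
            if subject == current and obj not in seen:
                seen.add(obj)
                found.append(obj)
                queue.append(obj)
    return found


if __name__ == "__main__":
    # Starting from BSD, PC-BSD is discovered only because FreeBSD links to it.
    print(reachable("BSD"))
```

Remove the link from FreeBSD to PC-BSD and a reader starting at BSD never finds PC-BSD, which is the point of the paragraph: the links are what make the connections discoverable.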

On the internet, linking is the way to go. If we zoom in a little, we notice some interesting features of linking. Let’s stick with Wikipedia: its main page boasts 27 different languages, which is impressive. Now return to our BSD article and select another language on the bottom left. You have the same information, yet in a completely different language. The data on the German page and the data on the English page should conceptually be the same, yet because it is presented in two different languages the article is useful to many more people. Multiply that by 27 and it is easy to see why Wikipedia has gained incredible worldwide appreciation. How many languages can you get Encyclopedia Britannica in?

Alright, so those examples deal mainly with information in the form that we are used to seeing online: web pages. What happens when we look at the data itself? Tim Berners-Lee uses census data as an example in his TED talk, but I thought it would be more interesting to look at Scriptural data. In the field of Biblical Studies we have a lot of manuscripts. What we don’t have is easy access to those manuscripts or easy methods for comparing them. However, that is changing! As more of these manuscripts become available online (see these projects), we gain the ability to link them together. The Manuscript Comparator is a prototype of this linkage. The prototype systematically links the data found in the manuscripts so that they can be compared simply and completely. Sure, someone could get hard copies of each manuscript and compare them manually. But anyone who has studied ancient languages will surely appreciate the beauty and simplicity of this application. To simply type in the passage that one is studying and then easily view the discrepancies is a huge resource! Not only that, but it demonstrates the power of linked data.
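To make the idea of passage-level comparison concrete, here is a minimal sketch in Python. It is not the Manuscript Comparator's actual code. The manuscript names (`MS-A`, `MS-B`, `MS-C`), the reference `Example 1:1`, the readings, and the function `compare_passage` are all invented placeholders. The only assumption about the real data is that each manuscript's text can be looked up by a shared passage reference, which is what links the manuscripts together.

```python
from difflib import SequenceMatcher

# Invented placeholder readings, NOT real manuscript text.
# The shared passage reference is the "link" between manuscripts.
MANUSCRIPTS = {
    "MS-A": {"Example 1:1": "in the beginning was the word"},
    "MS-B": {"Example 1:1": "in the beginning was a word"},
    "MS-C": {"Example 1:1": "in beginning was the word spoken"},
}


def compare_passage(reference, base, other, manuscripts=MANUSCRIPTS):
    """Return the word-level differences between two manuscripts for one passage.

    Each difference is a tuple (kind, base_words, other_words), where kind is
    'replace', 'delete', or 'insert' relative to the base manuscript.
    """
    base_words = manuscripts[base][reference].split()
    other_words = manuscripts[other][reference].split()
    matcher = SequenceMatcher(a=base_words, b=other_words, autojunk=False)
    differences = []
    for kind, i1, i2, j1, j2 in matcher.get_opcodes():
        if kind != "equal":
            differences.append((kind, base_words[i1:i2], other_words[j1:j2]))
    return differences


if __name__ == "__main__":
    # Compare every other manuscript against MS-A for the same passage.
    for name in ("MS-B", "MS-C"):
        print(name, compare_passage("Example 1:1", "MS-A", name))
```

A real tool would also have to handle spelling normalization, lacunae, and differing versification, all of which this sketch ignores.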

This is only the beginning for Biblical Studies. If you want to see what the collective mind of Open Scriptures dreams about when we consider linked data, check out the Potential Applications page.

Who Cares About Raw Data?

One of the core ideas upon which Open Scriptures is based is open access to raw data. This concept was introduced (not initially, but widely) by Tim Berners-Lee in his TED talk, “Tim Berners-Lee on the next Web.” The recurring phrase throughout this talk is “raw data now.” Coupled with this idea is the notion of linked data, sometimes called “Web 3.0.” Here I set out to explain why these concepts matter to Open Scriptures.

So what is open access to raw data, and who really cares? To the average user of the internet, raw data is both trivial and essential. It is trivial mainly because raw data by itself is not terribly interesting or useful. However, raw data is absolutely essential because it is what drives the most popular websites in the world. The key is how the raw data is linked together.

A very good analogy is that of a research paper. When one sets out to write a detailed research paper, a first step is to collect information. Often this is a very lengthy process, involving many hours online and in the library reading articles, books, and anything else that pertains to the paper topic. A common technique for keeping track of all this information used to be 3 x 5 cards, but I think it is safe to say that there are computer programs that do a much better job today, e.g. Zotero. Once this information-gathering phase is finished, the writer has a formidable amount of raw data. Yet, as mentioned above, this raw data is not particularly useful. If the writer were simply to submit all of these separate pieces of information to the publisher/teacher/newspaper, the paper would clearly be rejected. The reason: raw data needs to be linked in meaningful ways.

This is where the second part of the writing process comes into play: actually writing. The author takes all of the raw data that was collected and sets out to tie it together into a meaningful piece of literature. Ideally, the finished product will contain most of the raw data, but the paper will clearly demonstrate how each piece of information is related to the others and, perhaps most importantly, how each piece of information supports the writer’s thesis statement.

To the point: raw data is the essential first step in the process of presenting information in meaningful and helpful ways. Thus, even though most web users do not seem to care about raw data, in reality they care a great deal. Content providers need to put their raw data online in a way that is accessible to developers, so that developers can do their job of creating applications that make the data useful for the rest of the world.

Open Scriptures is committed to fostering the development of raw data on the internet so that developers will have access to the data that they need to create great web applications! For an example of how raw data (manuscripts) may be linked together to create helpful web applications, see our Manuscript Comparator.

This only scratches the surface. There is much more to raw data and especially to linked data than what is presented here. For more information see http://www.w3.org/DesignIssues/LinkedData.html and look forward to another post detailing linked data.