Open Scriptures

Platform for the development of open scriptural linked data and its applications.

Language Codes

Any Bible organisational system has to have a way to define languages. We’ll be dealing with Biblical languages like Hebrew, Aramaic, and Greek, as well as the many languages of translations.

Many systems have used 2-letter codes like ‘en’ or ‘fr’ for naming languages. But 2-letter codes allow at most 26 squared, or 676, possibilities, a whole order of magnitude short of the roughly 7,000 languages of the world. And since we want the Bible to reach all peoples of the world, we want to include their languages from the beginning.

So in a truly international system, we’ll need to use codes of at least 3 letters. (Yes, 26 cubed, or 17,576, codes should be enough!) And the ISO 639-3 standard, which came originally from the Ethnologue and is currently still administered by SIL, gives us that.
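For anyone who wants to check the arithmetic, here is a minimal sketch. The function name `code_space` and the constant `APPROX_LANGUAGES` are just names made up for this illustration, and the 7,000 figure is the rough count quoted above.

```python
import string

ALPHABET_SIZE = len(string.ascii_lowercase)  # 26 letters, a-z

# Rough number of living languages quoted in the post (an approximation).
APPROX_LANGUAGES = 7000


def code_space(length: int) -> int:
    """Number of distinct codes of the given length over the letters a-z."""
    return ALPHABET_SIZE ** length


for length in (2, 3):
    total = code_space(length)
    enough = total >= APPROX_LANGUAGES
    print(f"{length}-letter codes: {total} possible, enough: {enough}")
```

Two letters fall short of the target, while three letters leave plenty of headroom.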

Today I committed Python code to access the ISO 639-3 information. It’s just a little foundational step that’ll be needed later on.
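To give a feel for what such access might look like, here is a rough sketch (not the code that was committed). It assumes the code table is available as tab-separated text with a header row containing at least the columns `Id` (3-letter code), `Part1` (2-letter code, often empty) and `Ref_Name`. The tiny `SAMPLE_TABLE` below is hand-written for illustration, and the helper names `load_languages` and `lookup` are hypothetical.

```python
import csv
import io

# Hand-written sample in the ASSUMED layout: tab-separated with a header row.
# A real table has more columns and several thousand rows.
SAMPLE_TABLE = (
    "Id\tPart1\tRef_Name\n"
    "eng\ten\tEnglish\n"
    "fra\tfr\tFrench\n"
    "heb\the\tHebrew\n"
    "arc\t\tOfficial Aramaic (700-300 BCE)\n"
    "grc\t\tAncient Greek (to 1453)\n"
)


def load_languages(text):
    """Parse the tab-separated table into a dict keyed by 3-letter code."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return {row["Id"]: row for row in reader}


def lookup(table, code):
    """Find a row by 3-letter code, or by 2-letter code as a fallback."""
    if code in table:
        return table[code]
    if len(code) == 2:
        for row in table.values():
            if row["Part1"] == code:
                return row
    return None


languages = load_languages(SAMPLE_TABLE)
for code in ("heb", "grc", "en"):
    row = lookup(languages, code)
    print(code, "->", row["Id"], row["Ref_Name"])
```

Keying everything on the 3-letter code means legacy 2-letter codes can still be accepted as input without ever being stored.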

[I’m not sure yet that it handles everything we need. For example, Americans and New Zealanders both speak English (code ‘eng’), but we choose, pronounce, and spell many words differently. So we might need to extend this language code system sometime when we get into spell-checking and such.]
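One possible shape for such an extension, purely as a sketch: pair the 3-letter language code with an optional 2-letter country code. Everything here is an assumption for illustration. The `LanguageVariant` class, the `eng-NZ` style of key, and the `SPELLINGS` table are all made up, and the key format is an internal convention rather than any standard tag.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LanguageVariant:
    """A language code plus an optional regional qualifier (hypothetical)."""
    language: str                  # 3-letter language code, e.g. "eng"
    region: Optional[str] = None   # 2-letter country code, e.g. "NZ"

    def key(self) -> str:
        # Internal key only; not claimed to be a standardized language tag.
        if self.region is None:
            return self.language
        return f"{self.language}-{self.region}"


# Made-up spelling preferences, keyed by variant.
SPELLINGS = {
    LanguageVariant("eng"): "colour",
    LanguageVariant("eng", "US"): "color",
}


def preferred_spelling(variant: LanguageVariant) -> Optional[str]:
    """Use the regional entry if present, else fall back to the base language."""
    if variant in SPELLINGS:
        return SPELLINGS[variant]
    return SPELLINGS.get(LanguageVariant(variant.language))


print(LanguageVariant("eng", "NZ").key())
print(preferred_spelling(LanguageVariant("eng", "US")))
print(preferred_spelling(LanguageVariant("eng", "NZ")))
```

The fallback to the bare language code means regional data only has to be recorded where a region actually differs.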

Comments

  1. Weston Ruter

    Regarding American vs. New Zealand dialects of English, can’t this be represented in the ISO language code by adding a country subtag (ISO 3166-1 alpha-2)? It seems, though, that only IETF language tags allow country code subtags, as in en-US and en-NZ. Does ISO or IETF standardize the combination of 3-letter codes with country codes?

  2. RobH

    @Weston: Yes, appending a two-letter country code like -US and -NZ seems sensible but I don’t believe that’s an actual standard when combined with the three-letter ISO 639-3 code. [Then to make it even more complex, some minority languages write in alternative scripts depending on the major languages of influence around them, e.g., the same materials might be published with both a Romanised script and an Arabic script.]
