02.10.2023

DEFINING THE CAUCASUS: PROBLEMS OF CONTROLLED VOCABULARIES

by James Baillie

Fig. 1 Main geographical areas of the Caucasus region © J. Baillie 2023

I was in an academic bookshop recently on a trip back to the UK, searching through the history section. I was looking for books relating to my subject area, the medieval Caucasus – but they weren’t all to be found together, or even in all the places one might expect. The Byzantines, who ruled much of the Caucasus in the early medieval period, were included in Medieval History, along with books on western European contexts. The Ottomans, who later replaced them as Anatolia’s primary power and dominated parts of the Caucasus in the early modern period, were on the other hand in the Middle East section, as part of a wider category of ‘Area Studies’. The Caucasus itself was also in ‘Area Studies’, but within a subsection on Russia and former Soviet states. Area Studies also included a subsection for Africa, in which a single volume on ‘Medieval Africa’ caught my eye: a study that ran up to the year 1800.

What did that bookshop’s organisation mean for how customers saw those books, and the associations they made about them? Does the advent of Ottoman power remove places from what we should see as ‘European’ history? Is it right to see early modern Africa, adapting to and suffering under pressures and forms of colonialism from increasingly modern nation-states, as ‘medieval’, especially given the connotations of that term in public discourse? Is connecting the history of the Caucasus intimately to that of Russia, its former colonial power, a fair categorisation when for most of human history those regions had far livelier interactions with the Iranian and Anatolian worlds?

How we categorise things, and how we imagine the connections between those categories, matters. It shapes how we conceptualise the world more broadly, where we look for links and comparisons between topics, and not just the answers we find but the questions we ask to begin with about the past.

In recent years, those category problems have become more complex as new technologies have increased our ability to model information in more flexible ways. Physical bookshelves or filing cabinets were once the norm for finding information, with a single, static reference system. Whilst connecting knowledge across multiple dimensions could be done before computers, maintaining complex systems of inter-referential cards was an exercise few had time for. Now, via keyword searches that can pick out varied categorisations or even generate them in the moment from the text of documents, we all seek information using complex multi-dimensional categorisation systems almost every day of our lives.

The digital humanities thus force us to examine how we categorise information. Sometimes, when people think of humanities computing, they focus on large-scale data gathering and processing. The revolution in modern systems for storing and conceptualising information can, however, matter at least as much. This is something I see directly in my role at the Institute of Iranian Studies. I’m a project staff member working on the Caucasus Digital Repository (CDR) project, constructing a long-term online storage and cataloguing system for digital-format humanities materials relating to the Caucasus region. In particular, we need a ‘shelving’ system for classifying the resources we’re including in the project. This means constructing a controlled vocabulary – a limited, clearly defined set of terms that people can use to find the information they want. That, in turn, leads to a wide array of categorisation puzzles with no easy answers.

Even defining the geography of the Caucasus raises immediate questions. Most people understand it as a broad sweep including the areas north of, south of, and between the Greater and Lesser Caucasus mountain ranges [Fig. 1]. Some Armenians, however, object to the inclusion of the Armenian highlands, south of the Lesser Caucasus range, in a ‘Caucasus’ terminology that they see largely as a colonial Russian imposition. Abandoning the term entirely, conversely, can lead to carving up the region based on modern national distinctions, which has its own problems. The geography of the Caucasus is still politically contested, most notably in Artsakh/Karabakh, South Ossetia, and Abkhazia, but most of the region has moved between polities and mixed ethnic groups in complex ways.

Time adds further complexity. Modern terms may not apply well and historical ones may be hotly contested, especially where states contest claims to particular historical polities or cultures. Periodisation itself can be contested: scholars in the Caucasus often use a ‘long medieval’ concept, which comes from Soviet theories that suggest the ‘medieval’ period ended only when serfdom did, to refer to everything until around 1800 as medieval. This, however, risks portraying early modern Caucasia as in some way archaic compared to Europe, as we saw with the ‘Medieval Africa’ volume in the bookshop mentioned earlier.

Cultural and political boundaries can pose even more questions. Most individuals have overlapping identities, including their locality, nationality, ethnicity, and religion. These vary in prominence, and are foregrounded (or not) depending on social contexts and circumstances. It’s necessary to use these categories because they’re often key to answering historical questions, like how particular political or social categories functioned or how they developed into the forms we see today. But would a book on, say, a Georgian living in Armenia be Georgian history, or Armenian history? Probably both, though it matters whether and how we distinguish in our controlled vocabulary between Georgian culture and Georgia as a country. Whether regional identities imply national identities is another complex issue: most people would agree, for example, that someone from the east Georgian region of Kakheti is by definition a Georgian. Kakheti was, however, a separate kingdom in the eleventh and early twelfth centuries, and then from the fifteenth century was independent again for much of the early modern period as either an Ottoman or Persian vassal. Might there then be historical cases where, in a cataloguing context, some of these regional identities shouldn’t imply the national or political identity they’re associated with today? Or could that wrongly impose overly large splits within a cultural group that shared many key traits of language and religion?

When building structured solutions to these problems, the technologies we use also determine which options are available. For example, the CDR’s software allows us to “nest” category terms such that one implies another, so ‘Georgia’ can be nested under ‘Caucasus’. But this structure is required to be a tree: that is, each item can only nest directly under one other item. Take the city-category ‘Tbilisi’, which is inside the category ‘Georgia’. It could equally belong in the category ‘Kura river valley’, but it cannot be nested under both, and the Kura category cannot itself be placed inside ‘Georgia’, given that the Kura rises in Turkey before running through Georgia and then Azerbaijan [Fig. 2]. This restricts how we can define overlapping categories. There is a trade-off here: not allowing such overlaps can simplify the system and make it easier for both users and programmers to work with. But these technical limits also shape how we can relate concepts to one another in such a system, and thus how that system shapes our searches and connections.
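To make the tree restriction concrete, here is a minimal sketch in Python. It is not the CDR’s actual software: the `Vocabulary` class, its method names, and the decision to leave ‘Kura river valley’ as a top-level term are all assumptions made purely for illustration. It shows how giving each term a single parent lets one term imply its broader terms, and why ‘Tbilisi’ cannot then be filed under a second category.

```python
from __future__ import annotations


class Vocabulary:
    """Hypothetical controlled vocabulary where each term has at most one parent."""

    def __init__(self) -> None:
        # Maps each term to its single parent; top-level terms map to None.
        self._parent: dict[str, str | None] = {}

    def add(self, term: str, parent: str | None = None) -> None:
        # The tree rule: a term that already has a place cannot be nested again.
        if term in self._parent:
            raise ValueError(f"{term!r} already has a place in the tree")
        if parent is not None and parent not in self._parent:
            raise KeyError(f"unknown parent term {parent!r}")
        self._parent[term] = parent

    def implied_terms(self, term: str) -> list[str]:
        """Return every broader term implied by `term`, nearest first."""
        chain = []
        current = self._parent[term]
        while current is not None:
            chain.append(current)
            current = self._parent[current]
        return chain


vocab = Vocabulary()
vocab.add("Caucasus")
vocab.add("Georgia", parent="Caucasus")
vocab.add("Tbilisi", parent="Georgia")
vocab.add("Kura river valley")  # illustrative placement only (an assumption)

# Tagging a resource 'Tbilisi' implies the broader terms above it.
print(vocab.implied_terms("Tbilisi"))  # ['Georgia', 'Caucasus']

# A second parent for 'Tbilisi' is rejected, which is the restriction described above.
try:
    vocab.add("Tbilisi", parent="Kura river valley")
except ValueError as err:
    print(err)
```

Allowing several parents per term would turn the tree into a more general graph. That would let ‘Tbilisi’ sit under both ‘Georgia’ and the Kura valley, but every lookup of implied terms would then have to follow several branches, which is the added complexity the trade-off above refers to.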

Difficult as these problems are, there’s one final twist to come. Building a good concept framework for a specific project is one problem – but for a repository like the one I’m working on, the controlled vocabulary needs to be accessible for public use. As in shelving a book or writing an index, the question is not how the author, but how the readers, will approach the system. In these cases, having a well thought out framework is only one part of the problem: we also need an intuitive framework. Using non-standard definitions, or those known only to people who are already experts in the exact subject matter, is a problem if it means others can’t find the material they need for their research. Even if it were possible to, for example, catalogue everything with a descriptor of the physical geography that avoided discussing national lines, such a system would be impossible to use for someone who simply needed to find out about Georgian, Azerbaijani or Armenian history. This requires difficult compromises: choosing where such a system should align with the most conventional popular views of a period for the sake of usability, and where it should push for a more fine-tuned approach even if that asks more from the user.

This post offers more questions than answers, but they’re questions that matter for how we access information, be that in a general-purpose bookshop or a Caucasus-focused data repository. Controlled vocabularies and categorisation systems are unavoidable and necessary, but they need to be thought through and challenged rather than treated as neutral tools. The advent of digital technologies has increased the potential complexity of information systems. This can help us find information, but also poses new challenges for how we organise it: we must balance academic rigour and a sensitive, detailed approach to cultures against the practical necessity of being able to find information. How well we can achieve that might have profound implications for the information we access, the questions we ask, and the understandings we reach about the past.

