Thoughts and Problems while Converting and Indexing Multilingual Sentences of Abdulrazak Gurnah’s novel Afterlives

In the class “Writing across Languages: The Post-Monolingual Anglophone Novel” we started working with digital tokenization, tagging, indexing and, later, annotating. Put very simply, the aim was to look at how software reacts to multilingualism in novels. As most software and programmes are made in English-speaking countries for the English-speaking market, and are hence almost exclusively in English, we are interested in how they perceive and annotate non-English words and phrases. Does their Anglocentrism cause problems, or will they actually understand non-English words and annotate them correctly? (Small spoiler: they don’t.)
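In concrete terms, tokenization splits a text into tokens, tagging gives each token a part-of-speech (POS) tag, and dependency parsing links each token to a head word with a relation label such as ROOT or direct object. The post does not name the library behind the class template. The labels discussed below look like those of spaCy's English models, so the following is a minimal sketch that assumes spaCy. It turns a parsed text into a plain table of word, POS tag, dependency label and head word. `Tok`, `from_spacy` and `format_table` are hypothetical helper names, not part of the template.

```python
from typing import NamedTuple


class Tok(NamedTuple):
    """One analysed token: surface form, coarse POS tag, dependency label, head index."""
    text: str
    pos: str
    dep: str
    head: int  # index of the head token in the same list


def from_spacy(doc) -> list[Tok]:
    """Convert a whole spaCy Doc (assumed pipeline) into plain Tok rows.

    Token.text, Token.pos_, Token.dep_ and Token.head.i are spaCy attributes;
    punctuation marks stay in the list as tokens of their own.
    """
    return [Tok(t.text, t.pos_, t.dep_, t.head.i) for t in doc]


def format_table(toks: list[Tok]) -> str:
    """Render one line per token: word, POS tag, dependency label, head word."""
    lines = []
    for tok in toks:
        head_word = toks[tok.head].text  # head index points into the same list
        lines.append(f"{tok.text:<14} {tok.pos:<6} {tok.dep:<10} {head_word}")
    return "\n".join(lines)
```

With spaCy and its small English model installed (again an assumption about the setup), `nlp = spacy.load("en_core_web_sm")` followed by `print(format_table(from_spacy(nlp(sentence))))` would print one row per token of an example sentence. Because the rows are built from the whole `Doc`, every head index refers back into the same list.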

In my case, I worked with Abdulrazak Gurnah’s novel Afterlives, in which multiple languages are part of the primarily English narration. I had no problems with any of the technical aspects of this step. After I put my example sentences into the provided Google Colab template (Colab runs Jupyter notebooks in the browser, and installing Jupyter Notebook locally was just too difficult), these were my main findings:

  • Our assumption that the programme would tag all non-English words as proper nouns turned out to be almost entirely true. I think there were only a couple of examples where it tagged them differently.
  • Sometimes the ROOT of a sentence was placed very oddly.
  • Punctuation marks do not appear as separate tokens/entities in the dependency tree.
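The first and third findings can be checked with a quick count rather than by eye. This is a minimal sketch, assuming the tagged output has been turned into (word, POS tag) pairs, for example `[(t.text, t.pos_) for t in doc]` if the template used spaCy. The function name `check_findings` is hypothetical. It tallies the tags given to the words marked by hand as non-English and counts how many tokens were tagged as punctuation. If that count is above zero, the third finding may be a matter of display rather than analysis: as far as I know, spaCy's displaCy visualiser attaches punctuation to the preceding word by default (its `collapse_punct` option).

```python
from collections import Counter


def check_findings(tagged, targets):
    """Tally POS tags of hand-marked non-English words and count punctuation tokens.

    tagged:  (word, pos) pairs, e.g. [(t.text, t.pos_) for t in doc] if the
             template used spaCy (an assumption; the post names no library).
    targets: the non-English words marked by hand, matched case-insensitively.
    Returns (Counter of POS tags for the targets, number of PUNCT tokens).
    """
    wanted = {w.casefold() for w in targets}
    target_tags = Counter(pos for word, pos in tagged if word.casefold() in wanted)
    punct = sum(1 for _, pos in tagged if pos == "PUNCT")
    return target_tags, punct
```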

Here are some examples:

She wrote: Kaniumiza. Nisaidie. Afiya. He has hurt me. Help me.

In this example, both kaniumiza and nisaidie are tagged as proper nouns, with kaniumiza labelled as a direct object and nisaidie as a ROOT. Afiya is also a proper noun and a ROOT, which makes sense, as it is a name and the only word in this one-word sentence. The other labels, however, do not make much sense, especially as the direct translation follows immediately afterwards. I could understand all of them being a ROOT, but I just don’t understand why kaniumiza is seen as a direct object. It is also unfortunate that the programme does not seem to treat the whole example as a unit in which the sentences relate to each other semantically, but only analyses it sentence by sentence. If it did, it could identify He has hurt me. Help me. as the translation of Kaniumiza. Nisaidie.
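The sentence-by-sentence behaviour follows from how a dependency parse is built. Every token has exactly one head, and each sentence forms a separate tree with its own ROOT, so no arc can link a word in one sentence to its translation in the next. The minimal sketch below groups tokens by the root of their tree. It assumes spaCy's convention that a root token is its own head, and the helper names `Tok`, `root_of` and `trees` are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Tok(NamedTuple):
    text: str
    dep: str   # dependency label, e.g. spaCy's Token.dep_
    head: int  # index of the head token; a root is its own head, as in spaCy


def root_of(toks, i):
    """Follow head links from token i until reaching a token that is its own head."""
    seen = set()
    while toks[i].head != i:
        if i in seen:
            raise ValueError("head links form a cycle")
        seen.add(i)
        i = toks[i].head
    return i


def trees(toks):
    """Group token texts by the index of their tree's root (one tree per parsed sentence)."""
    groups = defaultdict(list)
    for i, tok in enumerate(toks):
        groups[root_of(toks, i)].append(tok.text)
    return dict(groups)
```

Run on a parsed example, each group holds one sentence, so a translated word and its English gloss in a following sentence would normally land in different groups with no arc between them.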

‘Jana, leo, kesho,’ Afiya said, pointing to each word in turn. Yesterday, today, tomorrow.

This one confused me a lot: why is kesho a noun while the others are proper nouns? Also, jana is labelled as a nominal subject, for which my only explanation is that the programme thinks it is the name Jana and not a word in another language. But how come leo is a conjunction and kesho a direct object? All of them should be indexed the same. I also do not understand why none of these three words is a ROOT this time, when non-English words were labelled as ROOT in other examples. Additionally, there was something I don’t quite understand in the indexing of the English words: why is tomorrow identified as the ROOT and not one of the other words? It is also quite sad that the programme once again did not realise that the direct translation of the non-English words is part of this example.
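The expectation that all three words should be indexed the same can be written down as a small check. This sketch assumes the output is available as (word, POS tag, dependency label) triples, for example `[(t.text, t.pos_, t.dep_) for t in doc]` in spaCy. `tag_profile`, `is_consistent` and `roots` are hypothetical helper names. The check collects the analysis each listed word received, reports whether they all match, and lists the ROOT tokens, which shows at a glance whether any of the listed words is a ROOT.

```python
def tag_profile(tagged, group):
    """Map each word of `group` to the (POS tag, dependency label) it received.

    tagged: (word, pos, dep) triples, e.g. [(t.text, t.pos_, t.dep_) for t in doc]
            if the template used spaCy (an assumption).
    group:  words expected to be analysed alike, matched case-insensitively.
    """
    wanted = {w.casefold() for w in group}
    return {word: (pos, dep) for word, pos, dep in tagged if word.casefold() in wanted}


def is_consistent(profile):
    """True when every word in the profile got the same (POS, dependency) pair."""
    return len(set(profile.values())) <= 1


def roots(tagged):
    """List the words labelled ROOT, i.e. the heads of their sentences."""
    return [word for word, _, dep in tagged if dep == "ROOT"]
```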

After the third miscarriage in three years she was persuaded by neighbours to consult a herbalist, a mganga.

This one surprised me a bit, because mganga is not only tagged as a noun but also labelled as an appositional modifier, meaning that the programme recognised it as another word for herbalist. This is the first and only time that a word from the African language (I’m very sorry, but I just could not discern which of the 125 Tanzanian languages it is) is indexed correctly.
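Appositions like this one can be pulled out of a parse directly, which gives a quick overview of where the programme treated a word as another name for its neighbour. A minimal sketch, assuming each token is stored with its text, dependency label and the index of its head (`t.head.i` in spaCy). `Tok` and `appositions` are hypothetical helpers. If the labels are as described above, it would return the pair (mganga, herbalist) for this sentence.

```python
from typing import NamedTuple


class Tok(NamedTuple):
    text: str
    dep: str   # dependency label, e.g. spaCy's Token.dep_
    head: int  # index of the head token in the same list (spaCy: Token.head.i)


def appositions(toks):
    """Return (apposition, head word) pairs for every token labelled 'appos'."""
    return [(t.text, toks[t.head].text) for t in toks if t.dep == "appos"]
```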

I then thought that maybe it would be different with another language and tried this example:

They were proud of their reputation for viciousness, and their officers and the administrators of Deutsch-Ostafrika loved them to be just like that.

However, even with German and the official name of a region, Deutsch-Ostafrika (German East Africa), there were problems. Although both parts of the word are tagged as proper nouns, which is correct, only Deutsch is labelled as a compound, while Ostafrika is labelled as the object of a preposition. This is not necessarily incorrect; however, Deutsch-Ostafrika is one word, even if it is hyphenated. Hence, in my understanding, the two parts should be recognised as a compound and the word as a whole as the object of a preposition.
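Since Deutsch and Ostafrika received separate labels, the tokenizer must have split the word before parsing. spaCy's English tokenizer, for example, splits a hyphen between two letters into a token of its own, to my knowledge. Assuming spaCy again, the sketch below rejoins such sequences when the hyphen has no spaces around it, using each token's text and trailing whitespace (`t.text` and `t.whitespace_` in spaCy). `merge_hyphenated` is a hypothetical helper. It only repairs the token list and does not re-parse it. To have the parser see a single token, spaCy's `nlp.tokenizer.add_special_case` could keep one specific word such as Deutsch-Ostafrika whole from the start.

```python
def merge_hyphenated(tokens):
    """Rejoin hyphenated compounds that the tokenizer split apart.

    tokens: (text, trailing_whitespace) pairs, e.g. [(t.text, t.whitespace_) for t in doc]
            in spaCy (assumed). A '-' is merged with its neighbours only when it is
            written without spaces on either side, as in 'Deutsch-Ostafrika'.
    """
    out = []
    i = 0
    while i < len(tokens):
        text, ws = tokens[i]
        joinable = (
            text == "-"
            and out and out[-1][1] == ""          # no space before the hyphen
            and ws == ""                          # no space after the hyphen
            and i + 1 < len(tokens)
            and tokens[i + 1][0][:1].isalnum()    # a word follows
            and out[-1][0][-1:].isalnum()         # a word precedes
        )
        if joinable:
            prev_text, _ = out.pop()
            next_text, next_ws = tokens[i + 1]
            out.append((prev_text + "-" + next_text, next_ws))
            i += 2  # skip the hyphen and the word just merged
        else:
            out.append((text, ws))
            i += 1
    return out
```

Chains such as mother-in-law are handled too, because the merged token is checked again when the next hyphen arrives.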

And lastly, another example with German: 

He looked like a … a schüler, a learned man, a restrained man.

Here, the programme did identify schüler correctly, as a noun and as the object of the preposition like. I was quite impressed by that, and what impressed me even more was that it also identified learned man and restrained man as appositional modifiers of schüler. This is the only example sentence in which not only the POS tagging but also the indexing and the dependency relations are correct. My only explanation for this is that schüler is also a word used in English, albeit an Old English one that is not commonly used (see OED), and is hence known to English dictionaries.

To conclude, I want to say that doing this was actually kind of fun. Yes, I had to look up some of the linguistic definitions, especially for the dependency relations, but overall it was fun, if a bit infuriating at times when the programme made the same mistakes again and again. So I’m looking forward to the next step of the process.