Introduction
Having focused throughout my studies on reading literature through a postcolonial lens and on Eurocentric bias in linguistics, I found the prospect of using conversion and annotation tools on a postcolonial, post-monolingual Anglophone novel intriguing. I was interested to see how the software would deal with the novel I chose: Susan Abulhawa’s The Blue Between Sky and Water.
The novel by the Palestinian-American writer and human rights activist mixes Palestinian Arabic with English in several ways, although some patterns can be observed: food items, terms for relatives, and culture-specific terms are usually written in Latinized Arabic. Terms are usually introduced in italics once and then reappear throughout the novel un-italicised. As the software works with raw text, this typographic distinction was lost. Apart from nouns, the novel also includes occasional verbs, adjectives, and phrases in Palestinian Arabic, sometimes translated in the preceding or following sentences, and sometimes not.
Sentence Choice
I chose seven sentences that show variation in the mixing of languages. For instance, in one sentence I picked, the only Arabic words are nouns denoting food items:
“One of her brothers arrived and they all shared a late breakfast of eggs, potatoes, za’atar, olive oil, olives, hummus, fuul, pickled vegetables, and warm fresh bread” (174).
Another contains only one adjective in Arabic:
“‘Who is there?’ a woman’s voice asked in Arabic and Nazmiyeh relaxed upon hearing the Palestinian fallahi accent” (35).
Yet another contains an entire phrase:
“The woman with wilted breasts began to sob quietly as others consoled her and banished the devil with disapproving eyes at Nazmiyeh – a’ootho billah min al shaytan – when a female soldier wheeled in a large box of clothes, and with a gesture of her hand, gave the naked women permission to get dressed” (114).
In addition, I chose other sentences that include Arabic nouns in different ways, in order to compare how the software deals with them. This proved useful, as will become clear in the analysis of the mistakes the software made in the annotation.
Problems and Technical Difficulties
On the coding side, working with Python in the Jupyter Notebook style interface of Google Colab went without major issues. The only problem I noticed was that double quotation marks could not be used in the input sentences: they delimit strings in the code, so a double quotation mark inside a sentence ends the string prematurely and confuses the interface. Single quotation marks were accepted by the software, so I replaced the double quotation marks with single ones.
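The quotation-mark issue can be sketched in plain Python. This is a minimal illustration, not the notebook's actual code: the variable names are hypothetical, and the sentence is a shortened version of the one from page 35, typed with straight double quotation marks around the dialogue (an assumption about how the input looked). It shows two ways of keeping double quotation marks inside a string instead of replacing them.

```python
# Illustrative input only: a shortened sentence typed with straight quotes.

# Option 1: escape each inner double quotation mark with a backslash.
escaped = "\"Who is there?\" a woman's voice asked in Arabic."

# Option 2: delimit the string with triple single quotes, so that both the
# double quotation marks and the apostrophe can appear unescaped.
triple_quoted = '''"Who is there?" a woman's voice asked in Arabic.'''

# Both spellings produce exactly the same text.
print(escaped == triple_quoted)
print(triple_quoted)
```

Either option would presumably have allowed the original punctuation to be kept, although replacing the marks, as I did, is the quickest workaround.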
Google Colab itself, however, was not very user-friendly. I found it rather unintuitive, and it often claimed I had too many tabs open at the same time. As other students recommended, clearing the history and waiting a few minutes seems to have solved that issue, although doing so is time-consuming.
Mistakes in the Annotation
Most of the time, Arabic words were labelled as PROPN (proper noun). This was especially the case in the sentence that includes an entire phrase in Arabic. For instance, “a’ootho” should be labelled as a verb, but was labelled as a proper noun. “billah” was labelled as one single proper noun, even though it combines a preposition with a noun or proper noun (Allah). In other sentences, the software labelled Arabic nouns as adverbs. One example is “fuul”, which is a type of bean. Yet other nouns, such as “jomaa”, were incorrectly labelled as adjectives. So, while the most frequent mistake was to label any Arabic word as a proper noun, the software was not consistent in its mislabelling.
The software was still able to locate the ROOT correctly, despite its confusion around the Arabic terms. Arabic words resembling English words were sometimes treated as English words. For instance, “Um” (mother) was mistakenly labelled as an interjection.
Overall, this was an interesting experience, but I was disappointed that the software handled Arabic words so poorly, even though this was to be expected.
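The labels reported in this section can be tallied with a few lines of Python. This is only a sketch: the list is typed in by hand from the examples mentioned here (it is not the software's output, and it covers only these five words), and the name reported_tags is hypothetical.

```python
from collections import Counter

# (Arabic token, tag assigned by the software), copied by hand from the
# examples discussed in this section. This is not real tool output.
reported_tags = [
    ("a'ootho", "PROPN"),
    ("billah", "PROPN"),
    ("fuul", "ADV"),
    ("jomaa", "ADJ"),
    ("Um", "INTJ"),
]

# Count how often each tag was assigned to an Arabic word.
tag_counts = Counter(tag for _, tag in reported_tags)

# List the tags from most to least frequent.
for tag, count in tag_counts.most_common():
    print(f"{tag}: {count}")
```

Even for this handful of examples, the tally shows the pattern described above: PROPN is the most common label, but it is not the only one.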
I made the same observation in the novel I read (America Is Not The Heart): most non-English words were food items and other culture-specific terms. In my case, however, they were never written in italics. Whether to italicise the foreign words and phrases seems to be a very deliberate choice by the authors. Italics highlight them more, but maybe that is not necessary: in my novel, the non-English words stood out to me even without italics. The fact that the Palestinian Arabic words are written in Latinized Arabic is also an interesting choice. It was probably easier for the formatting (as Arabic script runs from right to left), and it is easier for non-Arabic speakers to read. On the other hand, writing certain sentences in Arabic script might have had a unique effect on the reader.
I agree that using Google Colab was not very efficient if you wanted to work on multiple sentences at once. Anyone who wanted to work on a larger scale would need to find a way to make it less time-consuming.
I wonder what made the program think that some words are adverbs or adjectives. For my sentences, it labelled most foreign words as proper nouns. Maybe the program tried to find patterns where it expects an adjective or adverb? In one of my sentences, it also wrongly labelled "may" as an auxiliary, because it assumed it was the English auxiliary verb rather than a foreign word. Perhaps that is also why it labelled "Um" as an interjection? That would mean that foreign words spelled identically to an existing English word are more likely to be mislabelled, which would be a major weakness when dealing with these types of sentences.
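One rough way to check this idea would be to flag every foreign word whose spelling matches an English word before looking at the labels. The following is only a sketch under stated assumptions: the tiny word set and the function name flag_homographs are made up for illustration, and a real check would need a proper English word list.

```python
# Hypothetical, deliberately tiny set of English words; a real check would
# use a full dictionary or word list instead.
ENGLISH_WORDS = {"um", "may"}


def flag_homographs(tokens, english_words=ENGLISH_WORDS):
    """Return the tokens whose lower-cased spelling is also an English word."""
    return [token for token in tokens if token.lower() in english_words]


# Non-English words mentioned in this discussion.
foreign_tokens = ["Um", "fuul", "jomaa", "may", "billah"]

flagged = flag_homographs(foreign_tokens)
print(flagged)
```

If the flagged words turned out to be mislabelled in a different way from the rest, that would support the idea that identical spelling is what misleads the program.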