Annotating Multilingual Sentences in Yara Rodrigues Fowler’s „Stubborn Archivist“ – Experience & Observations

Initially, I was intrigued but also a bit worried about working on this project with Python, as previous linguistic research during my bachelor’s taught me that working with programming software can be a bit error-prone and frustrating at times. However, thanks to the prepared script, the annotation process via Google Colab and Python was very intuitive and easy to use, so thankfully, I did not experience any major technical difficulties.

For my research, I annotated multilingual sentences from Stubborn Archivist by Yara Rodrigues Fowler. The novel was published in 2019 and follows the coming-of-age journey of a young British-Brazilian woman in contemporary South London. While mainly written in English, it also includes many Portuguese terms and phrases, highlighting the character’s connection to both British and Brazilian cultures and identities.

In general, the software correctly annotated single Portuguese nouns like tia or empregada when they were used in a standard English sentence structure and clearly indicated, for instance, preceded by a determiner:

You don’t have an empregada at your house in London?

(Rodrigues Fowler 137)

However, some errors arose with compound nouns, such as in leite condensado (condensed milk), where the second part, condensado, was mistakenly identified as the head of the phrase instead of leite.

Vovó Cecília shook the bowl of cocoa powder over the pan and as the sprinkle powder became wet and fat and darkened the baby curved it into the centre of the hot leite condensado.

(Rodrigues Fowler 132)

Another recurring problem I encountered in the annotation involved Portuguese words that have English equivalents. For instance, in the passage:

But Vovô I thought Columbus discovered America? No. Christopher Columbus discovered América do Norte. América is a continent—two continents. And Brasil is in América do Sul.

(Rodrigues Fowler 146)

Here, the Portuguese preposition + determiner do was mistaken for the English (to) do and thus incorrectly annotated, once as a verb and once as an auxiliary.

On a stylistic level, the frequent use of dialogue inserts without proper punctuation stands out prominently in Rodrigues Fowler’s novel. This aspect also posed the most significant challenge during annotation, as illustrated by the following example:

So your father every time we went out he would want to try a new juice
And obviously there are so many juices and fruits that he had never seen before
Açaí
Maracujá
Acerola
Jabuticaba
Cajú
He was always asking—But what is that in English?

(Rodrigues Fowler 117)

In such instances, the program was unable to determine the correct sentence structure of the individual phrases, often interpreting them as one continuous sentence. Moreover, in this particular example, the individual Portuguese names for various fruits and juices were not recognized accurately and were instead marked as two compound nouns. However, this issue likely stems from the novel’s use of line breaks to separate the individual parts of speech, which cannot be indicated in the annotation program.

Dieser Beitrag wurde unter Allgemein, Blog posts, Student entries, Writing across Languages abgelegt und mit , , , , , verschlagwortet. Setze ein Lesezeichen auf den Permalink.

Eine Antwort zu Annotating Multilingual Sentences in Yara Rodrigues Fowler’s „Stubborn Archivist“ – Experience & Observations

  1. elisalue sagt:

    Thank you for your insights! I experienced the same problems with the classification of Twi words that have an English equivalent. Would you say that you were generally able to discern which words belong to the Portuguese or English language? In Micheal Donkor's "Hold" I would have struggled to correctly discern which language was being used, regarding words such as "me" etc., had it not been for the italics Donkor uses. Those difficulties mainly arose because I don't speak or understand any Twi. It seems to me that these overlaps between languages would challenge even a multilingual annotation dataset, seeing that in certain novels different languages are so interwoven. It would probably be difficult for the programme to know which language was being used, as in your example with "do".

Schreibe einen Kommentar

Deine E-Mail-Adresse wird nicht veröffentlicht. Erforderliche Felder sind mit * markiert