Working with Google Colab to do the sentence annotations wasn’t so scary to me, because I was already familiar with some coding/programming and with linguistics from my bachelor’s degree. However, I hadn’t worked with the two in combination like this before, so that was a new and interesting experience for me. I had no real problems working with the program; the only difficulties occurred when I tried to run multiple Colab notebooks at the same time (I had to save and close some before moving on to new ones!).
I chose a few sentences from the novel “America Is Not the Heart” by Elaine Castillo: some are mostly English with a few Tagalog loanwords (all food-related items), some are Tagalog sentences with a few English words, and some are entirely Tagalog. From my first observations I could already tell that the program had no idea how to deal with the Tagalog words or sentences. On a positive note, it at least annotated the English parts correctly from what I could see.
As an example, in the sentence “Ang ganda naman ang bahay ninyo, Tita.” (roughly, “Your house is so beautiful, Auntie.”) every word was categorized as a proper noun, which in turn completely messed up the dependency relations of the sentence. Even someone who doesn’t speak Tagalog should see that that’s probably not correct. As someone who knows some Tagalog (read: who asks my mom what the sentences mean), I can confidently say that the program simply labelled every unknown word as a proper noun and produced essentially random dependency relations.
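For anyone who wants to reproduce this kind of output: I don’t know exactly which library our Colab notebook uses under the hood, but a minimal sketch with spaCy’s small English model (an assumption on my part, not necessarily the course’s setup) would look roughly like this and prints each token with its part-of-speech tag, dependency label, and head word:

```python
# Minimal sketch, assuming a spaCy English model like the one our notebook might use.
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Ang ganda naman ang bahay ninyo, Tita.")

# Print each token with its part-of-speech tag, dependency label, and head word.
for token in doc:
    print(f"{token.text:10} {token.pos_:6} {token.dep_:10} head={token.head.text}")
```

With an English-only model, the Tagalog tokens are simply unknown words, which is presumably why they all end up as proper nouns with more or less arbitrary dependency links.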
As a visualization, I copied the sentence into an Excel table together with the annotations and dependency relations that the program provided. The yellow-highlighted bit is my attempt at giving a slightly more correct version of the word categories, along with a rough translation. I did not attempt my own dependency relations for lack of grammatical knowledge, but I would guess that they should look very different from the ones the program produced.
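For anyone who prefers not to copy the annotations into Excel by hand, a small (hypothetical) extension of the sketch above could write the same token, part-of-speech, and dependency information straight into a spreadsheet, which can then be corrected and highlighted just like my table:

```python
# Sketch: dump the annotations from the spaCy `doc` above into a spreadsheet.
# Assumes pandas and openpyxl are available (both are typically preinstalled on Colab).
import pandas as pd

rows = [
    {"token": t.text, "POS": t.pos_, "dependency": t.dep_, "head": t.head.text}
    for t in doc
]
pd.DataFrame(rows).to_excel("annotations.xlsx", index=False)
```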
Either way, I found it very interesting to see how the program dealt with the sentences!
Dear Charmaine, Reading your blog entry, I realized I forgot to comment on the machine’s ability to annotate the English parts in my own blog post, and I agree – the annotations and dependency relations for English seemed plausible to me as well. I find it quite interesting that for Tagalog the machine categorized everything as a proper noun, because this was not the case for all languages. For the Palestinian Arabic examples from the novel I chose, Susan Abulhawa’s “The Blue Between Sky and Water”, some words were labelled as proper nouns, but others as adjectives or adverbs – in all instances it was a miscategorization. As for the randomness of the dependency relations, we seem to have had a similar experience: non-English words and phrases confused the machine and led to nonsensical interpretations of the dependencies. Last but not least, I wish to point out that the idea of visualizing the results via Excel is great! I had wanted to include some screenshots of dependencies, but since Google Colab makes it difficult to see an entire sentence at once, I refrained from it. Visualization via Excel clearly solves that issue.