My experience with annotating multilingual sentences with Google Colab

My first experience with Google Colab was surprisingly positive. While the idea of working with programming software seemed overwhelming at first, it turned out to be quite intuitive and easy to navigate once we actually tried it during the seminar, with the help of Ms. Pardey and the other students. The concept of syntactic trees and dependency relations was something I initially struggled with, considering that my Introduction to Linguistics lecture was in 2017 and I have since mostly stayed in the field of Literary and Cultural Studies. Combined with the abbreviations for the different tags within the programme, this made it difficult for me to understand the results Google Colab was showing me. However, the class discussion and the glossary, as well as some revision using Google, were very helpful. I also fed the programme simple sentences at first (think “I like apples.” or “What is your favourite colour?”) to see what the results would look like with less complicated sentence structures.

When I tried to annotate my own sentences from the novel “On Earth We’re Briefly Gorgeous”, I came across some difficulties: I cannot understand Vietnamese and thus had to work out the structure of the Vietnamese passages myself first in order to verify Google Colab’s results. What I discovered was that the programme struggled to identify the Vietnamese words. To me this seems inevitable because, as I understand it, the language model we used is only suitable for English. Because of this, the overall dependency relations were off, since my example sentences combined English and Vietnamese words in the same sentence, mostly without an indicator such as an English preposition or determiner. I have not yet found a way to fix this problem but am very interested in what a solution would be in such cases.
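
One possible first step towards handling such mixed sentences would be to mark which words are probably Vietnamese before parsing, so that they could be treated separately. The following is only a minimal sketch under my own assumptions, not something we used in the seminar: the function names `looks_vietnamese` and `tag_languages` are hypothetical, the example sentence is invented rather than taken from the novel, and the check is a rough heuristic based on diacritics. It will miss Vietnamese words written without any marks and may wrongly flag accented words from other languages.

```python
import unicodedata

# Combining marks that occur in Vietnamese spelling.
# Assumption: this is a rough heuristic, since the acute and grave
# accents also appear in other languages (e.g. French loanwords).
VIETNAMESE_MARKS = {
    "\u0300",  # grave accent
    "\u0301",  # acute accent
    "\u0302",  # circumflex
    "\u0303",  # tilde
    "\u0306",  # breve
    "\u0309",  # hook above
    "\u031B",  # horn
    "\u0323",  # dot below
}


def looks_vietnamese(token):
    """Guess whether a token is Vietnamese based on its diacritics."""
    # The letter đ has no decomposition, so check it directly.
    if "đ" in token.lower():
        return True
    # NFD splits letters like "ở" into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", token)
    return any(ch in VIETNAMESE_MARKS for ch in decomposed)


def tag_languages(sentence):
    """Split on whitespace and label each token 'vi' or 'en'."""
    return [
        (token, "vi" if looks_vietnamese(token) else "en")
        for token in sentence.split()
    ]


# Invented example sentence, not a quotation from the novel.
for token, lang in tag_languages("I like phở"):
    print(token, lang)
```

The labels could then be used to decide which words an English-only model should not be trusted on, or which spans to pass to a different model if one is available.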

I am interested to learn about the others’ experiences with Google Colab and am keen to learn more about computer-based analysis of multilingual text.

This entry was posted in Allgemein, Blog posts, Student entries, Writing across Languages.

One response to My experience with annotating multilingual sentences with Google Colab

  1. judiths says:

    Dear Jenni, I think our experiences working with Google Colab were similar in many respects. When reading your blog entry, I remembered that I too had some issues with understanding the syntactic trees, and I too found that the results Google Colab provided must be off even though I don’t speak the language, simply because it sometimes showed me that these sentences did not have a verb, which did not make much sense in the context. So that seems to be a common issue with Google Colab. And I agree, it must have something to do with the fact that we only fed it a model for the English language and no other, so it can only fall back on the data it has for English. I guess that includes data about which words serve which function in a sentence and what parts of speech they are, so it simply doesn't have enough information to annotate these non-English words and phrases correctly. At least, this is how I understood it.
