Setup and first impressions
I found working with ANNIS both fun and insightful. I especially appreciated that query results displayed annotations and sentence dependencies immediately, unlike in Google Colab, where multiple intermediate steps were needed. This intuitive interface made ANNIS very user-friendly in my experience. After overcoming some initial difficulties with setting up combined queries, it was actually quite fun to experiment with different search requests and observe the results.
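For readers unfamiliar with the tool: searches in ANNIS are written in the ANNIS Query Language (AQL). As a minimal sketch, assuming the corpus exposes its part-of-speech annotations under a layer named pos, a single-layer search is just one term:

    pos="NOUN"

Combined queries chain several such terms together with operators, as sketched in the next section.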
POS-tagging in ANNIS
As we have already established using Google Colab, most foreign-language words are recognized as proper nouns by the machine annotation. The analysis with ANNIS confirms this impression: out of 681 entries tagged as "Foreign" in our corpus, 420 are labelled with the POS tag PROPN. I was therefore particularly interested in the instances where other POS tags were assigned.
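A combined query along the following lines should reproduce this count; note that the layer names Foreign and pos (and the value Yes) are assumptions based on how the annotations are described above:

    Foreign="Yes" & pos="PROPN" & #1 _=_ #2

Here #1 and #2 refer back to the two search terms in order, and the _=_ operator requires both annotations to cover exactly the same token. Swapping PROPN for NOUN, ADJ, or VERB gives the counts for the categories discussed below.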
The category NOUN forms the second-largest group in the corpus, with 168 entries in total. In many of these cases, the machine annotation appears to have flagged misspelled English words as foreign, as evidenced by the example "climinals", which is correctly identified as a noun despite being categorized as foreign.
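An individual case like this one can be retrieved by combining the token's surface form with its POS annotation, under the same layer-name assumptions as above:

    tok="climinals" & pos="NOUN" & #1 _=_ #2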
Additionally, I noticed that this category often includes established loanwords in English, such as "malum", "ennui", and "hummus", all of which are listed in the Oxford English Dictionary.
The POS tag for adjectives is less common, appearing in only 26 entries. These entries often consist of foreign words introduced by an English determiner or occurring immediately before a word tagged as a noun. Interestingly, in one example the English word "cadaverous" was mistakenly labelled as foreign yet correctly identified as an adjective at the same time.
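The adjective-before-noun pattern can be expressed with AQL's direct-precedence operator, again assuming the Foreign and pos layer names:

    Foreign="Yes" & pos="ADJ" & pos="NOUN" & #1 _=_ #2 & #2 . #3

Here #1 _=_ #2 requires the adjective itself to carry the Foreign annotation, and #2 . #3 places it immediately before the noun.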
The POS category VERB is also relatively rare, appearing in just 25 entries in our corpus. This tag assignment seems to be driven by word position: since English declarative sentences typically place the verb second, the second word of a multilingual sentence is most frequently assigned this category.
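One way to probe this positional hunch, assuming sentences are available as spans under a hypothetical layer named sent, would be to search for verbs directly preceded by a sentence-initial token:

    sent & tok & pos="VERB" & #1 _l_ #2 & #2 . #3

The _l_ operator requires the token in #2 to be left-aligned with the sentence span, i.e. to be its first token, so any verb matched by #3 stands in second position.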
Overall, some of these observations are rather anecdotal, as many entries do not seem to follow a clear pattern in the machine annotation. It would be interesting to see what results a larger and possibly more refined corpus might yield in these cases.
I agree with you! Experimenting with different queries to analyse different aspects of the corpus was rather fun. I wonder whether it would be feasible to introduce a complementary tag such as "misspelled" to flag this particular stylistic feature and analyse its use further. Right now it would be a manual task to look through the spelling of each word. In a larger corpus it would be helpful to retrieve the different spellings with a single query: there might be patterns in the use of certain spellings that could offer insight into this feature of multilingual writing. All in all, I also struggled to find a coherent pattern in the annotation of "Foreign" words in my analysis; there always seemed to be exceptions.
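Until such a tag exists, one workaround might be to retrieve every foreign-tagged token and hand the result to ANNIS's built-in frequency analysis; a sketch, again assuming the Foreign layer name:

    Foreign="Yes" & tok & #1 _=_ #2

Running the frequency analysis over the tok column of the matches would list each distinct surface form with its count, which at least narrows the manual inspection down to a single table.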