Working with our corpus in ANNIS was an interesting experience because it gave us a first overview of what our corpus looks like and how we can use it to draw conclusions about our novels and their non-English vocabulary as a whole. Trying out different queries produced many interesting results, but it also confirmed our finding from the Google Colab analysis that machine annotation makes a lot of mistakes with non-English words. Despite these mistakes, we could gather a lot of information about how non-English words are incorporated into our novels in terms of parts of speech.
When looking at the annotation of our corpus in ANNIS, we found a few things striking. One of the main phenomena we discussed is the number of nouns in the corpus that are classified as “foreign”. Conversely, many of the “foreign” words are, in fact, classified as nouns. Setting aside any mistakes in the annotation, this means that non-English words in the novels in our corpus are often included in the form of nouns.
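The two observations above are two different proportions: the share of “foreign” words that are nouns, and the share of nouns that are “foreign”. A minimal sketch of how they could be calculated, assuming the annotation has been exported as simple (word, part of speech, language label) rows. The token list, the tag names and both function names are invented for illustration and are not taken from our corpus or from ANNIS:

```python
from collections import Counter

# Hypothetical annotated tokens: (word, part-of-speech tag, language label).
# These rows are made up for illustration and do not come from our corpus.
tokens = [
    ("She", "PRON", "english"),
    ("cooked", "VERB", "english"),
    ("biryani", "NOUN", "foreign"),
    ("and", "CCONJ", "english"),
    ("wore", "VERB", "english"),
    ("a", "DET", "english"),
    ("dupatta", "NOUN", "foreign"),
    ("Chalo", "VERB", "foreign"),
    ("market", "NOUN", "english"),
]


def pos_share_of_foreign(tokens):
    """For the words labelled 'foreign', return the share of each part of speech."""
    counts = Counter(pos for _, pos, lang in tokens if lang == "foreign")
    total = sum(counts.values())
    return {pos: n / total for pos, n in counts.items()} if total else {}


def foreign_share_of_pos(tokens, pos_tag):
    """For one part of speech, return the share of its tokens labelled 'foreign'."""
    langs = [lang for _, pos, lang in tokens if pos == pos_tag]
    return langs.count("foreign") / len(langs) if langs else 0.0


if __name__ == "__main__":
    print(pos_share_of_foreign(tokens))
    print(foreign_share_of_pos(tokens, "NOUN"))
```

Keeping the two proportions apart matters because a corpus can have mostly nouns among its “foreign” words while “foreign” words still make up only a small part of all nouns.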
We discussed a few different reasons for this. Replacing an English noun with a non-English one enables the author to keep an English sentence structure. For someone who does not speak the language, this makes the sentence more easily comprehensible than one that incorporates, for example, a foreign verb or adjective, which might change the grammar of the sentence and confuse readers who are not familiar with sentence structures in other languages. We also discussed that the novels we added to the corpus are mostly popular books. This might play into the number of foreign nouns used in comparison to other parts of speech, because such books want to reach a broader audience and cannot rely on multilingual readers who understand the non-English language included in the text. For a broader audience that might be coming into contact with the specific language for the first time, these books use simpler words, such as names for food items, clothes, activities and places, than books aimed at a multilingual audience from the culture that is included in the story.
Dear Jenni, I find it very interesting what you say about how comparatively easy it is to insert foreign nouns into an English text without requiring too many changes or making the sentence too complex. Especially in terms of comprehension, I think it's also easier to infer the meaning of nouns from context, for example when they are part of a list or a string of mainly English words. Also, looking back on my own reading experience, nouns were the easiest to understand, especially if they were repeated in similar contexts, because with every repetition the mental picture of what this thing could be becomes clearer. And in my opinion that is also true for the examples you gave, like words for food items, clothing and places!