ANNIS
Working with ANNIS was easier for me than my initial experience with spaCy, which I assume is because we already had that spaCy experience behind us and roughly knew what each term meant. Initially, I had trouble getting ANNIS to work on my system (which I later understood was a mistake on my side), but once it was set up, I had a fun time exploring it. It was especially insightful that we had a bigger corpus with sentences from many other novels, not just the one I was working on. Comparing sentences from these different novels helped me understand recurring patterns in POS and dependency relation tagging, although there are discrepancies in these patterns across the corpus.
While looking at the sentences, as with spaCy, most non-English words were POS-tagged as proper nouns. Many of these words also began with a capital letter in the example sentences, so they might have been tagged as proper nouns because of the English-language convention that proper nouns start with a capital letter. Most of the POS tagging followed English sentence structure, even though the sentences were not entirely in English. I can’t help but think that the machine ignored the presence of the non-English words and imposed the predominant English structure onto these post-monolingual sentences. Nouns are also among the most common words in English and are key to a sentence’s legibility, which may be why most of these unfamiliar words were automatically assumed to be nouns as well. Of course, there were non-English words that did resist the typical English sentence structures and tagging, such as some swear words and the names of certain foods, drinks, and clothing.
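To illustrate the tendency described above, here is a minimal sketch of how spaCy's English pipeline handles a code-switched sentence. The sentence is an invented example, not one of the corpus sentences, and the sketch assumes the en_core_web_sm model is installed; in my experience, capitalised non-English words typically come out as PROPN.

```python
import spacy

# Assumes the small English model has been installed via:
#   python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

# Invented code-switched sentence (not from the corpus) mixing English and Hindi
doc = nlp("She stirred the Khichdi while the Duniya outside kept talking.")

# Print each token with its POS tag and dependency relation,
# similar to what the ANNIS grid view shows for the corpus sentences
for token in doc:
    print(f"{token.text:<10} {token.pos_:<6} {token.dep_}")
```

Running something like this on one's own example sentences makes it easy to see which non-English tokens fall into the PROPN pattern and which resist it.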
Translations
Translating our example sentences into other languages was yet another insightful activity. I chose example sentences from the novel The Ministry of Utmost Happiness. Most sentences that carry non-English words (Hindi or Urdu in this case) are followed by a close translation in the novel, but I tried to translate the sentences anyway, into languages such as German, Italian, and Spanish, as well as into Malayalam, my mother tongue (for which I used Google Translate and self-translation). It helped a great deal that I can speak and understand Hindi to an extent. With machine translation (I mainly used Google Translate and DeepL), if the source language was chosen as English or auto-detected to be English, the non-English words were retained in the translated sentences as well. When I selected Hindi as the source language, some of the translations were comparatively better. This was not always the case, though, which shows that the machine is still figuring out how to work with two languages in a single instance, a challenge in the post-monolingual era. I had a similar experience when I used machine translation to translate the sentences into Malayalam: the machine retained the Hindi words while translating the English ones.
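For anyone who wants to script this comparison instead of clicking through the web interfaces, the following is a minimal sketch using the deep-translator Python package (an assumption on my part; I only used the websites myself). It runs the same hypothetical code-switched sentence through Google Translate with different declared source languages and target languages.

```python
from deep_translator import GoogleTranslator

# Hypothetical code-switched sentence, not a quotation from the novel
sentence = "He ate the khichdi quietly and said nothing."

# Compare how the declared source language affects the output
for source in ("auto", "en", "hi"):
    for target in ("de", "it", "ml"):  # German, Italian, Malayalam
        result = GoogleTranslator(source=source, target=target).translate(sentence)
        print(f"source={source:<4} target={target}: {result}")
```

A loop like this makes it easier to spot systematically whether the non-English words are retained, transliterated, or translated under each setting, rather than comparing outputs one sentence at a time.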
I also tried to translate the individual Hindi words into English and the other European languages using Google Translate. Some translations were accurate, but some of these words are very culturally and contextually specific, so they were rendered as longer sentences that tried to translate the cultural context as well. For example, the word "khichdi": khichdi is an Indian dish made of lentils and rice, and it was translated as "A dish in South Asian cuisine made of rice and lentils". Contextually, the word khichdi in the novel conveys confusion, a mix of everything, but one would miss this interpretation if one depended only on the translation. What can be concluded from this is that a lot is lost in translation, especially in machine translation.
I relate to your experiences with ANNIS and machine translation as well. Like you, I found ANNIS much easier to use than spaCy, mostly because of its user-friendly interface. It's interesting how we both noticed non-English words getting tagged as proper nouns so often, showing how the tools in place impose English structures on other languages. Machine translations often miss cultural nuances, leading to literal and sometimes incorrect results. Your example of "khichdi" from The Ministry of Utmost Happiness is like my experience with Ocean Vuong’s text, where the meaning got lost. Manual translations definitely do a better job of keeping the original text's feel and depth. Your observations highlight the challenges of translating multilingual texts and show that while machines can help, they can't replace the nuanced understanding a human translator provides, and they definitely reflect scope for further improvement in research and translation software.