Well, they are both terribly bad at recognizing "foreign" words (which basically means any non-English words) in multilingual structures where different languages are combined to create sentences. Honestly, I expected ANNIS to do a better job of annotating the multilingual words than Google Colab; however, both annotated the words almost exactly the same. Having said that, I found ANNIS more pleasant and easier to work with than Google Colab, even though downloading, installing, and running the desktop version of the program was torturous.
Moreover, thanks to ANNIS's ability to search for specific words, POS tags, or tokens, I noticed that when an English word is stuck between a compound of foreign words, it is recognized as a non-English word. The "in" in the compound Ministerio de Reubicación in Novilla was labeled as a "true" foreign word, even though ADP is the correct POS tag for the word.
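For anyone who wants to reproduce this kind of search, ANNIS provides its own query language, AQL. A minimal sketch of the two searches mentioned above, a tag search and a sequence search (`.` is AQL's direct-precedence operator; the exact name of the `pos` annotation layer depends on how the corpus was imported):

```
pos="ADP"
"Reubicación" . "in" . "Novilla"
```

The first query returns every token tagged as an adposition; the second finds "in" standing directly between the two Spanish words.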
Similarly to Google Colab, the most common POS tag used for the non-English words was PROPN, which is of course far from correct. Another observation was the effect of punctuation on the annotation of the words. An example is the different POS tags for "Llave Maestra" when it appears next to punctuation (VERB and NOUN), whereas in its other occurrences both words are consistently tagged as nouns.
The POS tags of the following sentence were among those that caught my attention. First, the X tag for si; I wonder what X refers to here! (There are 11 other matches with POS=X.) Second, the false annotation of no and en as INTJ. Nonetheless, this sentence was one of the few examples in which not all foreign words were tagged as PROPN.
This all shows how much more improvement is needed before these tools can correctly recognize each word in multilingual sentences.
Dear Rey, interesting observations. I share your opinion that it was more fun (and maybe more intuitive) working with ANNIS. I'm not sure, though, whether ANNIS can be praised for the correct annotation of foreign words, since I think that was actually done manually... if not, we would probably see more mistakes (as we can perhaps see in the example of the French noun "chose", for which ANNIS annotated the lemma "to chose"). As for the POS tagging: I think the tags are still the ones from spaCy. I looked up the POS tags and found that X stands for "other"; in my opinion, that would actually be the correct tag for all foreign words, since they don't follow English grammar. (I looked it up here: https://machinelearningknowledge.ai/tutorial-on-spacy-part-of-speech-pos-tagging/) Thank you for your inspiring post :)
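To make the tag discussion above concrete, here is a small sketch of the coarse tag inventory in question. spaCy's `token.pos_` values come from the Universal Dependencies (UPOS) tagset, which includes all the tags mentioned in this thread; the descriptions are the standard UD glosses, and the comments tying tags back to the thread's examples are my own annotations:

```python
# Universal Dependencies coarse POS tags, the set behind spaCy's `token.pos_`.
UPOS_TAGS = {
    "ADJ": "adjective",
    "ADP": "adposition",    # e.g. the "in" in "Ministerio de Reubicación in Novilla"
    "ADV": "adverb",
    "AUX": "auxiliary",
    "CCONJ": "coordinating conjunction",
    "DET": "determiner",
    "INTJ": "interjection",  # the questionable tag given to "no" and "en"
    "NOUN": "noun",
    "NUM": "numeral",
    "PART": "particle",
    "PRON": "pronoun",
    "PROPN": "proper noun",  # the default tag most foreign words received
    "PUNCT": "punctuation",
    "SCONJ": "subordinating conjunction",
    "SYM": "symbol",
    "VERB": "verb",
    "X": "other",  # catch-all: foreign material, typos, unanalyzable tokens
}

print(UPOS_TAGS["X"])  # → other
```

So tagging foreign words as X ("other") is indeed what the UD guidelines intend for material that cannot be analyzed with the grammar of the main language, which supports your reading.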