Parts of speech
Our corpus consists of 681 non-English and 3405 English words, 4086 words in total. The table below shows how the machine classified some of them by part of speech:
Part of speech | Total words | Non-English words | English words |
Proper nouns | 989 | 420 | 569 |
Nouns | 606 | 168 | 438 |
Verbs | 408 | 25 | 383 |
Adjectives | 184 | 26 | 158 |
Adverbs | 105 | 10 | 95 |
Conjunctions | 79 | 1 | 78 |
Determiners | 258 | 3 | 255 |
The table shows the total number of words in each part-of-speech category in the corpus, and how many of them are non-English and how many are English. It shows that the machine usually classifies a word it does not know as a proper noun, or sometimes as a noun, but rarely as a conjunction, determiner, adjective or adverb.
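To make the pattern easier to see, the counts can be turned into percentages. This is only a minimal sketch: the numbers are copied from the table above, and the names `counts`, `non_english_share` and `share_of_non_english_corpus` are made up for this illustration and are not part of any tool we used.

```python
# Counts copied from the table above: (non-English words, English words) per category.
counts = {
    "Proper nouns": (420, 569),
    "Nouns": (168, 438),
    "Verbs": (25, 383),
    "Adjectives": (26, 158),
    "Adverbs": (10, 95),
    "Conjunctions": (1, 78),
    "Determiners": (3, 255),
}

# Number of non-English words in the whole corpus, as stated in the text.
NON_ENGLISH_TOTAL = 681


def non_english_share(tag):
    """Share of the words in category `tag` that are non-English."""
    non_english, english = counts[tag]
    return non_english / (non_english + english)


def share_of_non_english_corpus(tag):
    """Share of all non-English words in the corpus that ended up in `tag`."""
    return counts[tag][0] / NON_ENGLISH_TOTAL


# Print both percentages for every category in the table.
for tag in counts:
    print(f"{tag:13s} {non_english_share(tag):6.1%} {share_of_non_english_corpus(tag):6.1%}")
```

Read this way, proper nouns and nouns together account for 588 of the 681 non-English words (about 86%). The seven categories in the table cover 653 of the 681 non-English words, so the rest fall into categories that are not listed here.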
English determiners, conjunctions, adjectives, verbs and nouns were usually categorized correctly, while non-English determiners, conjunctions, adverbs, adjectives, verbs and nouns, as well as English adverbs, were usually categorized incorrectly. In other words, the English words, except for the English adverbs, are usually classified correctly, while the non-English words, together with the English adverbs, are usually classified incorrectly.
Dependency relations
Regarding the dependency relations, a similar pattern is observable in how correctly the machine worked: English words, except for the English adverbs, have usually been assigned a correct dependency relation, while the non-English words, plus the English adverbs, have usually been assigned an incorrect dependency relation. It is also worth mentioning that some non-English words have not been assigned a dependency relation at all.
Dear Michelle, I really like that you have illustrated our corpus with a table, so that your claims have a visual to support them. I also found the distribution of parts of speech really interesting, and it surely underlines the assumption made in class at the beginning of term that ANNIS classifies most non-English words as proper nouns. I also think your finding that some non-English words were not assigned a dependency relation is really interesting. I did not take a closer look at that aspect, and I am really fascinated, but not shocked, by it.