Our corpus

Parts of speech

Our corpus consists of 681 non-English and 3405 English words, or 4086 words in total. The table below shows how the machine distributed them across a selection of parts of speech:

Part of speech	Total words	Non-English words	English words
Proper nouns	989	420	569
Nouns	606	168	438
Verbs	408	25	383
Adjectives	184	26	158
Adverbs	105	10	95
Conjunctions	79	1	78
Determiners	258	3	255

The table shows, for each part-of-speech category, the total number of words in the corpus and how many of those words are non-English and English. It suggests that the machine usually classifies a word it does not know as a proper noun, or sometimes as a noun, but rarely as a conjunction, determiner, adjective or adverb.
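To make this comparison easier to check, here is a minimal sketch that computes the share of non-English words in each category and compares it with the share in the corpus as a whole. The counts are copied from the table; the names `counts`, `non_english_share` and `baseline` are only illustrative and not part of any tool we used.

```python
# Counts copied from the table: (non-English words, English words) per category.
counts = {
    "Proper nouns": (420, 569),
    "Nouns": (168, 438),
    "Verbs": (25, 383),
    "Adjectives": (26, 158),
    "Adverbs": (10, 95),
    "Conjunctions": (1, 78),
    "Determiners": (3, 255),
}


def non_english_share(non_english, english):
    """Return the fraction of words in one category that are non-English."""
    return non_english / (non_english + english)


# Share of non-English words in each part-of-speech category.
shares = {pos: non_english_share(*pair) for pos, pair in counts.items()}

# Share of non-English words in the whole corpus (681 of 4086 words).
baseline = 681 / 4086

# Print the categories from the highest to the lowest non-English share.
for pos, share in sorted(shares.items(), key=lambda item: item[1], reverse=True):
    marker = "above" if share > baseline else "below"
    print(f"{pos:<13} {share:6.1%}  ({marker} the corpus-wide {baseline:.1%})")
```

With the numbers from the table, only proper nouns (about 42%) and nouns (about 28%) lie above the corpus-wide share of about 17%; every other category lies below it.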

English determiners, conjunctions, adjectives, verbs and nouns were usually categorized correctly, while non-English determiners, conjunctions, adverbs, adjectives, verbs and nouns, as well as English adverbs, were usually categorized incorrectly. In short, English words, except for the adverbs, are usually classified correctly, while non-English words are usually classified incorrectly.

Dependency relations

Regarding the dependency relations, a similar pattern of correct and incorrect output by the machine is observable: English words, except for the English adverbs, have usually been assigned a correct dependency relation, while non-English words, along with the English adverbs, have usually been assigned an incorrect one. It is also worth mentioning that some non-English words have not been assigned a dependency relation at all.

One response to Our corpus

anneschu says:

5 August 2024 at 14:06

Dear Michelle, I really like that you have illustrated our corpus with a table so that you have a visual for your claims. I also found the distribution of parts of speech really interesting, and it surely does underline the assumption made in class at the beginning of term that ANNIS classifies most non-English words as proper nouns. I also think your finding that some non-English words were not assigned a dependency relation is really interesting. I did not have a closer look at that aspect and I am really fascinated, but not shocked, by it.
