1. After importing German documents into WKS, I found that the tokenizer breaks words in unexpected ways. For example, compound words are split into their component words (“Dampfschiff” → “Dampf” and “Schiff”).
2. In the German text corpus I am using, a combination of words is often more meaningful than the individual words. Example: “green tea” (“grüner Tee”) as a treatment for arthrosis. However, the mentions in the text appear in different grammatical cases or vary between singular and plural:
– “Grüner Tee”
– “dem grünen Tee”
– “der grüne Tee”
My question is: how does WKS handle annotations of multi-word phrases, especially when the wording of the mentions varies slightly in the text?
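
To make the variation concrete, here is a minimal Python sketch (plain regular expressions, not WKS and not a description of how WKS works) that treats the three mentions above as one concept. The pattern, the `GREEN_TEA` name, and the `find_mentions` helper are assumptions made up for illustration; the list of adjective endings is hand-written and only covers this one phrase.

```python
import re

# Hypothetical pattern: the adjective stem "grün" followed by one of the
# common German adjective endings, then the noun "Tee" (optionally "Tees").
GREEN_TEA = re.compile(r"\bgrün(?:e|er|en|em|es)\s+Tee(?:s)?\b", re.IGNORECASE)

def find_mentions(text):
    """Return every inflected form of the phrase found in the text."""
    return [match.group(0) for match in GREEN_TEA.finditer(text)]

if __name__ == "__main__":
    sample = "Grüner Tee, dem grünen Tee, der grüne Tee"
    for mention in find_mentions(sample):
        print(mention)
```

What I am looking for is the equivalent behaviour inside WKS: one annotation target that covers all of these surface variants without listing each one by hand.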