Impact of annotating groups of words in Watson Knowledge Studio

I wonder whether it is a problem in Watson Knowledge Studio (WKS) to annotate a group of several words as a single mention. I am asking for two reasons:

1. After importing German documents into WKS, I found that the tokenizer splits words in unexpected ways. For example, compound words are split into their components (“Dampfschiff” → “Dampf” and “Schiff”).

2. In the German text corpus I am using, a combination of words is often more meaningful than the individual words. Example: “green tea” as a treatment for arthrosis. However, the mentions in the text appear in different grammatical cases or vary between singular and plural:
– “Grüner Tee”
– “dem grünen Tee”
– “der grüne Tee”
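To make the variation concrete, here is a minimal sketch of my own (a hypothetical illustration, not how WKS actually matches mentions) that reduces the three mentions to one key. It assumes a naive rule set: lowercase everything, drop the definite article, and strip a common adjective ending from every word except the final head noun. The names `normalize`, `ARTICLES` and `ADJ_ENDINGS` are made up for this example.

```python
import re

# Hypothetical, naive normalization for illustration only; NOT WKS behavior.
ARTICLES = {"der", "die", "das", "dem", "den", "des"}
# Common German adjective endings; longer endings are tried first.
ADJ_ENDINGS = ("em", "en", "er", "es", "e")

def normalize(mention: str) -> str:
    # Split into word tokens (\w matches umlauts in Python 3) and lowercase.
    tokens = [t.lower() for t in re.findall(r"\w+", mention)]
    # Drop definite articles such as "dem" or "der".
    tokens = [t for t in tokens if t not in ARTICLES]
    stripped = []
    # Strip one inflectional ending from every token except the head noun.
    for t in tokens[:-1]:
        for ending in ADJ_ENDINGS:
            # Keep at least a short stem so short words are not destroyed.
            if t.endswith(ending) and len(t) > len(ending) + 2:
                t = t[: -len(ending)]
                break
        stripped.append(t)
    return " ".join(stripped + tokens[-1:])

for m in ["Grüner Tee", "dem grünen Tee", "der grüne Tee"]:
    print(m, "->", normalize(m))
```

With these rules all three mentions map to the same key, “grün tee”. A rule set this crude would break quickly on real text; a proper lemmatizer or stemmer would be the more robust choice, which is why I would like to know what WKS does internally.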

My question is: how does WKS handle annotations of groups of words, especially when the same term is written slightly differently (different case endings, singular or plural) in different places in the text?

