SALDO Tagset and More SALDO Resources

[deactivated user]

    "SALDO (Swedish Associative Thesaurus version 2) is an extensive electronic lexicon resource for modern Swedish written language. It is created for the purpose of language technology research and for the development of language technology applications."

    Although SALDO is not intended as a lexicon for human readers, I have been using SALDO Search to get comprehensive inflections of Swedish nouns, verbs, adjectives, etc. in order to build my Anki Swedish card decks.

    However, I have been mystified by the inflection table identifiers. For example, when I search for "bära sig på", the 'mönster' (pattern) field shows this: vbm_4msp1_bära

    Dr. Lars Borin, director of Språkbanken (the Swedish Language Bank) and professor of natural language processing at the University of Gothenburg (Göteborgs Universitet), kindly sent me the link to the SALDO Tagset so I could sort out the identifiers.

    From the tagset descriptions (and from the description Dr. Borin sent), the parts of the designation for bära sig på above are:

    "vbm" -- 'multiword verb'
    "4" -- '4th conjugation', i.e., the strong and irregular verbs
    "m" -- 'does not form the past participle'
    "sp" -- 'reflexive pronoun + particle'
    "1" -- 'the word (verb) carrying the inflection is the first word in the multiword expression'
    "bära" -- 'inflects like this simple form'
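
    For anyone who wants to decode these identifiers in bulk while building a deck, the breakdown above can be sketched in Python. This is only a rough sketch based on the single example in this post: the function name split_pattern, the lookup tables, and the assumed shape of the middle part (one digit, one flag letter, the frame letters, one digit) are my own assumptions, not something taken from the SALDO documentation, and other identifiers may well be built differently.

```python
import re

# Meanings copied from the breakdown in the post; only these codes are known here.
PART_OF_SPEECH = {"vbm": "multiword verb"}
CONJUGATION = {"4": "4th conjugation (strong and irregular verbs)"}
FLAG = {"m": "does not form the past participle"}
FRAME = {"sp": "reflexive pronoun + particle"}

# Assumed shape of the middle part, e.g. "4msp1":
# one digit, one flag letter, one or more frame letters, one digit.
MIDDLE = re.compile(r"^(\d)([a-z])([a-z]+)(\d)$")


def split_pattern(identifier):
    """Split an identifier such as 'vbm_4msp1_bära' into labelled parts."""
    pieces = identifier.split("_")
    if len(pieces) != 3:
        raise ValueError(f"expected three '_'-separated parts: {identifier!r}")
    pos, middle, model = pieces

    match = MIDDLE.match(middle)
    if match is None:
        raise ValueError(f"middle part has an unexpected shape: {middle!r}")
    conjugation, flag, frame, position = match.groups()

    # Unknown codes are passed through unchanged rather than guessed at.
    return {
        "part_of_speech": PART_OF_SPEECH.get(pos, pos),
        "conjugation": CONJUGATION.get(conjugation, conjugation),
        "flag": FLAG.get(flag, flag),
        "frame": FRAME.get(frame, frame),
        "inflecting_word_position": int(position),
        "inflects_like": model,
    }


if __name__ == "__main__":
    for label, value in split_pattern("vbm_4msp1_bära").items():
        print(f"{label}: {value}")
```

    Codes that are not in the small lookup tables are returned as-is, so the tables can be filled in from the tagset descriptions as you come across new identifiers.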

    There are free downloadable resources for SALDO and an instruction manual which is, unfortunately (until I learn more Swedish!), in Swedish.

    There are also several SALDO Web Services I have yet to fully explore.

    I hope you find this information useful if you are heading in this direction with your Swedish learning.

    Written with StackEdit.

    February 11, 2017

