A dictionary tool I made for reading foreign texts
I thought folks here might find this useful: https://github.com/zvd2/wortserchilo
It's not the greatest, but basically it lets you get a word definition in any language just by hovering your mouse over the word and pressing a hotkey. Unfortunately I haven't been able to compile it for Windows myself, so I guess for most folks this is just a bit of a tease. If anyone on Windows knows how to compile a program, I'd be very grateful if you would make a Windows release.
Works best with Latin languages. It has a few problems with Asian languages but can still be very useful.
- Looks good in the demonstration .gif file.
- I'm following this and am currently investigating a similar approach using Wiktionary.
- So what, in short, does it do?
- Something like the following?
- 1) You take the word you hover over or click on
- 2) and just go to the Wiktionary page for that word?
- 3) Or do you also parse the Wiktionary page for the word you clicked on? If so, that parsing is what I am most interested in: your approach to it, or where you found the information needed to do the parsing (format, ...).
It has wikitext definitions in a file, difinoj.zip, which is generated from the Wiktionary XML dump by a companion program, https://github.com/zvd2/vikivortaro-konvertanto. Unfortunately the Wiktionary template syntax isn't very consistent between languages and is sort of messy (although I suppose that's necessary to cover all languages), so the converter is specific to the English Wiktionary.
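To give an idea of the first step, here is a minimal Python sketch of streaming pages out of a MediaWiki-style XML dump. It is not the converter's actual code (that is C++), and `iter_pages`, `local_name` and the tiny `SAMPLE_DUMP` are made up for illustration. It assumes the usual dump layout of `page` elements containing `title`, `ns` and `revision/text`, and it strips the XML namespace because the export schema version varies between dumps.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(xml_source):
    """Yield (title, wikitext) for each main-namespace page in a dump."""
    title, ns, text = None, None, None
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "ns":
            ns = elem.text
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            # Namespace 0 holds the dictionary entries; skip talk pages etc.
            if ns == "0" and title is not None and text is not None:
                yield title, text
            title, ns, text = None, None, None
            elem.clear()  # drop the page's children; real dumps are very large


# Made-up two-page sample standing in for the real dump file.
SAMPLE_DUMP = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>hund</title>
    <ns>0</ns>
    <revision><text>==Danish==\n===Noun===\n# [[dog]]</text></revision>
  </page>
  <page>
    <title>Talk:hund</title>
    <ns>1</ns>
    <revision><text>discussion</text></revision>
  </page>
</mediawiki>"""

pages = list(iter_pages(io.BytesIO(SAMPLE_DUMP)))
```

For a real dump you would pass an opened file instead of the `io.BytesIO` sample; parsing incrementally avoids loading the whole file into memory.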
Upon pressing alt + x, the program takes a screenshot of the area around the cursor and invokes Tesseract, an OCR library, to read the image. Subsequently it runs the result through a spelling corrector (SymSpell) and gives a definition for each spelling suggestion.
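Roughly, the part after the OCR step can be sketched like this in Python. The screenshot and Tesseract call are left out, and `difflib` from the standard library is only a stand-in for SymSpell, so this shows the flow rather than the program's real code. `DEFINITIONS`, `suggest_spellings` and `look_up` are hypothetical names with toy entries.

```python
import difflib

# Made-up toy dictionary; the real tool reads its definitions from difinoj.zip.
DEFINITIONS = {
    "hund": "dog",
    "hand": "hand",
    "hus": "house",
}


def suggest_spellings(ocr_word, vocabulary, limit=3):
    """Return dictionary words close to the OCR output, best match first.

    difflib is only a stand-in here; the tool itself uses SymSpell.
    """
    return difflib.get_close_matches(
        ocr_word.lower(), vocabulary, n=limit, cutoff=0.6
    )


def look_up(ocr_word, definitions=DEFINITIONS):
    """Pair each spelling suggestion for an OCR'd word with its definition."""
    suggestions = suggest_spellings(ocr_word, list(definitions))
    return [(word, definitions[word]) for word in suggestions]


# "hunb" imitates an OCR misread of "hund".
print(look_up("hunb"))  # [('hund', 'dog')]
```

Returning every suggestion rather than only the best one matches what the program does: it shows a definition for each candidate and lets the reader pick.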
The underlined words are clickable; clicking one makes the program try to give the definition of that word. It also does that when you click a glossary link, which of course leads to "no definitions were found", another misfeature.
Some further clarification on the converter: most of the interesting data is in AnglaKonvertanto and datenojDeAnglaKonvertanto.cpp; the latter contains a list of tags that should precede a definition. The converter gets most of the definitions on the English Wiktionary, but it's not exactly flawless: for many languages it will miss some definitions or pick up extra data we don't want, such as pronunciation.
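As a rough illustration of the "tags that precede a definition" idea, here is a small Python sketch. It assumes the tags are section headings such as `===Noun===` and that definitions are the numbered `# ...` lines below them, which is the usual English Wiktionary layout. `DEFINITION_HEADINGS` is a guessed subset, not the real list from datenojDeAnglaKonvertanto.cpp, and `extract_definitions` and `SAMPLE` are hypothetical.

```python
import re

# Assumed subset of headings that precede definitions; the converter's real
# list lives in datenojDeAnglaKonvertanto.cpp and is not reproduced here.
DEFINITION_HEADINGS = {"Noun", "Verb", "Adjective", "Adverb"}

# Matches wikitext headings such as "==Danish==" or "===Noun===".
HEADING_RE = re.compile(r"^(=+)\s*(.*?)\s*\1\s*$")


def extract_definitions(wikitext):
    """Collect '# ...' lines that follow a heading in DEFINITION_HEADINGS."""
    definitions = []
    in_definition_section = False
    for line in wikitext.splitlines():
        heading = HEADING_RE.match(line)
        if heading:
            in_definition_section = heading.group(2) in DEFINITION_HEADINGS
        elif in_definition_section and re.match(r"^#[^#:*]", line):
            # Keep top-level definitions; skip "#:" examples, "#*" quotations
            # and "##" sub-senses.
            definitions.append(line[1:].strip())
    return definitions


# Made-up entry for illustration.
SAMPLE = """==Danish==
===Pronunciation===
* (pronunciation details)
===Noun===
# [[dog]]
#: example sentence
# [[hound]]
"""

print(extract_definitions(SAMPLE))  # ['[[dog]]', '[[hound]]']
```

A heading that is not in the list, like Pronunciation here, switches collection off. That is one simple way to keep unwanted data out, though entries that deviate from this layout will still slip through or get missed.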
It has a few problems with Asian languages
Like which? (Please clarify what you intend it to do and where it does not work as intended.)
I've only tried out Asian languages a few times so far, and the imagereader has a lot more trouble with them. The complex forms of Asian characters require larger fonts to be clear in the black and white image, and even when enlarged I've seen it have trouble with some Chinese characters. With Japanese the imagereader often doesn't correctly recognize word boundaries, as the language doesn't use spaces to separate words. It still usually gets the characters right, though, which in itself can be very useful.
OK, I see the issue is in the OCR, which I don't use in this context, so I can't really offer advice. Thanks.