Irish word frequency list?
Whatever your thoughts on the usefulness of learning off the 100/200/500 most common words in a language, many people find word frequency lists helpful when building a core vocabulary.
My question is: is there a word-frequency list for Irish that is built on a fairly solid corpus of contemporary Irish?
MrCliffJones provided a translation of a Swadesh list here.
Kevin P. Scannell put together this one, which he says was based on "a large corpus of Irish texts".
If I remember correctly, the vocab in Buntús Cainte was based on a study of the essential vocabulary of the time, but it is now a bit dated.
I like Scannell's. I've interacted with him a few times personally, and I know he's very interested in using software to improve people's Irish and such. The one drawback with his list is that it's a simple count of matching word forms in the corpus. It doesn't seem to group by lemma, so cheist and ceist are counted as different words, as are sa and i.
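For anyone who wants to regroup such a list by lemma themselves, here is a minimal Python sketch of the idea. It assumes you already have a form-to-lemma mapping; the `LEMMA_OF` table and the `group_by_lemma` helper are hypothetical names of mine, the table only covers the two pairs mentioned above, and the counts are made up for illustration rather than taken from Scannell's list.

```python
from collections import Counter

# Hypothetical mapping from surface form to lemma. Only the pairs mentioned
# above (cheist -> ceist, sa -> i) are included; a real list would need a
# full morphological lexicon or a lemmatizer.
LEMMA_OF = {"cheist": "ceist", "sa": "i"}


def group_by_lemma(form_counts, lemma_of=LEMMA_OF):
    """Sum surface-form counts under their lemma.

    Forms missing from the mapping are treated as their own lemma.
    """
    lemma_counts = Counter()
    for form, count in form_counts.items():
        lemma_counts[lemma_of.get(form, form)] += count
    return lemma_counts


# Made-up counts, purely for illustration (not from any real corpus).
sample = {"ceist": 40, "cheist": 25, "i": 300, "sa": 200}
grouped = group_by_lemma(sample)
print(grouped.most_common())
```

With a proper lexicon behind the mapping, the same summing step would fold mutated and inflected forms into one entry before ranking.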
The first one is absolutely awful. Avoid it. It translates tá as 'have', and a as 'his' (despite the fact that the most frequent a is actually the relative clause particle).
Thanks. His seemed the best at first glance. I'll start loading it into Anki.
I've spotted several more mistakes in the 1000mostcommonwords one. The Irish for crease is apparently 'crease'. God, Béarlachas is getting awful bad these days. (It's fithín, roc, or filltín in case anyone's wondering)
The website potafocal.com has a "House Glossary" with 6,000 terms "covering the basic vocabulary of the language." That's much more than you were looking for, but it might still be interesting to take a look at. The creator of that site might also have a database that could produce a list exactly the length you want.
Looking at a randomly selected entry there, ó, its page includes a Statistics section that notes that it’s the 14th most frequently used word in Irish. It isn’t clear whether that figure covers all of the words spelled ó (e.g. the preposition “from”, the noun “grandson”, etc.), or whether the frequency is counted per lemma rather than per word form (i.e. whether uaim, uait, óí, uí, etc. are all counted as forms of ó when determining its frequency ranking).