Irish word frequency list?
Whatever your thoughts on the usefulness of learning off the 100/200/500 most common words in a language, many people find word frequency lists helpful when building a core vocabulary.
My question is: is there a word-frequency list for Irish that is built on a fairly solid corpus of contemporary Irish?
MrCliffJones provided a translation of a Swadesh list here.
Kevin P. Scannell put together this one, which he says was based on "a large corpus of Irish texts".
If I remember correctly, the vocab in Buntús Cainte was based on a study of the essential vocabulary of the time, but it is now a bit dated.
The website potafocal.com has a "House Glossary" with 6,000 terms "covering the basic vocabulary of the language." That's much more than you were looking for, but it might still be interesting to take a look at. The creator of that site might also have a database that could produce a list exactly the length you want.
Yes, there's a link at the bottom where you can download the word list, and also another list of example sentences. I use Pota Focal/Beo as my dictionary on the Learning with Texts site.
The page for a randomly selected entry there, ó, includes a Statistics section noting that it’s the 14th most frequently used word in Irish. It isn’t clear whether that includes all of the words spelled ó (e.g. the preposition “from”, the noun “grandson”, etc.), or whether the frequency represents a lemma rather than a word (i.e. are uaim, uait, óí, uí, etc. all counted as forms of ó for determining its frequency ranking).
This is his list; you can download it and cut it down to the length you want.
As for lemmas and the like: it groups all the meanings it has for a word under a single heading. So súil, for example (http://www.potafocal.com/gt/?s=s%C3%BAil), has both "eye" and "hope" sentences as examples. On the site it shows you the different forms of the verbs, but not in the list.
He also has sentence lists and other stuff: http://www.lexiconista.com/datasets/
What you could do is add the word frequency list to "Learning with Texts" and set the Beo glossary as your default dictionary. Then, if you're learning by word frequency list as you said, every time you click on a word you don't know, it would give you example sentences from Beo.
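For the "download and cut it" step, here is a rough Python sketch of trimming a frequency list to the top N entries. I haven't checked the actual layout of the downloaded file, so the format here (one "word<TAB>count" pair per line) is purely an assumption, and the counts in the sample are made up for illustration.

```python
import io

def top_n(lines, n):
    """Return the n most frequent (word, count) pairs.

    Assumes each non-empty line looks like 'word<TAB>count'; this layout
    is a guess and may not match the real download.
    """
    entries = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        word, count = line.split("\t")
        entries.append((word, int(count)))
    # Sort by count, highest first, in case the file is not already ordered.
    entries.sort(key=lambda entry: entry[1], reverse=True)
    return entries[:n]

# Made-up sample data standing in for the downloaded file.
sample = io.StringIO("agus\t900\nan\t1000\n\nar\t500\n")
print(top_n(sample, 2))
```

With a real file you would pass an open file handle instead of the sample, for example one opened with encoding="utf-8" so the accented vowels survive, and then write the result out in whatever shape your flashcard or reading tool expects.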
I like Scannell's. I've interacted with him a few times personally, and I know he's very interested in using software to increase Irish ability and such. The one thing with his is that it's a simple corpus search for matches. It doesn't seem to group by lemma, so cheist and ceist are different, as are sa and i.
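To make the word-versus-lemma difference concrete, here is a small Python sketch. It is not how Scannell's list (or any other list in this thread) was built; the lemma table is a tiny hand-written, hypothetical one, where a real list would need a proper morphological analyser.

```python
from collections import Counter

# Hypothetical, hand-written lemma table for illustration only.
LEMMA_OF = {
    "cheist": "ceist",  # lenited form of ceist
    "gceist": "ceist",  # eclipsed form of ceist
    "sa": "i",          # i + an ("in the"), grouped under i here
}

def surface_counts(tokens):
    """Count each spelling separately, as a plain corpus match would."""
    return Counter(tokens)

def lemma_counts(tokens):
    """Count tokens under their lemma, falling back to the token itself."""
    return Counter(LEMMA_OF.get(token, token) for token in tokens)

# A bag of tokens, not a real sentence.
tokens = ["ceist", "cheist", "gceist", "sa", "i", "cheist"]

print(surface_counts(tokens))  # ceist, cheist, gceist, sa and i all counted apart
print(lemma_counts(tokens))    # everything folded into ceist and i
```

In the surface count, cheist appears twice and ceist once; in the lemma count, ceist gets all four. That is why the same word can rank quite differently depending on which kind of list you are looking at.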
The first one is absolutely awful. Avoid it. It translates tá as 'have', and a as 'his' (despite the fact that the most frequent a is actually the relative clause particle).
Thanks. His seemed the best at first glance. I'll start loading it into Anki.
I've spotted several more mistakes in the 1000mostcommonwords one. The Irish for crease is apparently 'crease'. God, Béarlachas is getting awful bad these days. (It's fithín, roc, or filltín in case anyone's wondering)
Check out the lexiconista.com link that DavidColli4 provided below — as long as you’re OK with a list of lemmas rather than a list of words (e.g. bí in that list subsumes tá, atá, bhfuil, etc.; an in that list represents the definite article, the interrogative particle, etc., rather than separate entries for each; ag subsumes agam, agamsa, etc.), it contains exactly what you’re seeking.
EDIT: The personal numbers have separate entries from the non-personal numbers in the list, but it looks like triúr and dáréag aren’t found frequently enough to be in the list. Proper names are in the list; Mór is present, and mór isn’t. Not all month names and day names are in the list. Individual capital letters are in the list; I (English pronoun? Roman numeral as ordinal?) is present, and i isn’t.