Ordering the Czech clitics: Introduction [2019-04-18]
This is the introductory post in our series on words placed rather rigidly in the sentence, which we will call clitics. Think of them as weak words (or arguably phrases) that cling for support to stronger pieces of the sentence, called hosts, most typically by following them. This clinging has more to do with pronouncing the sentence in a stressed/de-stressed pattern than with the clitics actually being tied to their hosts by grammar.
First, a few introductory bullets:
- Clitics are special words that cannot go just anywhere in the Czech sentence.
- They generally want to go right after the first constituent in the clause (in the second position).
- The first constituent is not necessarily just one word.
- Words that do not count as the first constituent: a, i, and ale.
- Constant clitics are always clitics. They cannot start a sentence, end a sentence unless they are in the second position, or be the entire sentence.
- Inconstant clitics can go either way. They can be clitics, and then they have to follow the clitic ordering rules, or they can be non-clitics.
- Clitics like to bunch up in a cluster in the second position because so many of them compete for a single spot. Specific clitic order rules apply inside the cluster.
- Non-clitics are not allowed to interrupt the clitic cluster.
Let's take this step by step. In
- Já ho už nemiluju. (I don’t love him anymore.)
the clitic “ho” leans on the host “já” for intonation support, even though “ho” actually functions as the direct object of the verb “nemiluju”. We will describe this grammatical dependency by saying that the clitic is governed by that verb (or, in other sentences, by some other word). We could also say
- Už ho nemiluju.
where the hosting duty moves to the adverb, or
- Nemiluju ho už.
where the governing verb itself gets to be the host.
What the three Czech versions of “I don’t love him anymore” given here have in common is that the clitic “ho” is placed in the second position. This will be one of the keys to dealing with Czech clitics:
Clitics want to go second.
But what is the “first position”? In Czech, it is not necessarily just one word. For example, we could instead have
- Jeho krásná žena ho už nemiluje.
- Jeho krásná žena, pro kterou by modré z nebe snesl, ho už nemiluje.
The key there is that we need to be “done” with the first coherent piece of the sentence, even if it takes a subordinate clause to get there. So
- ???Jeho krásná ho žena už nemiluje.
is wrong because we clearly were not done with that first piece (“jeho krásná” what?). Identifying the end of the first complete constituent of the sentence should not feel entirely strange to an English speaker. “His beautiful does not love him wife anymore.” and “His does not love him anymore beautiful wife.” should feel ill-formed as well. The subject was not finished when we rushed in with the predicate.
Some words do not count as the first constituent.
Please keep in mind that a, ale, and i cannot host clitics and do not count toward the first position:
- ???(A/Ale/I) ho miluje.
is wrong because the first position is effectively empty, and the clitic "ho" starts the sentence.
Constant clitics are always clitics, but inconstant clitics can behave as regular words.
Some words are always clitics, and we will call them constant clitics, in contrast to inconstant clitics. We need to know the difference because inconstant clitics can appear as regular words outside the clitic cluster, while constant clitics cannot do that. When an inconstant clitic shows up in the cluster, it must follow the cluster order rules/preferences as if it were a constant clitic.
If several clitics compete for the second position, they follow certain rules.
Clitics often meet in the same sentence and drift into a bunch we call the clitic cluster. We need some rules for ordering them in that cluster. As a starting point, our ordering formula will be
AUX < REFL < DAT < ACC < TO
- AUX = the conditional or past auxiliary forms of “být” [constant clitics]: bych, bys, by, bychom, byste; jsem, jsi, jsme, jste [Do not confuse the past auxiliary forms with the identical present-tense forms of the regular verb “být”.]
- REFL = se, si, ses, sis [constant clitics]
- DAT = dative personal pronouns: mi, (mně), ti, mu, (jí), (nám), (vám), (jim) [inconstant clitics listed in (parentheses), the rest constant]
- ACC = accusative personal pronouns: (mě), (mne), tě, ho, (jej), (ji), (je), (nás), (vás) [inconstant clitics listed in (parentheses), the rest constant]
- TO = nominative or accusative independent pronoun “to” [inconstant clitic, but typically shows up as a clitic]
- Učila jsem se. (I was studying.)
- Ale včera jste ji neviděl? (But yesterday you did not see her?)
- Já jsem ji neznal. (I did not know her.)
- František si ho pamatuje. (František remembers him.)
- Byla by mu zima. (He would be cold.)
- Líbila se mi. (I liked her.)
- Líbilo by se mi to. (I would like it.)
- Podařilo by se mi vás to naučit. (I would succeed in teaching it to you.)
A few examples illustrating the hard ordering rules by violating them:
- ???Dala ho mu. [No, the accusative clitic cannot go before the dative clitic. Either replace “mu” with the strong form “jemu” or swap “ho” and “mu”.]
- ???Líbila mu se. [No, the dative clitic cannot go before the reflexive.]
- ???Já je jsem jí vysvětlil. [The required sequence is “jsem jí je”: first the past auxiliary, then the clitic dative, then the clitic accusative.]
- ???Líbil se by jí. [No, the reflexive cannot go before the conditional auxiliary.]
- ???Chtěla jsem to jí říct. [As a clitic, the dative “jí” must go before the clitic “to”. Here “jí” is used as a clitic, because the strong pronoun “jemu” would not work in its place.]
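The short Python sketch below makes the formula concrete. It is an illustration built on stated assumptions and does not model Czech as a whole. The function names order_cluster and first_violation are hypothetical, and the table contains only the forms listed above. The cluster is also assumed to be already extracted from the sentence. The sketch therefore says nothing about hosts, the first position, or whether an inconstant clitic is actually acting as a clitic.

```python
# Illustrative sketch only: the slot table holds just the forms listed in this
# post, and the function names are hypothetical.

ORDER = ["AUX", "REFL", "DAT", "ACC", "TO"]
SLOTS = {
    "AUX": ["bych", "bys", "by", "bychom", "byste", "jsem", "jsi", "jsme", "jste"],
    "REFL": ["se", "si", "ses", "sis"],
    "DAT": ["mi", "mně", "ti", "mu", "jí", "nám", "vám", "jim"],
    "ACC": ["mě", "mne", "tě", "ho", "jej", "ji", "je", "nás", "vás"],
    "TO": ["to"],
}
# Map every listed form to the rank of its slot in AUX < REFL < DAT < ACC < TO.
RANK = {form: ORDER.index(slot) for slot, forms in SLOTS.items() for form in forms}


def order_cluster(clitics):
    """Sort clitics into formula order; clitics of the same slot keep their input order."""
    return sorted(clitics, key=lambda c: RANK[c.lower()])


def first_violation(cluster):
    """Return the first adjacent pair that breaks the formula, or None if the order is fine."""
    for left, right in zip(cluster, cluster[1:]):
        if RANK[left.lower()] > RANK[right.lower()]:
            return (left, right)
    return None


print(order_cluster(["to", "vás", "mi", "se", "by"]))  # ['by', 'se', 'mi', 'vás', 'to']
```

Take “???Dala ho mu.” as an example. The call first_violation(["ho", "mu"]) returns ("ho", "mu"), and order_cluster(["ho", "mu"]) returns ["mu", "ho"], which gives “Dala mu ho.” Python's sort is stable, so two clitics of the same type simply keep their input order. The formula itself does not rank them.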
Non-clitics are not allowed to interrupt the clitic cluster.
- ???Chtěli jsme jemu ho dát.
The clitics “jsme” and “ho” cannot be separated by a non-clitic word, in this case “jemu”. “Františkovi”, “té dívce”, or “jeho třem dětem” would create an equally unacceptable interruption.
Clitics cannot start a sentence, end a sentence after the second position, or become the entire sentence.
Only inconstant clitics can behave as non-clitics, so they can begin a sentence
- Jí jsem to nedal.
end a sentence beyond the second position
- Dal jsem to jim.
and be the entire sentence
- Komu jsem to řekl? Vám.
In all of these respects, inconstant clitic personal pronouns resemble the strong (never clitic) personal pronouns “tebe”, “jeho”, “tobě”, and “jemu”.
Examples violating these constraints:
- ???Tě neviděli.
- ???Dala jsem tu knihu mu.
Also note that
- Jsem doma. (I am at home.)
is OK. This “jsem” is the regular verb “být”, not the past auxiliary.
Conclusion and next steps:
This post is just an introduction. Czech has more types of clitics than those included in the initial formula, and even our course contains words of these additional types. For more advanced students, we will need to extend our cluster formula to include subject pronouns, genitive pronouns, instrumental pronouns, pronouns with prepositions, positive present forms of the regular (copula) verb být, and a few short adverbs.
We will also want to share some rules for the drifting of clitics into the cluster: Do we always have to dump all of the clitics in a centralized spot? And should we?
Finally, we need to place the clitics in the overall context of Czech word order. What happens outside the second position, and why?
More to come.
Just an idea from studying German. Your students might find it interesting to see your examples laid out as a table like this. Something similar exists for basic, acceptable German sentence structure.
Once that basic idea is mastered, word order can of course then be shifted around quite a lot just by moving stressed stuff towards the front or end.
Indeed, isn't the thing with "constant & inconstant clitics" about those pronouns (like mu/jemu, ti/tobě) that have unstressed/stressed forms, and those (like jí, nám) that don't?
Thanks for the suggestion! Yes, a table would show this more clearly. Maybe we'll add it later.
Your attempt is very good - apart from the second-to-last sentence, where you have missed the pronoun - remember, clitics like the second position, so you can't have "Jsem" as the very first word; the correct word order is "Já jsem jí to nedal./Nedal jsem jí to." (or similar).
Re constant & inconstant clitics, you got it right, at least as far as personal pronouns go. However, inconstant clitics are not limited to personal pronouns. For example, passive auxiliary verb forms are also inconstant clitics, as are many (mostly) monosyllabic words, e.g. "tu" ('here'), "už" ('already, yet') or "tady" ('here').
Quite right, thanks. I overlooked that when putting the stressed form in under it. I've corrected it above now.
The table just occurred to me as I have a series of these for German. They start with the very basic sentence framework, then gradually get more complex, adding in adverb positioning, modal particles, emphatic re-orderings, etc. I've found them pretty useful, and it would be interesting to see how such patterns could be displayed for Czech too, once, as you say, other clitics are brought in.
nice, how about a job? the pay sucks, but colleagues like this can't be found anywhere else, and maybe you'd finally get something for the volunteering (or babysitting or masochism or how to get people to help you for free) section of your cv!
besides adding slots for a few more clitic types, there could be one more challenge on the way to neat graphics based on pre-assigned 1D slots. one is the need to potentially show multiple clitic clusters for sentences with multiple clitic "governors" (which may themselves be moved around the sentence). another is the climbing of the embedded clitics from their base positions (just before or just after their governors) up towards the main cluster.
why knowing how stuff got to the main cluster matters:
- Snažil jsem si jí vážit.
- Netroufal jsem se na ni usmát.
- Vysvětlit se mi jí to nepovedlo.
- Snažila se ho mě naučit jíst. ;-)
yes, "Snažil jsem se | si jí vážit" would be acceptable but requires a (prosodic) boundary as shown between se & si because they are in neighboring clusters. standard czech only allows one reflexive per cluster, even if it means the deletion of one of the potentially different reflexives.
would you also struggle with "Snažila se mě ho naučit jíst."? if not, my example may have worked as intended. if you struggle with it either way, we may learn a tiny bit about the unclear non-reflexive object control climbing constraint.
"standard czech only allows one reflexive per cluster"
oh, never noticed and never heard of this rule. thx.
"Snažila se mě ho naučit jíst."
now i see what you did there. she was trying to teach me how to eat some unusual food. i guess i'll stick to studying foreign languages :)
strangely enough, the more deeply embedded reflexive wins.
but this is only relevant if the embedded reflexive is allowed to climb to the higher cluster. there are good reasons to leave the embedded reflexive embedded. some deletions feel weird to many natives, and some sentences do not allow the reflexives to climb. also note that the "s" left over from the potentially deleted reflexive ses/sis must append to the climbing winner. this material would only be actively taught late in the tree.
- Snažil se zapamatovat si to. Snažil si to zapamatovat.
- Přál sis naučit se to. Přál ses to naučit.
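the merging described above can be sketched in a few lines of python. the sketch assumes that climbing is allowed in the first place, which it does not decide. it also assumes that each reflexive is labeled by hand with the embedding depth of its verb (0 = main verb). the name merge_reflexives is hypothetical.

```python
# Illustrative sketch (hypothetical name): when reflexives from several verbs
# climb into one cluster, only one survives. The more deeply embedded reflexive
# wins, and the "s" of a deleted "ses"/"sis" attaches to the winner.
# Depths are supplied by hand; whether climbing is allowed is not checked.

def merge_reflexives(reflexives):
    """reflexives: list of (form, depth) pairs, depth 0 = main verb."""
    # Index of the most deeply embedded reflexive (max() keeps the first on ties).
    win_i = max(range(len(reflexives)), key=lambda i: reflexives[i][1])
    winner = reflexives[win_i][0]
    # Did any deleted reflexive carry the 2nd-person "s"?
    carries_s = any(
        form in ("ses", "sis")
        for i, (form, _depth) in enumerate(reflexives)
        if i != win_i
    )
    if carries_s and winner in ("se", "si"):
        winner += "s"  # se -> ses, si -> sis
    return winner


print(merge_reflexives([("se", 0), ("si", 1)]))   # si  (Snažil si to zapamatovat.)
print(merge_reflexives([("sis", 0), ("se", 1)]))  # ses (Přál ses to naučit.)
```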
Oh for sure, these kinds of tables are a toddler's 5-page cardboard, lift-the-flap book with velcro letters compared to the Encyclopedia Britannica of real language. But... they're a start.
And I wouldn't be surprised to find that they're the kind of basic structures that native brains lay down when first acquiring the language, with ever more templates of contexts, rules and exceptions later superimposed on top of them.
Oh, sorry, and a quick question you might enjoy. In a multi-verb sentence like your "Kateřina se jí mu to snažila pomoci vysvětlit", do the respective object pronouns follow the order of their verbs? This is what I intuitively took it to be:
pomoci - jí
vysvětlit - jemu
Someone or something keeps deleting this comment of yours. None of us, I am sure. At any rate, these same-case tie-breakers are actually decided by the degree of original embedding, with the topmost one going first. So the sentence would keep the same sense of "helping her explain it to him" even if we shuffled it to something like
- Vysvětlit se jí mu to snažila pomoci Kateřina.
It is possible that the only people aware of that "rule" are PhDs in Czech linguistics and both of their fans.
Apart from the same-case tie-break, the ordering within a single cluster has little to do with the surface order of the verbs. We can get by with a sequence of cases like the one given in the OP.
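The tie-break can be sketched in Python. The sketch assumes that each clitic is labeled by hand with its slot and with the embedding depth of its governing verb (0 = main verb). The name order_with_tiebreak is hypothetical. Sorting by (slot, depth) puts the dative of "pomoci" before the dative of "vysvětlit". Depth reflects the original embedding rather than surface position, so the reshuffled "Vysvětlit se jí mu to snažila pomoci Kateřina" yields the same cluster.

```python
# Illustrative sketch (hypothetical name): the slot formula plus the same-case
# tie-break. Each clitic is (form, slot, depth), where depth is how deeply its
# governing verb is embedded (0 = main verb). Depths are given by hand, not parsed.

ORDER = ["AUX", "REFL", "DAT", "ACC", "TO"]


def order_with_tiebreak(clitics):
    """Sort by slot first; within one slot, the less embedded governor goes first."""
    ranked = sorted(clitics, key=lambda c: (ORDER.index(c[1]), c[2]))
    return [form for form, _slot, _depth in ranked]


# "Kateřina se jí mu to snažila pomoci vysvětlit":
# snažila (depth 0) > pomoci (depth 1) > vysvětlit (depth 2).
# "to" is the object of vysvětlit but sits in the TO slot.
cluster = [("to", "TO", 2), ("mu", "DAT", 2), ("se", "REFL", 0), ("jí", "DAT", 1)]
print(" ".join(order_with_tiebreak(cluster)))  # se jí mu to
```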
I keep deleting it! But it won't die!! 😦 😂
I posted the question off-the-cuff, then inspected some examples and realized my observation wasn't at all true in general.
See my last post in reply to svrsheque for my cutting-edge beer-hall hypothesis on the structure of those sentences.
Apropos, though, this afternoon I visited the Kontext online corpus, searching for fodder to test. I tried checking through all possible arrangements of pronouns in the order [se][dat][dat] (e.g. se mi mu), but hit on precious little that shows that kind of clustering.
Do you personally come across more of these in the wild? Because I'm starting to think that, although they represent an inherent linguistic potential of the language, their occurrence is truly sporadic, and perhaps more clumsy than elegant. Especially considering the ubiquity of the standard forms, e.g. [se][dat][acc] clusters.
Of the few examples we've seen on this thread so far, more than one has been a head-scratcher, and there is a tendency in language to eschew those constructions, albeit grammatical and decipherable ones, that obstruct the natural flow of meaning and give rise to ambiguities.
structures involved in "Kateřina se jí mu to snažila pomoci vysvětlit." are included in the intended future scope of this course. consider them one of the tests blocking access to the final skill and whatever may be next to it. if we include matching exercises in both orientations and with several surface orderings, it may be hard even just to translate to english without thinking.
I have to say, the more pronouns that get rammed into that clitic cluster...., the more these things look (to me) like a purpose-built word puzzle. Wouldn't it be more normal to spread it out a bit - Kateřina se jí snažila pomoci mu to vysvětlit (or something like that?), presuming it is jí she's helping rather than mu? Four clitics followed by a logjam of verbs looks a little constructed, like those chess problems that you'd never see in a real game. And anyhow, isn't the richness & fluency of language about how it can expand and branch out endlessly, rather than the cryptic nuggets of dense sense it can get squashed into?
Or maybe my Czech's just rusty from disuse, and this seems plainer to those using it day-to-day. But I don't recall coming across such intense clumpings all that often, let alone without context in a standalone situation like Duolingo. If the rite of passage is going to be tougher than even natives routinely face, one can only hope that all the would-be warriors don't end up as lion fodder. 😋
So hmm, yes, I can certainly see the doctorate material in studying the quantum mechanics of how this all might abstractly go together, but do you think students really need to know much beyond that little table? Throw in the very common instrumental construction, a pinch of adverbs and a particle or two, and I feel like you've pretty much got the palette for 95% of usage. God knows, they're not going to be short of other grammar and vocab to learn. And that's talking about the native level, far beyond the B1 or whatever Duolingo offers its brightest graduates. On the German course I can tell you the major test and stumbling block seems to be Der, Die, Das. 😖
Don't get me wrong, I have the greatest admiration for the Czech course creators, and don't envy you your task. Students want rules, and your quest to deliver them has clearly led into some insanely complex territory. My speculations here are only about the limits of what can be taught within a Duolingo-type setup. As you say, natives are working with a neural network approach, and on a huge personal corpus of language acquired over decades. Students want, and probably do need, shortcuts to that skill-set. But what kind of shortcuts are worth having, and which ones simply distance the learner still further from the (mostly) effortless native language experience?
A different example: Debate continues in the German forum about determining noun gender. Every time the subject comes up, a kind soul produces a massive list of noun type and ending rules. Students cheer, "We've got the golden key!" But natives rapidly rain down on the parade with a thousand exceptions to the guidelines, and a mantra of "memorize, memorize, memorize!" Which is also kind of harsh when you consider that they never sat down with a dictionary and started working from A-Z themselves! My conclusion there was that, yes the rules are mostly weak and fuzzy, but whittling them down to a handful of the best would be better than none at all. And indeed, the shortlist would be precisely those kinds of rules that ordinary, reflective native speakers would be able to state themselves if pushed to do so.
So I guess my ponderings here are in the same vein. How much is enough, and in what shape or form should it be known? And certainly klobouk dolů to you for trying to figure that out through those PhD theses! Hope I haven't accidentally added to the pain 😀
isn't the richness & fluency of language about how it can expand and branch out endlessly, rather than the cryptic nuggets of dense sense it can get squashed into?
the constructed nature of the more extreme cluster examples provided by one imperfect volunteer does not mean that the innate ability, and even preference, to build easier clusters instead of leaving the embedded clitics scattered around is not part of the richness and fluency of language. we natives do these things unthinkingly because our natural neural networks had years of training by immersion. to what extent it can or even should be done by 'splainin' is a worthwhile discussion.
in this thread the main subject is the rules for ordering the clitic cluster. your tabular effort probably relates more directly to the overall word order in czech. the clitics may stay embedded, all climb up to make people question their sanity (or mine), or land somewhere in between. either way, my point remains that multiple verbs will challenge easy sentence recipes even more than they will a single cluster formula.
And indeed, the shortlist would be precisely those kinds of rules that ordinary, reflective native speakers would be able to state themselves if pushed to do so.
yep, i do believe this is exactly where the poodle's nucleus is located ;-)
it would appear to me that we native speakers have to repeatedly query our cranial black boxes to attempt the extraction of a workable mapping from grammar-based parameters to outputs like OK, acceptable, and garbage. unfortunately, an ordinary, reflective speaker will probably never figure out that they may want to structure their quest with deterministic concepts like (in)constant cliticity, fronting, desirability/possibility of clitic climbing based on what they are embedded in and what "lies above", and the various obstacles and interactions on the way. let alone imo to produce anything of value for the unfortunate student in the absence of that deterministic guidance.
yet it would also seem that the inability to articulate squat tends to not interfere with the effortless (if somewhat subjective at the margins) functioning of those native black boxes in the normal production mode.
i hope we tend to agree that
- Pomohli bychom se jim to naučit.
- Pomohli bychom jim to naučit se.
are wrong even if we could not explain why to save our lives. (actually, if we as introspectively reflecting natives cannot agree even on this little, there may be nothing inside the fuzzy margins, and we may wish to abandon all hope of teaching anyone czech w.o. because we do not appear to know it ourselves. as upsetting as that might look to some, i would appreciate the refreshing clarity of that conclusion.)
if hope remains, maybe a workable and even fun path could lead through assembling a set of test sentences that would allow the discerning and willing natives to render judgment on their correctness or perceived degree of incorrectness, and maybe write the proposed immediate fix and the ideal replacement.
we may confirm that some (but not all) rules are so weakly felt that they are not worth teaching.
as for where the training exercises for the surviving rules might go in the tree, that may be much easier to figure out because 1. clitics permeate all the way down to the basic application of accusative pronouns or the first "se" verb, and 2. the impact of adding clitics and their governor structures needed lexically will require constant expansion and reinforcement of the rules.
To me it seems most likely that acquisition of this skill (both by Czech children and foreign learners) is through amassing an internal reference corpus of sentences/clauses, which can then be both generative and act as a template test of acceptable sentences. Obviously, this is only one template of several, which can be layered and/or embedded. But an important one nonetheless - the bottom layer??
Indeed, it's of interest that the specimens of more challenging clusters you've presented on this thread can also be analysed into 2 clauses using the same template. From this limited sample, the clitic cluster of the acceptable ones (1,2,3) reads as if fused together by a single join. However, the unacceptable one (4) requires you, reading left-right, to weave in and out of the clausal parts along a more convoluted path. A further test sentence (5) with a single join seems okay. Maybe that's not the jádro pudla 😉, but perhaps of interest for potentially teaching these structures.
- (1) Kateřina se jí mu to snažila pomoci vysvětlit
- (2) Snažila se ho mě naučit jíst
- (3) Vysvětlit se mi jí to nepovedlo
- (4) Pomohli bychom se jim to naučit
- (5) Pomůžeme se vám postarat o děti
Your (5) would be like poison to a student eager to hear a straightforward recipe:
- Pomůžeme se vám postarat o děti.
Whether it is a well-formed sentence depends on the intended meaning. Both "pomůžeme" and "postarat se" can take a dative object, but only one of these is allowed to end up as shown here. The constraint eliminating "vám" as relating to "pomůžeme" is the same one that makes "Pomohli se jim to naučit" ill-formed. (Let's not name it for now.)
How well one appreciates the difference between "pomůžeme vám" and "postarat se vám" probably depends on how one learned this material. One could try the following:
- changing the verb from "pomůžeme" to "chceme"
- inserting an extra dative ("jí")
- switching the order of "se vám" to "vám se" (the latter containing adjacent trivial clusters)
- comparing the meanings of the following embedded versions:
- Pomůžeme vám postarat se o děti.
- Pomůžeme postarat se vám o děti.
The constraint eliminating "vám" as relating to "pomůžeme"
Are you saying that the theory declares:
"Pomůžeme postarat se vám o děti"
to be the correct interpretation of the sentence? It seems rather convoluted.
"Postaráme se vám o děti" is clear ("We'll take care of your children for you"). But "Pomůžeme.... postarat se vám o děti" ("We'll help to take care of your children for you") seems like a strange offer, as though you have these children, someone is going to take care of them for you, and then we'll help that someone to help you with the kids. 😊 Do you see what I mean?
Whatever the prescriptive grammar, I'm fairly sure the original intention was to say the far clearer "Pomůžeme vám postarat se o děti." ("We'll help you take care of your kids").
In true descriptivist style, the original sentence Pomůžeme se vám postarat o děti was gathered from the internet. Do you not feel that most folks would indeed read that as "Pomůžeme vám..."?
Isn't the plainer answer that, in everyday usage, people are simply sandwiching these constructions together willy-nilly from an internal corpus which applies quite a simplistic template? After all, it seems to me the reason why these clitic clusters exist at all is simply to contain the unstressed pronouns within a verbal bracket, preventing them from sticking out at the ends and thus being emphasized. Once that habit has been ingrained on simple sentences, it gets carried over into composite constructions with two distinct sets of verbal complements, and so the clitics have to huddle in a way that allows a linear sentence, but also allows the brain to decompose that into two separate clausal layers (as shown in my last table).
Thanks... I am mostly just saying that "Pomůžeme se vám postarat o děti." strikes me as inadvisable for teaching. I actually believe it is ill-formed, but clearly not everyone does. The main issue concerns how we teach the trivial REFL-DAT sequence. We may want to avoid examples that could later undercut our attempt to ingrain that students should leave the REFL embedded if it is under object control.
- ???Pomohla se mu oholit.
I have a few hypotheses as to why many native speakers choose to violate that constraint for your (5) [even short of not feeling constrained at all], but splitting hairs there is not going to help our students learn anything too useful.
I continue to believe that structuring the teaching of Czech clitics along the prescriptivist trajectory is the only feasible approach. This does not mean we should reject a wide range of side translations. Rather, we should choose and stage the Czech source sentences and their main English translations with a destination in mind, even if that destination is our take on a subset of Hana and Rosen’s work. I’d rather that our students slowly advance in a sensible direction, perhaps assisted by commentary in Tips & Notes, than that they be left to drown in a sea of indigestible corpus frustrations with just “that’s just the way it is” for explanations of why something works or not.
This potential difference in approach only shows up for clauses with nested verbs, and it could be used to advantage even then. We do want to teach with exercises that someone would actually be likely to say (after an inconsequential noun or adjective tweak, to keep our material from becoming a conversation book wannabe).
Anyway, thanks, and if you want to impact the future of the Czech course from the inside, you will find our door open.
Many thanks, also for the offer. I do appreciate that your aims are didactic, and my nuisance-making on your thread has kind of spun away from my original, modest attempt to do a colorful join-the-dots picture for your students based on the examples from your own research. I apologize for that, although I still feel a visual approach with memorable paradigms is of value. Sometimes it's better to learn something incomplete but handy than to be given the whole picture at once in its terrifying truth and certainty!
Pomohla se mu oholit
I agree there's something irritating about pomoci followed by se in general. However, a Google search for even the restricted "Pomohla se mu" shows the (mis)construction to be alive and well among those unfortunates who haven't yet got the PhD memo.
As a last whimper from the descriptivists, having failed with Kontext, I note now from a focused Google search that in a state of nature the natives use the richer clusterings almost exclusively with everyday clichés like chtít se/podařit se/povést se komu.... A sample of prescriptively right or wrong usages shows these all to be straight clausal joins in the simplistic table scheme:
- Nechce se mi jim to říkat
- Vlastně se mi mu ji ani dát nechce.
- Nikdy se nám ho to nepodařilo odnaučit.
- Bohužel se nám mu je nepodařilo dokázat
This might offer an appealing avenue for teaching, as the (very useful) clichés almost cry out for this kind of continuation, and thus show why the locals hunger for a clitic sandwich in the first place.
And with that, I'll let you get on with your job in peace. Howgh! 😊
Thanks, I should have made it clearer that your visuals (or something like them) fit within the grand scheme of things just fine, regardless of how the path through the challenges is laid out. That grand scheme contains the other side of the coin, the word order freedom to express the endless nuance of what is primarily being communicated in what probable context.
On the "se mu" item: consider a "se" that liberated itself from object control by climbing to the main cluster, as, most drastically, in:
- Pomohla se mu zabít.
The somewhat irritating aspect here feels to be the loss of "self" association, especially if datives float around the main cluster. Whose existence was ended, and to whose "benefit"? Once the reflexives are up, they tend to want to point to the subject of the clause. If the datives can also be repurposed as loose adjuncts, the pointers get scrambled and the confusion is complete. At least some of the rules allow our brains to make sense of the discontinuities introduced by the clitic behaviors rather than just existing to amuse the PhD elites.
The last examples you chose share a key feature in that the "se" did not climb, let alone from under object control. The subject of the main reflexive verb was the deleted impersonal "to".
This is very helpful and greatly appreciated. How other languages, including Slavic ones, manage without a privileged second position is fascinating. So is the question of why there is a pronunciation problem that needs this elaborate solution only in the second position and not elsewhere in a long sentence.