When the narrator speaks quickly, he slurs his vowels. I listened at least 15 times, and it sounds like he says: "Kiaŭ estas du kaj unu?"
I was only able to work out that he must be saying "Kiom" rather than "Kiaŭ" from the context of the sentence, and because I haven't come across a word like "kiaŭ" in any of the lessons I've studied so far.
I just played this for my family to get their take. (It's sometimes hard to test it myself, because what we hear often depends on what we're expecting.) Every one of them heard the O in the first word. There was some disagreement about whether it was "kio" or "kiom". To me it sounded like "kiom", and my wife (who wasn't allowed to see the text) said she could hear it either way, depending on what she set her mind to expect.
Someone voted your comment down, but I voted it back up. It's a reasonable question and part of learning. You experience the same thing all the time in your native language, but you don't notice it because you are so much better at it. It takes time to develop your ear in a new language. The speaker here is a good model of how people really speak Esperanto. If you don't understand something, treat it as an opportunity to train your ear.
If you are doing maths, you say: How much is …?
If you are counting things, you say: How many are …?