While I understand why you'd think that, it actually is stressed on the right syllable. A native speaker would be able to hear the difference. The problem is that, because this is a question, the TTS voice rises very high at the end and (excessively, IMO) emphasises the last syllable. But, overall, it is voiced correctly.
I hear the stress in the right place :) and I am not a native speaker.
It must have been a mistake. Sometimes a sentence needs multiple-part hints in order for the translation to make sense. I guess there should have been a multiple-part hint here too, but it was accidentally added only on πότε. It has been removed. ^.^