It is the T-V distinction (https://en.wikipedia.org/wiki/T%E2%80%93V_distinction): jste is on the V (vy) side and jsi is on the T (ty) side. It is not about politeness; it is about the level of formality.
Do you mean the audio? The text-to-speech software messes up the intonation.
If it's a "write what you hear" exercise, it doesn't matter whether you put a question mark at the end or not. Duo accepts it either way, so you can just enter "Jste vysoký Františku" regardless of whether it's a question.
In other exercises you can tell whether it's a question by the presence or absence of the question mark.