If you're talking about its pronunciation:
- Ki + little yo = Kyo
- u after o = long o
So the city (and prefecture) name is pronounced "Kyooto" ("oo" as a long "o"). And that is what the audio says.
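If it helps to see those two rules applied mechanically, here is a rough Python sketch. The function name `reading` and the tiny lookup tables are made up for this example and only cover the kana needed for きょうと; a real converter would need the full table and more rules.

```python
# Hypothetical, deliberately tiny tables: only what this example needs.
BASE = {"き": "ki", "う": "u", "と": "to", "お": "o"}
SMALL_Y = {"ゃ": "ya", "ゅ": "yu", "ょ": "yo"}

def reading(kana: str) -> str:
    """Spell out how a hiragana string is read, writing a long o as 'oo'."""
    out = []
    i = 0
    while i < len(kana):
        ch = kana[i]
        nxt = kana[i + 1] if i + 1 < len(kana) else ""
        if nxt in SMALL_Y:
            # i-row kana + small ya/yu/yo: drop the "i", keep the consonant.
            out.append(BASE[ch][:-1] + SMALL_Y[nxt])
            i += 2
        elif ch == "う" and out and out[-1].endswith("o"):
            # u after an o-sound lengthens the o instead of being read as "u".
            out.append("o")
            i += 1
        else:
            out.append(BASE[ch])
            i += 1
    return "".join(out)

print(reading("きょうと"))  # kyooto
```

The output uses "oo" only to make the long vowel visible; it is not meant as an official romanization.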
If you are asking why it is written as Kyoto instead of Kyouto or Kyooto, it's because "Kyoto" is the established English spelling: most Western languages don't mark vowel length in writing, and English speakers would read "Kyooto" or "Kyouto" quite differently from how the word is pronounced in Japanese. I hope that helps.
You probably already figured it out, but... The hiragana and katakana syllabaries each have 46 basic characters, plus 62 modified forms that cover more sounds and 6 additional characters. Each one represents a particular syllable. ん is the only one in which a consonant stands alone, without a vowel.

Kya, kyu, kyo and similar sounds are written as the i-row kana of the consonant (ki, ji, ni and so on) followed by a small ya, yu or yo: きゃ、きゅ、きょ、じゃ、にゃ. Kye (きぇ) is built the same way, but with a small e. The two characters are not read as separate sounds; together they form one modified syllable, ki + small yo = kyo. Written with full-size ya, yu, yo, as in きや、きゆ、きよ, they would be read as two separate sounds: kiya, kiyu, kiyo.

As for "ou": in English and many other languages a stressed syllable is usually pronounced louder. Japanese works differently: a vowel can be short or long, and a long vowel is simply held for about twice as long. In hiragana this lengthening is usually written with an additional vowel, for example おう = o-o = [oː] and えい = e-e = [eː]; in katakana it is written with "ー". For example, せんせい = sense-e (sensē) and センター = senta-a (sentā).
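To make the small-versus-full-size difference concrete, here is a minimal Python sketch that splits a kana string into sound units. The helper name `units` and the set of small kana are assumptions made for this illustration, and the set is far from complete.

```python
# Small kana that fuse with the preceding character (assumed, non-exhaustive).
SMALL = set("ゃゅょぇャュョェ")

def units(kana: str) -> list:
    """Split a kana string into sound units, gluing small kana to the previous one."""
    out = []
    for ch in kana:
        if ch in SMALL and out:
            # A small ya/yu/yo/e is part of the previous unit, not a new sound.
            out[-1] += ch
        else:
            out.append(ch)
    return out

print(units("きょ"))      # ['きょ']
print(units("きよ"))      # ['き', 'よ']
print(units("センター"))  # ['セ', 'ン', 'タ', 'ー']
```

きょ comes out as one unit and きよ as two, while the "ー" in センター stays a unit of its own, matching the extra length it adds to the preceding vowel.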