It is pronounced with ř, the raised alveolar trill (or flap), which can be voiced or voiceless: https://en.m.wikipedia.org/wiki/Dental,_alveolar_and_postalveolar_trills#Raised_alveolar_non-sonorant_trill https://en.wikipedia.org/wiki/Voiceless_alveolar_trill#Voiceless_alveolar_raised_non-sonorant_trill
In this position it should be the voiceless form.
The audio is quite correct here.
So it is sort of like "rž" if you wish, but voiceless: the vocal cords do not vibrate. It is not like the voiceless š ("sh" if you wish), because the tongue position is different (more forward).