Sometimes they're really not spoken. We often connect the last syllable of a word with the first of the next word, and when the last syllable is not stressed, such as in "ela", the vowel might not even be pronounced.
So in this case it's very common to say it like "eluama" (the O sounding like a U because it's not stressed either). Using accents to make it easier to understand, it would be "éluãma", which is what the woman is saying in the audio.
I first noticed this ^ not from DL's audio, but from actually hearing my Br friends speak pt.
I asked them why they skip neighbouring vowels (like "você é menina" being pronounced as "você menina") and their answer was that it just comes from practice, which they obviously got by being native pt speakers.
So in conclusion, the general way the audio "speaks" in DL is actually correct.