It could, but it is a different "se". The word for "with" is normally "s" in Czech, but in front of another hissing sound, like "z", you use "se" instead in order not to spit on everybody around. It is just easier to pronounce. But the "se" that is part of reflexive verbs is just that: a part of reflexive verbs.
The verb here (dívat se) means "to look at" and "to watch", rather than "to see." For "to see," the verb would be "vidět."
But I'm a learner, too, so you may get a better explanation from one of the DL native speakers. So many Czech words have so many different meanings that what I know so far may not be enough!
I have checked the audio, and I can clearly hear "to" at both the normal and slow speeds, but in any case, the course team can't "fix" a problem like this. Sometimes things sound different from one device to another, though, so you may have a problem that others do not. Hint: You can try putting each word into the sentence to see if one makes more sense than the other; sometimes that can help.
I'm native AmE, so you might get a better answer from one of the Czech natives on the team. But I've been at this for almost two years now, and from what I've seen, your answer to your question sounds right to me. But it's entirely possible that there are nuances I haven't come across yet...