That is what I thought or assumed too, and the same for "vi snakkes" and some other similar phrases. But it turns out these are not the passive forms of å se and å snakke. They are their own s-verbs. For example, å ses / å sees: https://ordbok.uib.no/perl/ordbok.cgi?OPP=ses&bokmaal=+&ordbok=bokmaal
So what these particular s-verbs are doing is rolling up the 'hverandre', making it possible to say "see each other" in a very compact way. But I think the intuition (or inference from the -s) that you and I both had about this being a passive construction is not entirely off the mark. I don't know for sure, but I have a feeling that passive voice and reciprocity, as well as convenience, have something to do with why s-verbs arose in the first place.
It's interesting to note that "ses" is both the present and the imperative form of the verb å ses. So it seems we can think of the nearest translation of "vi ses" as being equally "we see each other" and the imperative "see each other".
I could be mistaken about all this, given that some of the mouse-over or word-click hints for "ses" are written as though it were the passive form of "å se". But the ordbok seems to say otherwise, and I found that interesting food for thought.