Break the sentence down. If you have all the pieces, it is much easier to reconstruct the sentence in English. Por qué = why; no = no, not; somos = we are; amigos = friends.
So with a direct translation, you have
"Why not we are friends?" Then all you have to do is rearrange it so it makes sense. "Why are we not friends?"
"¿Por qué ustedes no son mis amigos?" would be "Why are you not my friends?". "Somos" = "we are". http://www.spanishdict.com/translate/ser#conjugation
Nothing, really. "Aren't" is simply a contraction: they took "are" and "not" and squished them together, with an apostrophe to replace the missing letter(s). This is also the case with "it's" (it is), "can't" (cannot), etc.
Contractions are a convenient way to say two words at once.
The pronunciation does not match the captioned sentence: qué is not stressed, and there is no slight pause after qué; moreover, the first speaker stresses amigos more than qué. Reverse the stress, and the captioned sentence would be correct. Listen carefully. As spoken, it sounds like "porque" (because) rather than "por qué" (why), so the translation of what can be heard would be: "Because we are not friends?"