The numbers 1, 2, 3, 4, 5 and 8, as well as numbers ending in those digits, "agree" with the noun class they refer to. The normal forms used for counting are the N class forms, so you say, for example, nyumba moja "one house", nyumba mbili "two houses". With other noun classes, the numbers take the same prefixes as the nouns of that class, for example mtu mmoja "one person", watu wawili "two people", watu watatu "three people", etc.
The number two has the stem -wili when a prefix is added. The "mb-" in mbili is actually an example of the N class adding a nasal and hardening the consonant, similar to how -refu becomes ndefu in the N class.
Whether compound numbers agree is to some extent a matter of opinion. Some native speakers stop making the number agree with the noun class after 8, so they will say "watu kumi na mbili" and "viti ishirini na moja" rather than "watu kumi na wawili" and "viti ishirini na kimoja". I don't know whether language standardization committees, institutes, etc. have ruled on this grammar point recently. If you are taking a Swahili language exam, it is probably best to stick with "viti kumi na kimoja"; if you're having a conversation, no one will be surprised if you say "viti kumi na moja."
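If it helps to see the rule as a procedure, here is a rough Python sketch of it. Everything in it is my own illustration: the function name numeral, the strict flag, and the class labels "m-wa", "ki-vi" and "n" are made up for this example, and the prefix table covers only those three noun classes and the numbers 1 to 99. It assumes that a final 1 takes the singular prefix (as in "viti ishirini na kimoja") and that every other agreeing digit takes the plural prefix.

```python
# Illustrative sketch only: covers the M-/WA-, KI-/VI- and N classes, 1..99.
# All names here (numeral, strict, the class labels) are hypothetical.

# N class forms, which are also the plain counting forms.
COUNTING = {1: "moja", 2: "mbili", 3: "tatu", 4: "nne", 5: "tano",
            6: "sita", 7: "saba", 8: "nane", 9: "tisa"}

# Stems of the digits that agree; 6, 7 and 9 never take a prefix.
AGREEING_STEMS = {1: "moja", 2: "wili", 3: "tatu", 4: "nne", 5: "tano", 8: "nane"}

TENS = {1: "kumi", 2: "ishirini", 3: "thelathini", 4: "arobaini", 5: "hamsini",
        6: "sitini", 7: "sabini", 8: "themanini", 9: "tisini"}

# (singular prefix, plural prefix) for each supported noun class.
PREFIXES = {"m-wa": ("m", "wa"), "ki-vi": ("ki", "vi")}


def numeral(n, noun_class="n", strict=True):
    """Return the numeral for 1 <= n <= 99 as used with the given noun class.

    strict=True makes the final digit of compound numbers agree (exam style);
    strict=False leaves compound numbers in the counting form, as some
    speakers do in conversation.
    """
    if not 1 <= n <= 99:
        raise ValueError("this sketch only handles 1..99")
    tens, unit = divmod(n, 10)
    parts = []
    if tens:
        parts.append(TENS[tens])
    if unit:
        is_compound = tens > 0
        agrees = (noun_class != "n"
                  and unit in AGREEING_STEMS
                  and (strict or not is_compound))
        if agrees:
            singular, plural = PREFIXES[noun_class]
            prefix = singular if unit == 1 else plural
            parts.append(prefix + AGREEING_STEMS[unit])
        else:
            parts.append(COUNTING[unit])
    return " na ".join(parts)


print("watu", numeral(12, "m-wa"))                # watu kumi na wawili
print("watu", numeral(12, "m-wa", strict=False))  # watu kumi na mbili
print("viti", numeral(21, "ki-vi"))               # viti ishirini na kimoja
print("nyumba", numeral(2))                       # nyumba mbili
```

The strict flag is just a way of modelling the two usages described above; it is not meant to suggest that either one is the official rule.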