Translation: I eat sushi with a fork.
If that were true, you would never be able to understand someone saying this sentence aloud (cultural awkwardness aside), since speech has only sounds and no kanji.
If you hear a utensil followed by 「で」, you can guess that something is being done with that utensil, whether the sentence is spoken or written.
I think the problem is that the voice says "フォークです、しを...", whereas someone who was actually reading or speaking this would say "フォークで、すしを" (comma added to show the pronunciation, not necessarily a verbal pause). The point is that the audio says "des shi", not "de sushi", so I've reported it.
When I first heard this, I also couldn't figure out what "des shi" was, and the lack of kanji meant I couldn't just read my way around it, since I read it the way it was pronounced (saying "des shi" over and over in my head).
English grammar demands "I eat sushi with a fork". There is no particular reason other than to add a tiny bit of nuance. "I eat sushi with fork" is perfectly comprehensible, but sounds like caveman speech.
Compare "I eat sushi with forks" (one in each hand??), which cannot take "a". It could, however, take "the": "the forks".
Compare "I eat sushi with the fork" which refers to a specific fork. But this is an unusual sentence and probably wouldn't be said. More likely: "Would you like to use the chopsticks or the fork?" "The fork."
These are all ways of being more specific about which object you're referring to, but the average English speaker doesn't usually need them to figure out what you're talking about. They are used because they are standard.