So, YouTube makes the captions for its videos in much the same way that a program "scans and doesn't read" a scanned, digitized file. It merely guesses what the video is saying.
For a quick laugh, watch this Rhett and Link video where they mess with YouTube's failed caption-rendering program.