You seem to have missed some key technology points in your bird's-eye view that people should know about.
-How big a dictionary file the engine can handle is important. Can your engine recognize 100 words in a grammar file? What about 1000? How much does it cost to increase that? (It is astronomical.)
-How fast the engine is, because results are based on multiple recognition attempts that are averaged together. e.g.: If you take a single n-best result, your accuracy is going to average around 70%, while if you run recognition 10+ times on the same phrase, you start hitting the 95% average mark.
-How much effort has been put into successfully determining an accent and then interpreting the results based on that accent. (Again, Nuance is a million miles ahead of everyone in this department, which is also probably why they can afford to charge so much.)
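To make the "multiple attempts averaged together" point a bit more concrete, here is a rough Python sketch of one way several n-best lists for the same phrase could be combined. This is only an illustration, not how any particular engine does it: the function name combine_nbest, the (hypothesis, confidence) layout, and all the numbers are made up.

```python
from collections import defaultdict


def combine_nbest(passes):
    """Sum the confidence of each hypothesis across passes and return the winner.

    `passes` is a list of n-best lists. Each n-best list holds
    (hypothesis, confidence) pairs. The layout is an assumption made for
    this sketch, not the output format of a real engine.
    """
    totals = defaultdict(float)
    for nbest in passes:
        for hypothesis, confidence in nbest:
            totals[hypothesis] += confidence
    # Highest accumulated confidence wins; ties are broken alphabetically.
    return min(totals, key=lambda h: (-totals[h], h))


# Three made-up recognition passes over the same spoken phrase.
passes = [
    [("B3E", 0.40), ("D3E", 0.35)],
    [("D3E", 0.45), ("B3E", 0.30)],
    [("B3E", 0.50), ("P3E", 0.20)],
]
best = combine_nbest(passes)
```

Looking only at the second pass, the top guess is the wrong one ("D3E"). Once all three passes are pooled, the hypothesis that keeps showing up with decent confidence wins, which is why a faster engine that can afford more passes tends to look more accurate.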
Limitations- Alphanumerics are ridiculously difficult to recognize without limiting user input options. e.g.: Reading an account number B, 3, E... there is a good chance of a bad recognition unless you start pigeonholing the user input. e.g.: Nerds create an algorithm in the speechrec grammar saying that the 1st entity is a letter, the second a number, the third a letter... then fine-tune it further to eliminate similar-sounding entries. e.g.: The 1st is a letter but can't be "G", so when you say "G" it returns a "B".
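The pigeonholing described above might look something like this in Python. It is a sketch under my own assumptions, not a real grammar format: the letter-digit-letter layout, the rule that the first slot can't be "G", and the CONFUSABLE table of sound-alikes are all hypothetical.

```python
import string

LETTERS = set(string.ascii_uppercase)
DIGITS = set(string.digits)

# Hypothetical account format: letter, digit, letter.
# Assumption for this sketch: no account number starts with "G".
SLOTS = [LETTERS - {"G"}, DIGITS, LETTERS]

# Hand-made table of sound-alike tokens, in order of preference (made up).
CONFUSABLE = {
    "G": ["B", "D"],
    "E": ["3"],
    "3": ["E", "B"],
}


def constrain(tokens, slots=SLOTS, confusable=CONFUSABLE):
    """Force recognized tokens into the slot pattern, or return None."""
    if len(tokens) != len(slots):
        return None
    result = []
    for token, allowed in zip(tokens, slots):
        if token in allowed:
            result.append(token)
            continue
        # Token is illegal in this slot: try a sound-alike that is legal.
        substitute = next(
            (c for c in confusable.get(token, []) if c in allowed), None
        )
        if substitute is None:
            return None
        result.append(substitute)
    return "".join(result)
```

With these made-up rules, constrain(["G", "3", "E"]) comes back as "B3E", which is exactly the "you say G, it returns B" behavior: the grammar rescues a lot of bad recognitions, but only by refusing to accept input it was told can't happen.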
Speechrec has a long way to go....