Speech to text speed and accuracy


We are trying to use speech to text for our restaurant clients. However, we are facing two issues:
1. Accuracy of the speech to text conversion, especially with short sentences and sometimes single words. We tried using the Custom language models to improve the accuracy, but with little success. Are there any other ways to improve this?
2. The time it takes to convert the speech to text. It seems to take anywhere between 5 and 10 seconds for a short sentence to be converted. Is there any way this process can be sped up?




