Tips/guidance for maximizing STT transcription accuracy in noisy environments?

Hello, does anyone have suggestions, guidance, or tips for reducing the impact of background noise on utterances recorded in noisy environments?

**Some background:** we are working on a feature utilizing the Speech to Text service which involves users speaking short utterances. This feature currently resides in an iOS application, but may reside on desktop in the future.
When recording utterances in noisy environments, specifically environments with background speakers, the STT service picks up and transcribes those background speakers as well. The desired behavior is to pinpoint the “primary” speaker (the one closest to the microphone) as accurately as possible and transcribe only that speaker’s utterance.

We have been exploring applying DSP filters and noise-cancellation algorithms on the device, along with investigating what microphone configurations are possible on an iOS device to narrow the incoming audio. We suspect this may not be the best approach, though.
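To give a concrete idea of the kind of on-device processing we have been experimenting with, here is a minimal pure-Python sketch of an energy-based noise gate. The frame length and threshold values are illustrative, not tuned; a gate like this can suppress quiet background audio but cannot separate two speakers of similar loudness, which is part of why we doubt this approach is sufficient.

```python
import math

def noise_gate(samples, frame_len=160, threshold=0.05):
    """Zero out frames whose RMS energy falls below `threshold`.

    `samples` is a list of floats in [-1.0, 1.0]. At a 16 kHz sample
    rate, frame_len=160 corresponds to 10 ms frames. This is a crude
    gate: it attenuates low-energy background noise between words but
    does nothing about a loud background speaker.
    """
    out = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(s * s for s in frame) / len(frame))
        out.extend(frame if rms >= threshold else [0.0] * len(frame))
    return out

# A loud frame passes through; a quiet frame is silenced.
loud = [0.5] * 160
quiet = [0.01] * 160
gated = noise_gate(loud + quiet)
```

In practice we would run something like this (or a high-pass filter in front of it) on the audio buffer before streaming it to the STT service.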

Does IBM have any plans to provide noise-cancellation options, or an option to configure speech recognition for short utterances? These options would be similar to what Google’s Cloud Speech API provides.
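For reference, this is roughly the kind of configuration we mean, as it appears in Google's Python client (a config sketch only; the `model="command_and_search"` option is Google's model intended for short utterances, and we are asking whether Watson STT has or will have an equivalent knob):

```python
from google.cloud import speech

config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16000,
    language_code="en-US",
    model="command_and_search",  # model tuned for short queries/commands
)
```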

Thanks very much for any help or information!
