So, when I said that voice input ‘works’, what this means is that you can now use an audio wave-form to fill in a dialogue box – you can turn sound into text and text (from audio or, of course, from chatbots, which were last year’s Next Big Thing) into a structured query, and you can work out where to send that query. The problem is that you might not actually have anywhere to send it. You can use voice to fill in a dialogue box, but the dialogue box has to exist – you need to have built it first. You have to build a flight-booking system, and a restaurant booking system, and a scheduling system, and a concert booking system – and anything else a user might want to do, before you can connect voice to them. Otherwise, if the user asks for any of those, you will accurately turn their voice into text, but not be able to do anything with it – all you have is a transcription system.
— Read on www.ben-evans.com/benedictevans/2017/2/22/voice-and-the-uncanny-valley-of-ai
The key point here is that you need services for voice input to connect to, and building those out is a massive scaling problem in the general case. AI has seen its best successes in specific, narrow cases; it breaks down quickly beyond them.
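The pipeline Evans describes can be sketched as a simple intent router. This is only an illustrative assumption of how such a system might be wired (the names `parse_intent`, `HANDLERS`, and the intents themselves are all hypothetical): speech-to-text and intent parsing are treated as solved, but a query only goes somewhere if someone has already built a handler for that intent. Otherwise, the system falls back to being a transcription service.

```python
# Illustrative sketch of a voice pipeline, not a real API.
# Assumes speech-to-text has already produced `text`.

def parse_intent(text: str) -> tuple[str, dict]:
    """Stand-in for the NLU step: text -> (intent, slots)."""
    if "flight" in text:
        return "book_flight", {"query": text}
    if "table" in text:
        return "book_restaurant", {"query": text}
    return "unknown", {"query": text}

# Each entry here is a service someone had to build first.
HANDLERS = {
    "book_flight": lambda slots: f"searching flights for: {slots['query']}",
    "book_restaurant": lambda slots: f"finding a table for: {slots['query']}",
}

def handle(text: str) -> str:
    intent, slots = parse_intent(text)
    handler = HANDLERS.get(intent)
    if handler is None:
        # No service exists for this request: all we have is a transcript.
        return f"(transcription only) {text}"
    return handler(slots)

print(handle("book me a flight to Rome"))
print(handle("walk my dog"))  # no handler was ever built for this
```

The scaling problem is visible in the shape of `HANDLERS`: every capability a user might ask for is another entry that has to be designed, built, and maintained before voice can do anything with it.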