Controlling Mic Button on Mobile Safari Keyboard

  • I've been testing various ways to do offline speech-to-text. One idea I've had is to use the built-in, offline-capable speech-to-text engine in Mobile Safari. Once a Puck is paired with iOS, would it be possible to use it to turn the mic on and off?

  • You'd need to look into what you need to do to turn the mic on and off, but if it's just a keypress then yes - you could make Puck.js appear as a keyboard with http://www.espruino.com/BLE+Keyboard and could then just 'press' the key.
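
    As a rough sketch of the 'appear as a keyboard' idea, something like the following could run on the Puck. It uses the `ble_hid_keyboard` module described on the BLE Keyboard page linked above. `startKeyboard` is just a name made up for this example, and `KEY.A` is a placeholder since it isn't known which key (if any) would trigger the mic.

```javascript
// Sketch for the Puck.js itself (Espruino firmware). NRF, BTN and setWatch are
// Espruino globals, so the guard at the bottom stops it starting anywhere else.
function startKeyboard(kb) {
  // Advertise the Puck as a Bluetooth LE HID keyboard.
  NRF.setServices(undefined, { hid: kb.report });

  // Send one key tap each time the Puck's button is pressed.
  // KEY.A is a placeholder (assumption): swap in whichever key you need.
  setWatch(function () {
    kb.tap(kb.KEY.A, 0, function () {});
  }, BTN, { edge: "rising", debounce: 50, repeat: true });
}

if (typeof NRF !== "undefined") {
  startKeyboard(require("ble_hid_keyboard"));
}
```

    The Puck needs to be paired and connected as a keyboard before a button press will send anything.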

  • You'd need to look into what you need to do to turn the mic on and off

    That's the bit I'm struggling with. I'm not sure how to find out which key code would trigger the mic icon. I've looked in the linked PDF, but I'm not sure whether those codes apply to iOS devices.

  • It seems there is no obvious way to trigger the dictation button other than from the software keyboard, so it looks like that isn't an option.

  • Hmm - that's a shame. Depending on what you're planning there are dictation APIs like the Web Speech API (https://www.google.com/intl/en/chrome/demos/speech.html) although given Apple's track record with browser features it's probably not implemented.

    You could then do something similar with the Puck, just sending some other keypress that a website could detect and use to initiate the web speech API.

  • I've already exhausted that as an option. The issue is that it only works in Chrome right now, and Google requires an internet connection for it to work.

    I've now come up with two possible solutions:

    Snips
    Install Snips on a Raspberry Pi and have the Puck act as a speech recognition controller, sending start and stop signals directly over MQTT to override Snips' hotword detection. From reading their forums this seems to be possible, and you can use JS to do it.

    Pros

    1. Completely offline/private and the trained assistant is very accurate at transcribing numbers.
    2. Whilst there would be much more product development involved, we'd have a nice branded product at the end of it that would differentiate us from competitors.

    Cons

    1. Potentially high licensing costs for commercial use. I always worry when a company doesn't publish its costs, it usually means it's expensive :(
    2. Getting a Bluetooth headset to behave reliably with a Raspberry Pi for audio input/output. From a few things I've read, Pis seem better suited to USB mics, which wouldn't suit our situation as the operator needs to be mobile and hands-free.
    3. The time Snips spends listening is short, so if we can't reliably extend it this might not be a workable solution. I tried changing a config setting and it did listen a bit longer, but not as long as it should have.

    Audio Recorder
    Record mono WAV files in the browser (this would work whether offline or online) using the tried and tested Recorderjs, then upload the WAVs to Google's Cloud Speech API for transcribing. Use the Puck as a Record/Pause/Stop controller. From my understanding this would work out of the box in all browsers except Safari, but that's OK, hopefully they will add BLE at some point.

    Pros

    1. We could store the WAVs for a period of time, allowing users to audit the results. If, for example, a particular transcribed number was inaccurate, it could be corrected via the web app at a later time.
    2. The whole system could be run on Google App Engine, so it should be quick to load and to run the transcription process.
    3. Much quicker to bring to market, as it really is just the Puck communicating with a web app.
    4. Potentially a bigger market, since it would work in most browsers out of the box. Other than the cost of the Puck and a Bluetooth headset, end users wouldn't have to invest in any special equipment to use the service.

    Cons

    1. Difficult to calculate/forecast Google costs.
    2. Not a fully offline solution.

    I thought I'd post my thoughts here to get them out of my head. I'm currently leaning more toward the Audio Recorder solution. My instinct is telling me the Pi route would be challenging, and tech moves so fast the product could be outdated in no time. A web app can keep evolving, and I can envisage a really nice UI where you can check the Puck's battery life and other stats.

  • Yes, I'd say the website solution sounds good to me. Google's cloud API is probably better than the Pi solution and potentially there is the option of using an entirely browser-based solution if one becomes available.

    You can also make the Puck send normal keypresses (space/enter/etc - or characters) that even an iOS device running Safari can pick up on without the need for Web Bluetooth or any special app.
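
    On the web page side, picking up those keypresses could look roughly like this. It's only a sketch: the state names, the `nextRecorderState` helper and the choice of Space for record/pause and Enter for stop are all assumptions for illustration, and the actual recorder calls are left as a comment.

```javascript
// Pure transition function: given the current recorder state and a key,
// return the next state. States and key choices are assumptions.
function nextRecorderState(state, key) {
  if (key === " ") {
    // Space starts recording, or toggles between recording and paused.
    return state === "recording" ? "paused" : "recording";
  }
  if (key === "Enter") {
    // Enter stops from any state.
    return "stopped";
  }
  // Any other key is ignored.
  return state;
}

// Browser-only wiring: listen for the keypresses the Puck sends as a keyboard.
if (typeof document !== "undefined") {
  var recorderState = "stopped";
  document.addEventListener("keydown", function (event) {
    var next = nextRecorderState(recorderState, event.key);
    if (next !== recorderState) {
      event.preventDefault(); // e.g. stop Space from scrolling the page
      recorderState = next;
      console.log("Recorder is now " + recorderState);
      // Hypothetical: call your recorder's start/pause/stop here.
    }
  });
}
```

    Keeping the key handling in one small function means the key mapping can be changed later without touching the recording code.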

  • Thanks @Gordon, I think that would be a good short-term solution until BLE is fully supported in all browsers. I'll have a play and see if I can get it working.

Posted by @ChimpWorks
