I've already ruled that out as an option: the problem is that it only works in Chrome right now, and Google requires an internet connection for it to work.

    I'm at a point where I've come up with two possible solutions:

    Snips
    Install Snips on a Raspberry Pi and have the Puck act as a speech recognition controller, sending start and stop signals directly over MQTT in place of Snips' own hotword detection. From reading their forums, this seems to be possible, and you can use JS to do it.
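As a rough sketch of the controller side, assuming Snips' Hermes MQTT protocol, where (as far as I recall) listening is started and stopped by publishing JSON with a `siteId` field to `hermes/asr/startListening` and `hermes/asr/stopListening`, the Puck's single button could be mapped to those messages with a small toggle. The topic names, the `"default"` site ID and the `PuckToggle` class are assumptions to check against the Snips documentation; the real messages may need extra fields such as a session ID. The sketch only builds the messages and leaves sending them to an MQTT client on the Pi.

```python
import json
from typing import Tuple

# Topic names assumed from Snips' Hermes protocol; verify against the Snips docs.
START_TOPIC = "hermes/asr/startListening"
STOP_TOPIC = "hermes/asr/stopListening"


class PuckToggle:
    """Hypothetical helper: each Puck button press alternates start/stop."""

    def __init__(self, site_id: str = "default") -> None:
        self.site_id = site_id
        self.listening = False

    def on_press(self) -> Tuple[str, str]:
        """Return the (topic, JSON payload) to publish for this button press."""
        topic = STOP_TOPIC if self.listening else START_TOPIC
        self.listening = not self.listening
        payload = json.dumps({"siteId": self.site_id})
        return topic, payload


if __name__ == "__main__":
    toggle = PuckToggle()
    for _ in range(3):
        # Press 1 starts listening, press 2 stops it, press 3 starts again.
        print(*toggle.on_press())
```

On the Pi, each pair could then be published with an MQTT client such as paho-mqtt (`client.publish(topic, payload)`). Sending the stop explicitly, rather than relying on Snips' own end-of-speech timeout, might also help with the short listening window listed under Cons, although I haven't confirmed that Snips behaves that way.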

    Pros

    1. Completely offline/private, and the trained assistant is very accurate at transcribing numbers.
    2. Although there would be much more product development involved, we'd end up with a nicely branded product that would differentiate us from competitors.

    Cons

    1. Potentially high licensing costs for commercial use. I always worry when a company doesn't publish its prices; it usually means they're expensive :(
    2. Getting a Bluetooth headset to behave reliably with a Raspberry Pi for audio input/output. From a few things I've read, Pis are better suited to USB mics, which wouldn't suit our situation, as the operator needs to be mobile and hands-free.
    3. The window during which Snips listens is short, so if we can't reliably extend it, this might not be a workable solution. I tried changing a config setting and it did listen a bit longer, but not as long as it should have.

    Audio Recorder
    Record mono WAV files in the browser (this works whether offline or online) using the tried and tested Recorderjs, then upload the WAVs to Google's Cloud Speech API for transcription. Use the Puck as a record/pause/stop controller. From my understanding, this would work out of the box in all browsers except Safari, but that's OK; hopefully they'll add BLE support at some point.
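As a sketch of the upload step, assuming the browser POSTs each Recorderjs WAV to a small backend (for example on App Engine), the backend could check that the file really is mono 16-bit PCM and wrap it in a JSON body for the v1 `speech:recognize` method of Cloud Speech-to-Text. `build_recognize_request`, `make_silent_wav` and the `en-GB` default are illustrative choices of my own; authentication and the HTTP call itself are left out.

```python
import base64
import io
import json
import wave


def build_recognize_request(wav_bytes: bytes, language_code: str = "en-GB") -> str:
    """Build a JSON body for Cloud Speech-to-Text v1 `speech:recognize`.

    Assumes the recorder produced 16-bit PCM WAV; rejects anything else.
    """
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        channels = wav.getnchannels()
        sample_width = wav.getsampwidth()  # bytes per sample
        sample_rate = wav.getframerate()
    if channels != 1:
        raise ValueError(f"expected mono audio, got {channels} channels")
    if sample_width != 2:
        raise ValueError("expected 16-bit PCM (LINEAR16)")
    body = {
        "config": {
            "encoding": "LINEAR16",
            "sampleRateHertz": sample_rate,
            "languageCode": language_code,
        },
        # Inline audio is sent base64-encoded in the JSON API.
        "audio": {"content": base64.b64encode(wav_bytes).decode("ascii")},
    }
    return json.dumps(body)


def make_silent_wav(seconds: float, rate: int = 16000) -> bytes:
    """Hypothetical test helper: a mono 16-bit WAV of silence."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(2)
        out.setframerate(rate)
        out.writeframes(b"\x00\x00" * int(seconds * rate))
    return buf.getvalue()


if __name__ == "__main__":
    print(build_recognize_request(make_silent_wav(0.01))[:80])
```

The resulting body would be POSTed to `https://speech.googleapis.com/v1/speech:recognize`. Synchronous recognition is limited to roughly a minute of audio, so longer recordings would need the long-running variant instead.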

    Pros

    1. We could store the WAVs for a period of time, allowing users to audit the results: for example, if a particular transcribed number was inaccurate, it could be corrected via the web app later.
    2. The whole system could be run on Google App Engine, so it should be quick to load and to run the transcription process.
    3. Much quicker to bring to market, as it really is just the Puck communicating with a web app.
    4. Potentially a bigger market, because it would work in most browsers out of the box; other than the Puck and a Bluetooth headset, end users wouldn't have to invest in any special equipment to use the service.
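The audit idea in the first pro could be modelled with one record per stored WAV that keeps Google's transcript alongside an optional operator correction, plus a retention cut-off for deleting old audio. The `Recording` fields and the `expired` helper below are hypothetical names for illustration, not part of any existing system.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Recording:
    """Hypothetical audit record: one stored WAV and its transcription."""
    wav_path: str
    created: datetime
    transcript: str                   # text returned by the speech API
    correction: Optional[str] = None  # set later via the web app

    def final_text(self) -> str:
        """The operator's correction if there is one, else the API transcript."""
        return self.correction if self.correction is not None else self.transcript


def expired(recordings: List[Recording], now: datetime,
            retention_days: int) -> List[Recording]:
    """Return recordings older than the retention window (deletion candidates)."""
    cutoff = now - timedelta(days=retention_days)
    return [r for r in recordings if r.created < cutoff]
```

Keeping the correction separate from the original transcript means the audit trail shows both what the API heard and what the operator changed it to.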

    Cons

    1. Difficult to calculate/forecast Google costs.
    2. Not a fully offline solution.
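One way to make the Google costs easier to forecast is a back-of-the-envelope model. The sketch assumes Cloud Speech bills each request rounded up to 15-second increments at a flat price per increment, with a monthly free allowance; the defaults (US$0.006 per 15 s, 60 free minutes) reflect Google's published pricing as I remember it and should be replaced with current figures. `estimate_monthly_cost` is a hypothetical helper.

```python
import math


def estimate_monthly_cost(
    utterance_seconds: float,
    utterances_per_month: int,
    price_per_increment: float = 0.006,  # assumed USD per billing increment
    increment_seconds: int = 15,         # assumed billing granularity
    free_minutes: int = 60,              # assumed monthly free allowance
) -> float:
    """Estimate a month's speech-to-text bill in USD under the assumptions above."""
    # Each request is rounded up to whole increments (minimum one).
    per_request = max(1, math.ceil(utterance_seconds / increment_seconds))
    billed = per_request * utterances_per_month
    free = (free_minutes * 60) // increment_seconds
    chargeable = max(0, billed - free)
    return round(chargeable * price_per_increment, 2)


if __name__ == "__main__":
    # e.g. 1,000 four-second readings a day for 30 days
    print(estimate_monthly_cost(4, 30_000))
```

Under these assumptions, 30,000 four-second utterances a month come to about $178.56 after the free tier. Because each short utterance is rounded up to 15 s, you'd be paying for 3.75 times the audio actually spoken (15 / 4), so batching several numbers into one recording could reduce the bill noticeably.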

    I thought I'd post my thoughts here to get them out of my head. I'm currently leaning toward the Audio Recorder solution. My instinct tells me the Pi route would be challenging, and tech moves so fast that the product could be outdated in no time, whereas a web app can keep evolving; I can envisage a really nice UI where you can check the Puck's battery life and other stats.


Started by ChimpWorks (@ChimpWorks)