Speech Recognition Polyfill (STT) by apersongithub

Adds setup-free speech recognition (speech to text) to websites such as Google Translate and Duolingo. Highly configurable: choose between OpenAI's Whisper or the Vosk engine running locally, or optionally use AssemblyAI's server-side API.

0 (0 reviews)

8 users


About this extension

On first install, this extension opens the options page. The default model language is English, but this is easy to change. The extension supports per-site customization and offers many different models for recognizing speech. It also has a key binding for speech to text (default: Alt + A). Keep in mind that this is not a complete solution: the API isn't fully supported, but it's pretty close. Speech detection is nearly instantaneous (depending on the model), but not as accurate as Google Chrome's cloud API. The extension icon's color/indicator changes depending on the current state, so pin it to your toolbar to verify that the extension is working as intended. A red mic/error icon does not necessarily mean your mic isn't working; the speech may have been cancelled by user input, the cloud API key may be missing, or the speech may have been unintelligible (usually it's the latter).
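For context, here is a rough sketch of how a web page typically talks to the Web Speech API that a polyfill like this one stands in for. It is not taken from the extension's source. The helper names `findRecognitionCtor` and `startDictation`, and the trimmed-down `RecognitionLike` interface, are assumptions made for illustration; only `SpeechRecognition`, `webkitSpeechRecognition`, `lang`, `interimResults`, `onresult`, `onerror` and `start()` come from the standard API.

```typescript
// Trimmed-down view of the Web Speech API members used in this sketch.
interface RecognitionLike {
  lang: string;
  interimResults: boolean;
  onresult: ((event: { results: ArrayLike<ArrayLike<{ transcript: string }>> }) => void) | null;
  onerror: ((event: { error: string }) => void) | null;
  start(): void;
}

type RecognitionCtor = new () => RecognitionLike;

// Anything that may expose the constructor, e.g. `window` in a browser.
interface SpeechScope {
  SpeechRecognition?: RecognitionCtor;
  webkitSpeechRecognition?: RecognitionCtor;
}

// Hypothetical helper: prefer the unprefixed name, fall back to the prefixed one.
function findRecognitionCtor(scope: SpeechScope): RecognitionCtor | undefined {
  return scope.SpeechRecognition || scope.webkitSpeechRecognition;
}

// Hypothetical helper: start one recognition pass and report the first transcript.
// Returns false when no (native or polyfilled) implementation is available.
function startDictation(
  scope: SpeechScope,
  lang: string,
  onText: (text: string) => void
): boolean {
  const Ctor = findRecognitionCtor(scope);
  if (!Ctor) {
    return false;
  }
  const recognition = new Ctor();
  recognition.lang = lang;
  recognition.interimResults = false;
  recognition.onresult = (event) => onText(event.results[0][0].transcript);
  recognition.onerror = (event) => console.warn("speech recognition error:", event.error);
  recognition.start();
  return true;
}
```

In a browser, a page would call something like `startDictation(window as unknown as SpeechScope, "en-US", console.log)`. Sites that do this kind of feature detection are the ones a polyfill can serve without any site-side changes.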

Make sure you are using the correct mic and speak loudly, slowly, and clearly; otherwise your voice may not be detected or may be unintelligible. If you have problems with voice recognition, change the default model to the cloud model or to a slightly larger or different local one (this may reduce performance). You can also try enabling "boost microphone gain" if you speak softly. If you regularly process audio longer than 1-2 minutes, disable the ultimatum processing timeout feature in the settings.

If you're using Duolingo or a similar site and want to do speaking practice in the language you are learning, it is recommended to set the extension's language to that language (navigate to the site -> click the extension icon -> set the language, then click "save for site"). This isn't required, but it will significantly improve recognition accuracy, since the model then knows exactly which language you are trying to speak. (This isn't necessary for every site. One example is Google Translate, which tells us the exact language in use through the input box's data, so auto-detect works fine.) Look at the images for more help.
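The language choice described above can be pictured as a small lookup. The sketch below is only an illustration: the function name `resolveLanguage`, the `Map` of saved per-site languages, and the precedence order (saved site setting first, then a language supplied by the page, then the default) are all assumptions, not the extension's actual logic.

```typescript
// Hypothetical per-site settings: hostname -> language code saved via "save for site".
type SiteLanguages = Map<string, string>;

// Assumed precedence: saved per-site language, then a language the page itself
// provides (as Google Translate does), then the default model language.
function resolveLanguage(
  hostname: string,
  saved: SiteLanguages,
  pageLang: string | undefined,
  defaultLang: string = "en"
): string {
  const perSite = saved.get(hostname);
  if (perSite) {
    return perSite;
  }
  if (pageLang !== undefined && pageLang.trim() !== "") {
    return pageLang.trim();
  }
  return defaultLang;
}
```

With a saved entry for a Duolingo hostname, that entry wins even if the page reports a different language; a site with no saved entry falls back to whatever the page reports, and finally to English.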

The extension uses ~1 GB of RAM with the normal/cloud models and up to ~7 GB with the biggest model (you don't need to use the biggest model lol). I've implemented decent memory management to compensate.

~~~~~~~~~~~~~~~~~~~~

❗ General Recommendations:

• 8 GB of RAM is the minimum, since larger models can easily take up a decent chunk of it.

• A modern CPU/GPU is recommended.

• An internet connection. Even though the model runs locally, the extension re-downloads it either when idle or after you close the tab or open a new one that uses the extension (to conserve memory). For the time being, this is better than packaging the large models with the extension, and for most models the download will be nearly instant for most users. There is also an option in the settings to keep the default model cached instead of re-downloading it every time. Besides the local models, you can use the cloud-based model, which is less hardware-intensive. Sorry, offline users: I will try to see if you can download the model manually and link that folder for offline use without any external stuff. Basically, it'd be a speechfire replacement. For now, the only workaround for offline use is to enable the cached option and avoid closing the browser or changing the model, since that would clear the storage.
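The download-on-demand behavior with an optional cached default model can be sketched roughly as follows. This is a simplified illustration, not the extension's code: the `ModelCache` class, its idle limit, the injected `download` callback, and the model names used in the usage note are all hypothetical.

```typescript
interface CacheEntry {
  data: Promise<Uint8Array>; // in-flight or finished download
  lastUsedMs: number;        // timestamp of the most recent request
}

// Hypothetical cache: models are dropped after sitting idle to free memory,
// except for one optional "pinned" default model that is always kept.
class ModelCache {
  private entries: Map<string, CacheEntry> = new Map();
  private idleLimitMs: number;
  private pinnedModel: string | null;

  constructor(idleLimitMs: number, pinnedModel: string | null = null) {
    this.idleLimitMs = idleLimitMs;
    this.pinnedModel = pinnedModel;
  }

  // Returns the cached model, or starts a download if it is not cached.
  get(
    name: string,
    nowMs: number,
    download: (name: string) => Promise<Uint8Array>
  ): Promise<Uint8Array> {
    const hit = this.entries.get(name);
    if (hit) {
      hit.lastUsedMs = nowMs;
      return hit.data;
    }
    const data = download(name);
    this.entries.set(name, { data, lastUsedMs: nowMs });
    return data;
  }

  // Drops every non-pinned model that has been idle for at least idleLimitMs.
  evictIdle(nowMs: number): string[] {
    const evicted: string[] = [];
    this.entries.forEach((entry, name) => {
      if (name !== this.pinnedModel && nowMs - entry.lastUsedMs >= this.idleLimitMs) {
        this.entries.delete(name);
        evicted.push(name);
      }
    });
    return evicted;
  }
}
```

For example, `new ModelCache(60_000, "whisper-tiny")` would keep a model named `whisper-tiny` around indefinitely while letting any other model be dropped after a minute of inactivity and downloaded again on the next request. The trade-off matches the one described above: less memory held while idle, at the cost of needing a connection for the next download.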


Required permissions:

Data collection: