{ "nbformat": 4, "nbformat_minor": 0, "metadata": { "colab": { "name": "Vosk Colab Demo", "provenance": [], "collapsed_sections": [] }, "kernelspec": { "name": "python3", "display_name": "Python 3" }, "language_info": { "name": "python" } }, "cells": [ { "cell_type": "markdown", "source": [ "\n", "# Vosk Colab Demo" ], "metadata": { "id": "gJMosXCKCVeJ" } }, { "cell_type": "markdown", "source": [ "Vosk is an open source offline speech recognition toolkit. Vosk \n", "contains more than 20 languages and dialects, such as English, German, Russian, Chinese, Czech, etc. The sizes of language models vary from tens of megabytes to several gigabytes. Big models are more accurate. For more information see https://alphacephei.com/vosk/.\n", "\n" ], "metadata": { "id": "kaFIKZneuuUJ" } }, { "cell_type": "markdown", "source": [ "This notebook demonstrates Vosk recognition capabilities." ], "metadata": { "id": "-sc8lpD5Brfi" } }, { "cell_type": "markdown", "source": [ "# Install module and prepare the file" ], "metadata": { "id": "etxlH1aMCwS1" } }, { "cell_type": "markdown", "source": [ "First, you have to install vosk module using the following code:" ], "metadata": { "id": "U83JNjH4y0_9" } }, { "cell_type": "code", "execution_count": 1, "metadata": { "id": "C1iiOwzooMid", "colab": { "base_uri": "https://localhost:8080/" }, "outputId": "cd273e7d-386f-4e76-fe7e-15177faf8eb1" }, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/\n", "Collecting vosk\n", " Downloading vosk-0.3.44-py3-none-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (7.2 MB)\n", "\u001b[K |████████████████████████████████| 7.2 MB 29.3 MB/s \n", "\u001b[?25hRequirement already satisfied: tqdm in /usr/local/lib/python3.7/dist-packages (from vosk) (4.64.0)\n", "Collecting srt\n", " Downloading srt-3.5.2.tar.gz (24 kB)\n", "Requirement already satisfied: requests in /usr/local/lib/python3.7/dist-packages (from vosk) (2.23.0)\n", "Requirement already satisfied: cffi>=1.0 in /usr/local/lib/python3.7/dist-packages (from vosk) (1.15.1)\n", "Requirement already satisfied: pycparser in /usr/local/lib/python3.7/dist-packages (from cffi>=1.0->vosk) (2.21)\n", "Requirement already satisfied: chardet<4,>=3.0.2 in /usr/local/lib/python3.7/dist-packages (from requests->vosk) (3.0.4)\n", "Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.7/dist-packages (from requests->vosk) (2022.6.15)\n", "Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /usr/local/lib/python3.7/dist-packages (from requests->vosk) (1.24.3)\n", "Requirement already satisfied: idna<3,>=2.5 in /usr/local/lib/python3.7/dist-packages (from requests->vosk) (2.10)\n", "Building wheels for collected packages: srt\n", " Building wheel for srt (setup.py) ... 
{ "cell_type": "markdown", "source": [ "## Importing the necessary modules" ], "metadata": { "id": "x-yssTkO83_E" } },
{ "cell_type": "markdown", "source": [ "Next, we import the modules required for all the examples below:" ], "metadata": { "id": "vXvQm4qiy8bG" } },
{ "cell_type": "code", "source": [ "from vosk import Model, KaldiRecognizer\n", "import wave\n", "import json" ], "metadata": { "id": "s7Rkp5pL85dJ" }, "execution_count": 23, "outputs": [] },
{ "cell_type": "markdown", "source": [ "## Download example audio file" ], "metadata": { "id": "puMMIwDRiiji" } },
{ "cell_type": "markdown", "source": [ "You can use your own audio file and listen to it by replacing the example URL below with your own." ], "metadata": { "id": "1Gp0btdJRRK6" } },
{ "cell_type": "code", "source": [ "!wget -q -O /content/test.wav https://github.com/alphacep/vosk-api/raw/master/python/example/test.wav\n" ], "metadata": { "id": "uUyJ-YlmFTUD" }, "execution_count": 21, "outputs": [] },
{ "cell_type": "code", "source": [ "import IPython\n", "IPython.display.Audio(\"/content/test.wav\")" ], "metadata": { "id": "9ve9x0te1yLs", "colab": { "base_uri": "https://localhost:8080/", "height": 75 }, "outputId": "acf02c11-5c90-4a03-955d-0e1aa24e35d0" }, "execution_count": 22, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "" ], "text/html": [ "\n", " \n", " " ] }, "metadata": {}, "execution_count": 22 } ] },
{ "cell_type": "markdown", "source": [ "# Recognition examples\n", "\n" ], "metadata": { "id": "zRIV7ngEt2Hn" } },
{ "cell_type": "markdown", "source": [ "By default, Vosk uses vosk-model-small-en-us-0.15, selected by the `en-us` value of the `lang` option. The other options, `model_path` and `model_name`, let you load a model from a specific path or by its exact name. " ], "metadata": { "id": "foQP3qpWmI4J" } },
{ "cell_type": "markdown", "source": [ "The first time a model is requested, it is automatically downloaded and saved; later requests reuse the already downloaded copy.\n", "\n", "Initializing the model by language:\n" ], "metadata": { "id": "tgqQGvp3Qv40" } },
{ "cell_type": "code", "source": [ "model = Model(lang=\"en-us\")" ], "metadata": { "id": "zgE70Gx9Qwnf", "colab": { "base_uri": "https://localhost:8080/" }, "outputId": "5677cac8-ff22-434f-80ed-afa5f0fa2faf" }, "execution_count": 7, "outputs": [ { "output_type": "stream", "name": "stderr", "text": [ "vosk-model-small-en-us-0.15.zip: 100%|██████████| 39.3M/39.3M [00:03<00:00, 13.0MB/s]\n" ] } ] },
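{ "cell_type": "markdown", "source": [ "As a sketch of the other options mentioned above, the same small English model can also be requested by its exact name with `model_name` (it is downloaded and cached the same way), or loaded from a directory you unpacked yourself with `model_path`:" ], "metadata": {} },
{ "cell_type": "code", "source": [ "# Equivalent to Model(lang=\"en-us\") for this demo: request the model by its exact name\n", "model = Model(model_name=\"vosk-model-small-en-us-0.15\")\n", "\n", "# Or point to a local model directory (the path below is only an illustration):\n", "# model = Model(model_path=\"/content/vosk-model-small-en-us-0.15\")" ], "metadata": {}, "execution_count": null, "outputs": [] },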
{ "cell_type": "markdown", "source": [ "Open the downloaded file in 'read bytes' mode as a wave object:" ], "metadata": { "id": "nQ3yyS7kJEX6" } },
{ "cell_type": "code", "source": [ "wf = wave.open('/content/test.wav', 'rb')" ], "metadata": { "id": "U7YPLDBuJGP6" }, "execution_count": null, "outputs": [] },
{ "cell_type": "markdown", "source": [ "The `KaldiRecognizer` class contains the configuration methods needed here, such as `SetWords`, `SetPartialWords`, `AcceptWaveform`, and others.\n", "\n", "The model object is the first parameter for `KaldiRecognizer`. The second parameter is the sample rate: it can be passed directly as a number such as 8000 or 16000 Hz, or obtained from the opened file with the `getframerate` method, as in the following code fragment.\n", "\n", "Creating a KaldiRecognizer object with model and sample rate arguments:" ], "metadata": { "id": "X5bWgr6PLPKQ" } },
{ "cell_type": "code", "source": [ "rec = KaldiRecognizer(model, wf.getframerate())" ], "metadata": { "id": "5McnLKF1LPbE" }, "execution_count": 8, "outputs": [] },
{ "cell_type": "markdown", "source": [ "The previous commands are the same for most of the examples; the following ones differ.\n", "\n", "Activating word timestamps (the `result` and `partial_result` attributes of the recognized output) with the `SetWords` and `SetPartialWords` methods:" ], "metadata": { "id": "7U4eqX0-0yDv" } },
{ "cell_type": "code", "source": [ "rec.SetWords(True)\n", "rec.SetPartialWords(True)" ], "metadata": { "id": "X4ST1Mi20yUN" }, "execution_count": 9, "outputs": [] },
{ "cell_type": "markdown", "source": [ "The `AcceptWaveform` method reports whether a pause follows the speech fragment just fed to the recognizer, which allows that fragment to be retrieved and printed.\n", "\n", "The `KaldiRecognizer` class also contains methods for presenting recognition results: `Result`, `PartialResult`, and `FinalResult`. \n", "\n", "\n", "> The `PartialResult` method of the `KaldiRecognizer` class returns a string obtained from a dictionary whose key \"partial\" holds the fragment of the audio file recognized so far, ending at a pause between words.\n", "\n", "> The `Result` method of the `KaldiRecognizer` class returns a string obtained from a dictionary whose key \"text\" holds a recognized fragment of the audio file that ends with a pause between its parts, such as phrases and sentences.\n", "\n", "> The `FinalResult` method of the `KaldiRecognizer` class returns a string obtained from a dictionary whose key \"text\" holds the text recognized from the remaining audio once the stream has ended.\n", "\n", "Run the recognition process:" ], "metadata": { "id": "oHXO2gAV_QEP" } },
{ "cell_type": "code", "source": [ "while True:\n", "    data = wf.readframes(4000)\n", "    if len(data) == 0:\n", "        break\n", "    if rec.AcceptWaveform(data):\n", "        print(rec.Result())\n", "    else:\n", "        print(rec.PartialResult())\n", "\n", "print(rec.FinalResult())" ], "metadata": { "id": "hDUUjPmO_RRN", "colab": { "base_uri": "https://localhost:8080/" }, "outputId": "c1233465-6f9a-42ba-93bb-8ff41c60e04b" }, "execution_count": 10, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "{\n", " \"partial\" : \"\"\n", "}\n", "{\n", " \"partial\" : \"\"\n", "}\n", "{\n", " \"partial\" : \"\"\n", "}\n", "{\n", " \"partial\" : \"\"\n", "}\n", "{\n", " \"partial\" : \"\"\n", "}\n", "{\n", " \"partial\" : \"\"\n", "}\n", "{\n", " \"partial\" : \"\"\n", "}\n", "{\n", " \"partial\" : \"\"\n", "}\n", "{\n", " \"partial\" : \"\"\n", "}\n", "{\n", " \"partial\" : \"\"\n", "}\n", "{\n", " \"partial\" : \"\"\n", "}\n", "{\n", " \"partial\" : \"\"\n", "}\n", "{\n", " \"partial\" : \"one zero zero\",\n", " \"partial_result\" : [{\n", " \"conf\" : 1.000000,\n", " \"end\" : 1.110000,\n", " \"start\" : 0.840000,\n", " \"word\" : \"one\"\n", " }, {\n", " \"conf\" : 1.000000,\n", " \"end\" : 1.530000,\n", " \"start\" : 1.110000,\n", " \"word\" : \"zero\"\n", " }, {\n", " \"conf\" : 1.000000,\n", " \"end\" : 1.890000,\n", " \"start\" : 1.530000,\n",
" \"word\" : \"zero\"\n", " }]\n", "}\n", "{\n", " \"partial\" : \"one zero zero\",\n", " \"partial_result\" : [{\n", " \"conf\" : 1.000000,\n", " \"end\" : 1.110000,\n", " \"start\" : 0.840000,\n", " \"word\" : \"one\"\n", " }, {\n", " \"conf\" : 1.000000,\n", " \"end\" : 1.530000,\n", " \"start\" : 1.110000,\n", " \"word\" : \"zero\"\n", " }, {\n", " \"conf\" : 1.000000,\n", " \"end\" : 1.890000,\n", " \"start\" : 1.530000,\n", " \"word\" : \"zero\"\n", " }]\n", "}\n", "{\n", " \"result\" : [{\n", " \"conf\" : 1.000000,\n", " \"end\" : 1.110000,\n", " \"start\" : 0.840000,\n", " \"word\" : \"one\"\n", " }, {\n", " \"conf\" : 1.000000,\n", " \"end\" : 1.530000,\n", " \"start\" : 1.110000,\n", " \"word\" : \"zero\"\n", " }, {\n", " \"conf\" : 1.000000,\n", " \"end\" : 1.920000,\n", " \"start\" : 1.530000,\n", " \"word\" : \"zero\"\n", " }, {\n", " \"conf\" : 1.000000,\n", " \"end\" : 2.310000,\n", " \"start\" : 1.920000,\n", " \"word\" : \"zero\"\n", " }, {\n", " \"conf\" : 1.000000,\n", " \"end\" : 2.610000,\n", " \"start\" : 2.310000,\n", " \"word\" : \"one\"\n", " }],\n", " \"text\" : \"one zero zero zero one\"\n", "}\n", "{\n", " \"partial\" : \"\"\n", "}\n", "{\n", " \"partial\" : \"\"\n", "}\n", "{\n", " \"partial\" : \"\"\n", "}\n", "{\n", " \"partial\" : \"\"\n", "}\n", "{\n", " \"partial\" : \"\"\n", "}\n", "{\n", " \"partial\" : \"\"\n", "}\n", "{\n", " \"partial\" : \"\"\n", "}\n", "{\n", " \"partial\" : \"\"\n", "}\n", "{\n", " \"result\" : [{\n", " \"conf\" : 0.560920,\n", " \"end\" : 4.110000,\n", " \"start\" : 3.930000,\n", " \"word\" : \"nah\"\n", " }, {\n", " \"conf\" : 0.616773,\n", " \"end\" : 4.290000,\n", " \"start\" : 4.110000,\n", " \"word\" : \"no\"\n", " }, {\n", " \"conf\" : 0.693737,\n", " \"end\" : 4.560000,\n", " \"start\" : 4.290000,\n", " \"word\" : \"to\"\n", " }, {\n", " \"conf\" : 0.498215,\n", " \"end\" : 4.620000,\n", " \"start\" : 4.560000,\n", " \"word\" : \"i\"\n", " }, {\n", " \"conf\" : 0.785684,\n", " \"end\" : 4.980000,\n", " \"start\" : 4.620000,\n", " \"word\" : \"know\"\n", " }],\n", " \"text\" : \"nah no to i know\"\n", "}\n", "{\n", " \"partial\" : \"\"\n", "}\n", "{\n", " \"partial\" : \"\"\n", "}\n", "{\n", " \"partial\" : \"\"\n", "}\n", "{\n", " \"partial\" : \"\"\n", "}\n", "{\n", " \"partial\" : \"\"\n", "}\n", "{\n", " \"partial\" : \"\"\n", "}\n", "{\n", " \"partial\" : \"\"\n", "}\n", "{\n", " \"partial\" : \"zero\",\n", " \"partial_result\" : [{\n", " \"conf\" : 1.000000,\n", " \"end\" : 6.690000,\n", " \"start\" : 6.240000,\n", " \"word\" : \"zero\"\n", " }]\n", "}\n", "{\n", " \"partial\" : \"zero\",\n", " \"partial_result\" : [{\n", " \"conf\" : 1.000000,\n", " \"end\" : 6.690000,\n", " \"start\" : 6.240000,\n", " \"word\" : \"zero\"\n", " }]\n", "}\n", "{\n", " \"partial\" : \"zero\",\n", " \"partial_result\" : [{\n", " \"conf\" : 1.000000,\n", " \"end\" : 6.690000,\n", " \"start\" : 6.240000,\n", " \"word\" : \"zero\"\n", " }]\n", "}\n", "{\n", " \"result\" : [{\n", " \"conf\" : 1.000000,\n", " \"end\" : 6.690000,\n", " \"start\" : 6.240000,\n", " \"word\" : \"zero\"\n", " }, {\n", " \"conf\" : 1.000000,\n", " \"end\" : 6.900000,\n", " \"start\" : 6.690000,\n", " \"word\" : \"one\"\n", " }, {\n", " \"conf\" : 1.000000,\n", " \"end\" : 7.110000,\n", " \"start\" : 6.930000,\n", " \"word\" : \"eight\"\n", " }, {\n", " \"conf\" : 1.000000,\n", " \"end\" : 7.500000,\n", " \"start\" : 7.110000,\n", " \"word\" : \"zero\"\n", " }, {\n", " \"conf\" : 1.000000,\n", " \"end\" : 7.980000,\n", " \"start\" : 7.500000,\n", " \"word\" : 
\"three\"\n", " }],\n", " \"text\" : \"zero one eight zero three\"\n", "}\n" ] } ] }, { "cell_type": "markdown", "source": [ "## Recognition with alternatives\n", "\n", "Run the initial code that was described above:" ], "metadata": { "id": "osi0Yq2zez5W" } }, { "cell_type": "code", "source": [ "wf = wave.open('/content/test.wav', 'rb')\n", "model = Model(lang=\"en-us\")\n", "rec = KaldiRecognizer(model, wf.getframerate())\n", "rec.SetWords(True)" ], "metadata": { "id": "aAd_SFH6fdQK" }, "execution_count": 32, "outputs": [] }, { "cell_type": "markdown", "source": [ "`SetMaxAlternatives(n)` method of the `KaldiRecognizer` class shows no more than 'n' different alternatives of the recognized result, which may appear, for example, due to the low quality of the audio file." ], "metadata": { "id": "uAH4IStvKZ3Q" } }, { "cell_type": "code", "source": [ "rec.SetMaxAlternatives(10)" ], "metadata": { "id": "WZPF_vEYKbJB" }, "execution_count": 12, "outputs": [] }, { "cell_type": "markdown", "source": [ "The recognition result is converted from a string to a dictionary, which is more convenient for its further processing using the json.loads method.\n", "\n", "Run recognition process:" ], "metadata": { "id": "9XFWqNt_KjeM" } }, { "cell_type": "code", "source": [ "while True:\n", " data = wf.readframes(4000)\n", " if len(data) == 0:\n", " break\n", " if rec.AcceptWaveform(data):\n", " print(json.loads(rec.Result()))\n", " else:\n", " print(json.loads(rec.PartialResult()))\n", "\n", "print(json.loads(rec.FinalResult()))" ], "metadata": { "id": "azIbIZueKjte", "colab": { "base_uri": "https://localhost:8080/" }, "outputId": "e479c52e-86d2-4110-f9fb-4034af840a39" }, "execution_count": 13, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "{'partial': ''}\n", "{'partial': ''}\n", "{'partial': ''}\n", "{'partial': ''}\n", "{'partial': ''}\n", "{'partial': 'one'}\n", "{'partial': 'one zero'}\n", "{'partial': 'one zero zero'}\n", "{'partial': 'one zero zero'}\n", "{'partial': 'one zero zero zero'}\n", "{'partial': 'one zero zero zero one'}\n", "{'partial': 'one zero zero zero one'}\n", "{'partial': 'one zero zero zero one'}\n", "{'partial': 'one zero zero zero one'}\n", "{'alternatives': [{'confidence': 265.527069, 'result': [{'end': 1.11, 'start': 0.84, 'word': 'one'}, {'end': 1.53, 'start': 1.11, 'word': 'zero'}, {'end': 1.92, 'start': 1.53, 'word': 'zero'}, {'end': 2.31, 'start': 1.92, 'word': 'zero'}, {'end': 2.61, 'start': 2.31, 'word': 'one'}], 'text': 'one zero zero zero one'}]}\n", "{'partial': ''}\n", "{'partial': ''}\n", "{'partial': 'nah no'}\n", "{'partial': 'nah no'}\n", "{'partial': 'nah no to'}\n", "{'partial': 'nah no to i know'}\n", "{'partial': 'nah no to i know'}\n", "{'partial': 'nah no to i know'}\n", "{'alternatives': [{'confidence': 174.606827, 'result': [{'end': 4.11, 'start': 3.93, 'word': 'nah'}, {'end': 4.29, 'start': 4.11, 'word': 'no'}, {'end': 4.56, 'start': 4.29, 'word': 'to'}, {'end': 4.62, 'start': 4.56, 'word': 'i'}, {'end': 4.98, 'start': 4.62, 'word': 'know'}], 'text': 'nah no to i know'}, {'confidence': 173.904785, 'result': [{'end': 4.17, 'start': 3.93, 'word': 'nine'}, {'end': 4.29, 'start': 4.17, 'word': 'oh'}, {'end': 4.56, 'start': 4.29, 'word': 'two'}, {'end': 4.62, 'start': 4.56, 'word': 'i'}, {'end': 4.98, 'start': 4.62, 'word': 'know'}], 'text': 'nine oh two i know'}, {'confidence': 173.745651, 'result': [{'end': 4.11, 'start': 3.93, 'word': 'nah'}, {'end': 4.29, 'start': 4.11, 'word': 'no'}, {'end': 4.56, 'start': 4.29, 'word': 'to'}, {'end': 
4.62, 'start': 4.56, 'word': 'ah'}, {'end': 4.98, 'start': 4.62, 'word': 'no'}], 'text': 'nah no to ah no'}, {'confidence': 173.601868, 'result': [{'end': 4.11, 'start': 3.93, 'word': 'nah'}, {'end': 4.29, 'start': 4.11, 'word': 'no'}, {'end': 4.56, 'start': 4.29, 'word': 'to'}, {'end': 4.62, 'start': 4.56, 'word': 'a'}, {'end': 4.98, 'start': 4.62, 'word': 'know'}], 'text': 'nah no to a know'}, {'confidence': 173.316528, 'result': [{'end': 4.11, 'start': 3.93, 'word': 'nah'}, {'end': 4.29, 'start': 4.11, 'word': 'no'}, {'end': 4.56, 'start': 4.29, 'word': 'two'}, {'end': 4.62, 'start': 4.56, 'word': 'i'}, {'end': 4.98, 'start': 4.62, 'word': 'know'}], 'text': 'nah no two i know'}, {'confidence': 173.297699, 'result': [{'end': 4.17, 'start': 3.93, 'word': 'nine'}, {'end': 4.29, 'start': 4.17, 'word': 'o'}, {'end': 4.56, 'start': 4.29, 'word': 'two'}, {'end': 4.62, 'start': 4.56, 'word': 'i'}, {'end': 4.98, 'start': 4.62, 'word': 'know'}], 'text': 'nine o two i know'}, {'confidence': 173.114288, 'result': [{'end': 4.11, 'start': 3.93, 'word': 'nah'}, {'end': 4.29, 'start': 4.11, 'word': 'no'}, {'end': 4.56, 'start': 4.29, 'word': 'to'}, {'end': 4.62, 'start': 4.56, 'word': 'or'}, {'end': 4.98, 'start': 4.62, 'word': 'know'}], 'text': 'nah no to or know'}, {'confidence': 173.079651, 'result': [{'end': 4.11, 'start': 3.93, 'word': 'nah'}, {'end': 4.29, 'start': 4.11, 'word': 'no'}, {'end': 4.56, 'start': 4.29, 'word': 'to'}, {'end': 4.62, 'start': 4.56, 'word': 'ah'}, {'end': 4.98, 'start': 4.62, 'word': 'know'}], 'text': 'nah no to ah know'}, {'confidence': 173.018143, 'result': [{'end': 4.11, 'start': 3.93, 'word': 'nah'}, {'end': 4.29, 'start': 4.11, 'word': 'no'}, {'end': 4.56, 'start': 4.29, 'word': 'to'}, {'end': 4.62, 'start': 4.56, 'word': 'or'}, {'end': 4.98, 'start': 4.62, 'word': 'no'}], 'text': 'nah no to or no'}, {'confidence': 173.00589, 'result': [{'end': 4.29, 'start': 3.93, 'word': 'nano'}, {'end': 4.56, 'start': 4.29, 'word': 'to'}, {'end': 4.62, 'start': 4.56, 'word': 'i'}, {'end': 4.98, 'start': 4.62, 'word': 'know'}], 'text': 'nano to i know'}]}\n", "{'partial': ''}\n", "{'partial': ''}\n", "{'partial': ''}\n", "{'partial': 'zero'}\n", "{'partial': 'zero one'}\n", "{'partial': 'zero one eight six'}\n", "{'partial': 'zero one eight zero'}\n", "{'partial': 'zero one eight zero'}\n", "{'partial': 'zero one eight zero three'}\n", "{'partial': 'zero one eight zero three'}\n", "{'alternatives': [{'confidence': 209.819153, 'result': [{'end': 6.69, 'start': 6.24, 'word': 'zero'}, {'end': 6.9, 'start': 6.69, 'word': 'one'}, {'end': 7.11, 'start': 6.93, 'word': 'eight'}, {'end': 7.5, 'start': 7.11, 'word': 'zero'}, {'end': 7.98, 'start': 7.5, 'word': 'three'}], 'text': 'zero one eight zero three'}]}\n" ] } ] },
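{ "cell_type": "markdown", "source": [ "Because the results are now plain Python dictionaries, they are easy to post-process. As a rough sketch, the cell below re-runs the same recognizer setup and prints only the confidence and text of each alternative for every completed utterance:" ], "metadata": {} },
{ "cell_type": "code", "source": [ "# Sketch: keep only the alternative texts and confidences for each utterance\n", "wf = wave.open('/content/test.wav', 'rb')\n", "rec = KaldiRecognizer(model, wf.getframerate())\n", "rec.SetMaxAlternatives(10)\n", "\n", "def show_alternatives(res):\n", "    # res is the dictionary returned by json.loads(rec.Result()) / rec.FinalResult()\n", "    for alt in res.get('alternatives', []):\n", "        print(round(alt['confidence'], 2), alt['text'])\n", "\n", "while True:\n", "    data = wf.readframes(4000)\n", "    if len(data) == 0:\n", "        break\n", "    if rec.AcceptWaveform(data):\n", "        show_alternatives(json.loads(rec.Result()))\n", "\n", "show_alternatives(json.loads(rec.FinalResult()))" ], "metadata": {}, "execution_count": null, "outputs": [] },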
], "metadata": { "id": "UziNTcIeHN7Z" } }, { "cell_type": "code", "source": [ "wf = wave.open('/content/test.wav', \"rb\")\n", "rec = KaldiRecognizer(model, wf.getframerate(), '[\"one zero zero zero one\", \"nine oh two one oh\", \"zero one eight zero three\", \"[unk]\"]')" ], "metadata": { "id": "n1Ptwuy0-0gX" }, "execution_count": 30, "outputs": [] }, { "cell_type": "markdown", "source": [ "Using this recognizer we can get more acccurate results since we already specified the expected input " ], "metadata": { "id": "QjeHrGIXYlfz" } }, { "cell_type": "code", "source": [ "while True:\n", " data = wf.readframes(4000)\n", " if len(data) == 0:\n", " break\n", " if rec.AcceptWaveform(data):\n", " print(rec.Result())\n", " else:\n", " jres = json.loads(rec.PartialResult())\n", " print(jres)\n" ], "metadata": { "id": "w8ZmzX3fYltr", "colab": { "base_uri": "https://localhost:8080/" }, "outputId": "a8f49e18-16f9-43c9-d7a4-0d182a9779c6" }, "execution_count": 31, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "{'partial': ''}\n", "{'partial': ''}\n", "{'partial': ''}\n", "{'partial': ''}\n", "{'partial': 'one'}\n", "{'partial': 'one zero'}\n", "{'partial': 'one zero'}\n", "{'partial': 'one zero zero'}\n", "{'partial': 'one zero zero'}\n", "{'partial': 'one zero zero zero'}\n", "{'partial': 'one zero zero zero one'}\n", "{'partial': 'one zero zero zero one'}\n", "{'partial': 'one zero zero zero one'}\n", "{\n", " \"text\" : \"one zero zero zero one\"\n", "}\n", "{'partial': ''}\n", "{'partial': 'one'}\n", "{'partial': 'nine'}\n", "{'partial': 'nine oh two'}\n", "{'partial': 'nine oh two one'}\n", "{'partial': 'nine oh two one oh'}\n", "{'partial': 'nine oh two one oh'}\n", "{'partial': 'nine oh two one oh'}\n", "{\n", " \"text\" : \"nine oh two one oh\"\n", "}\n", "{'partial': 'one'}\n", "{'partial': 'one'}\n", "{'partial': ''}\n", "{'partial': 'zero'}\n", "{'partial': 'zero one'}\n", "{'partial': 'zero one eight'}\n", "{'partial': 'zero one eight zero'}\n", "{'partial': 'zero one eight zero'}\n", "{'partial': 'zero one eight zero three'}\n", "{'partial': 'zero one eight zero three'}\n", "{'partial': 'zero one eight zero three'}\n" ] } ] } ] }