Google speech to text API in C#


Problem description

My program gets a correct response from Google when the FLAC file is recorded manually with Windows Sound Recorder and converted with a software converter.
But when I use the file recorded by my program, I get "{"result":[]}" from Google. What should I do? Here is my code.
The sender:

    private static void CopyStream(FileStream fileStream, Stream requestStream)
    {
        var buffer = new byte[32768];
        int read;
        while ((read = fileStream.Read(buffer, 0, buffer.Length)) > 0)
        {
            requestStream.Write(buffer, 0, read);
        }
    }

    private static void ConfigureRequest(HttpWebRequest request)
    {
        request.KeepAlive = true;
        request.SendChunked = true;
        request.ContentType = "audio/x-flac; rate=44100";
        request.UserAgent =
            "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/535.2 (KHTML, like Gecko) Chrome/15.0.874.121 Safari/535.2";
        request.Headers.Set(HttpRequestHeader.AcceptEncoding, "gzip,deflate,sdch");
        request.Headers.Set(HttpRequestHeader.AcceptLanguage, "en-GB,en-US;q=0.8,en;q=0.6");
        request.Headers.Set(HttpRequestHeader.AcceptCharset, "ISO-8859-1,utf-8;q=0.7,*;q=0.3");
        request.Method = "POST";
    }
    using (var fileStream = new FileStream(@"C:\Users\Ahmad Mustofa\Documents\Visual Studio 2010\Projects\FP\FP\bin\Debug\voice.flac", FileMode.Open))
    {
        const string requestUrl = "https://www.google.com/speech-api/v2/recognize?output=json&lang=ar-sa&key=AIzaSyBJ6VJ326Rpb23msih2wGhXENEwU1TF1PA&client=chromium&maxresults=1&pfilter=2";
        var request = (HttpWebRequest)WebRequest.Create(requestUrl);
        ConfigureRequest(request);
        var requestStream = request.GetRequestStream();
        CopyStream(fileStream, requestStream);

        using (var response = request.GetResponse())
        {
            using (var responseStream = response.GetResponseStream())
            {
                using (var zippedStream = new GZipStream(responseStream, CompressionMode.Decompress))
                {
                     using (var sr = new StreamReader(zippedStream))
                     {
                          var res = sr.ReadToEnd();
                          state.Text = res;
                     }
                }
            }
        }
    }

The WAV recorder:

        private void sourceStream_DataAvailable(object sender, NAudio.Wave.WaveInEventArgs e)
        {
             if (waveWriter == null) return;

             waveWriter.WriteData(e.Buffer, 0, e.BytesRecorded);
             waveWriter.Flush();
        }
        fileName = @"C:\Users\Ahmad Mustofa\Documents\Visual Studio 2010\Projects\FP\FP\bin\debug\voice.wav";
        int deviceNumber = hardware.SelectedItems[0].Index;
        try
        {
            sourceStream = new NAudio.Wave.WaveIn();
            sourceStream.DeviceNumber = deviceNumber;
            sourceStream.WaveFormat = new NAudio.Wave.WaveFormat(44100, NAudio.Wave.WaveIn.GetCapabilities(deviceNumber).Channels);

            sourceStream.DataAvailable += new EventHandler<NAudio.Wave.WaveInEventArgs>(sourceStream_DataAvailable);
            waveWriter = new NAudio.Wave.WaveFileWriter(fileName, sourceStream.WaveFormat);

            sourceStream.StartRecording();
        }
        catch (Exception ex)
        {
            state.Text = "disini" + ex.Message;
        }

The FLAC converter:

        string inputFile = Path.Combine("wav", input);
        string outputFile = Path.Combine("flac", Path.ChangeExtension(input, ".flac"));

        if (!File.Exists(inputFile))
            throw new ApplicationException("Input file " + inputFile + " cannot be found!");

        WavReader wav = new WavReader(inputFile);

        using (var flacStream = File.Create(outputFile))
        {
            FlacWriter flac = new FlacWriter(flacStream, wav.BitDepth, wav.Channels, wav.SampleRate);
            // Buffer for 1 second's worth of audio data
            byte[] buffer = new byte[wav.Bitrate / 8];
            int bytesRead;
            do
            {
                bytesRead = wav.InputStream.Read(buffer, 0, buffer.Length);
                flac.Convert(buffer, 0, bytesRead);
            } while (bytesRead > 0);
            flac.Dispose();
            flac = null;
        }

Solution

This is described in the Google Cloud Speech-to-Text documentation:

https://cloud.google.com/speech-to-text/docs/async-recognize#speech-async-recognize-gcs-protocol

If the operation has not completed, you can poll the endpoint by repeatedly making the GET request until the done property of the response is true.

    {
      "name": "operationname here",
      "metadata": {
        "@type": "type.googleapis.com/google.cloud.speech.v1.LongRunningRecognizeMetadata",
        "progressPercent": 0,
        "startTime": "2018-12-18T10:56:09.425584Z",
        "lastUpdateTime": "2018-12-18T11:10:27.147310Z"
      },
      "done": false
    }

Poll the endpoint by repeatedly making the GET request until the done property of the response is true, or check "progressPercent" until its value reaches 100. Once it reaches 100 percent, the done property becomes true.

I did the same in my code using the operation name. Here is the code for reference:

    public async Task<string> TranscribeLongMediaFile(string operationName)
    {
        string bearerToken = GetOAuthToken();
        var baseUrl = new Uri(googleSpeechBaseUrl + operationName);
        string resultContent = string.Empty;
        using (var client = new HttpClient())
        {
            client.DefaultRequestHeaders.Add(HttpRequestHeader.Authorization.ToString(), "Bearer " + bearerToken);
            client.DefaultRequestHeaders.Add(HttpRequestHeader.ContentType.ToString(), "application/json; charset=utf-8");

            client.Timeout = TimeSpan.FromMilliseconds(Timeout.Infinite);

            int currentPercentage = 0;
            bool responseStatus = false;
            while (!responseStatus)
            {
                // Send the GET request for the operation
                using (var result = await client.GetAsync(baseUrl))
                {
                    resultContent = await result.Content.ReadAsStringAsync();

                    ResponseObject responseObject = JsonConvert.DeserializeObject<ResponseObject>(resultContent);
                    currentPercentage = responseObject.metadata.progressPercent;
                    responseStatus = (responseObject.done && currentPercentage == 100);

                    // Wait before polling again; the delay shrinks as the reported progress grows
                    await Task.Delay(CalculateDealy(currentPercentage));
                }
            }
        }
        return resultContent;
    }
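The ResponseObject type used above is not shown in the answer. The following is a minimal sketch, assuming it simply mirrors the fields of the operation JSON that the loop reads (done and metadata.progressPercent). The Metadata class and the OperationJson helper are hypothetical names. System.Text.Json is used here only to keep the sketch free of external packages; the answer itself uses Json.NET's JsonConvert, which binds to the same property names.

```csharp
using System;
using System.Text.Json;

// Hypothetical shapes mirroring the fields the polling loop reads.
public class Metadata
{
    public int progressPercent { get; set; }
    public string startTime { get; set; }
    public string lastUpdateTime { get; set; }
}

public class ResponseObject
{
    public string name { get; set; }
    public Metadata metadata { get; set; }
    public bool done { get; set; }
}

public static class OperationJson
{
    // Parses one polling response; fields missing from the JSON keep their defaults
    // (for example, done stays false when the property is absent).
    public static ResponseObject Parse(string json)
    {
        return JsonSerializer.Deserialize<ResponseObject>(json);
    }

    // True once the operation reports completion, using the same rule as the loop above.
    public static bool IsFinished(ResponseObject r)
    {
        return r != null && r.done && r.metadata != null && r.metadata.progressPercent == 100;
    }
}
```

Unknown properties such as "@type" and "response" are ignored by the deserializer, so the same classes can be used for both the in-progress and the final response.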

To delay the GET request:

    /// <summary>
    /// Compute the delay before the next request, in milliseconds
    /// </summary>
    /// <param name="currentPercentage">Progress reported by the operation (0 to 100)</param>
    /// <returns>Delay in milliseconds</returns>
    private int CalculateDealy(int currentPercentage)
    {
        int x = currentPercentage / 10;
        return (10 - x) * 1500;
    }
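To see what this schedule does in practice, here is a small standalone sketch that reproduces the same formula (integer division by 10, then (10 - x) * 1500). The names PollDelay and DelayMs are hypothetical stand-ins for CalculateDealy, so the arithmetic can be tried outside the original class.

```csharp
using System;

public static class PollDelay
{
    // Same arithmetic as CalculateDealy: the wait gets shorter as progress increases.
    public static int DelayMs(int currentPercentage)
    {
        int x = currentPercentage / 10;   // integer division, 0..10 for inputs 0..100
        return (10 - x) * 1500;
    }
}
```

With this formula the wait is 15000 ms at 0%, 9000 ms at 45%, 1500 ms at 90%, and 0 ms at 100%, so the loop polls more often as the operation nears completion and does not wait at all once it is done.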

To get the auth token:

    /// <summary>
    /// Get OAuth token
    /// </summary>
    /// <returns></returns>
    public string GetOAuthToken()
    {
        return googleCredential.UnderlyingCredential.GetAccessTokenForRequestAsync("https://accounts.google.com/o/oauth2/v2/auth", CancellationToken.None).Result;
    }

Finally, you will get a result like this:

    {
      "name": "operationname here",
      "metadata": {
        "@type": "type.googleapis.com/google.cloud.speech.v1.LongRunningRecognizeMetadata",
        "progressPercent": 100,
        "startTime": "2018-12-18T10:56:09.425584Z",
        "lastUpdateTime": "2018-12-18T11:10:27.147310Z"
      },
      "done": true,
      "response": {
        "@type": "type.googleapis.com/google.cloud.speech.v1.LongRunningRecognizeResponse",
        "results": [
          {
            "alternatives": [
              {
                "transcript": "okay let's get started",
                "confidence": 0.97442055
              }
            ]
          }, and so on .....

Requirements:

  1. An api-key.json file
  2. The Google.Apis.Auth NuGet package (namespace Google.Apis.Auth.OAuth2), to authorize the HTTP web requests

Thanks
