How to use the Google speech recognition API in Python?


Question

Stack Overflow might not be the best place to ask this question, but I need help. I have an mp3 file and I want to use Google's speech recognition to get the text out of that file. Any ideas where I can find documentation or examples would be appreciated.

Answer

Take a look at the Google Cloud Speech API, which enables developers to convert audio to text [...] The API recognizes over 80 languages and variants [...] You can create a free account to get a limited number of API requests.

HOW TO:

First, install the gcloud Python module and the google-api-python-client module with:

pip install --upgrade gcloud
pip install --upgrade google-api-python-client

Then, in the Cloud Platform Console, go to the Projects page and select or create a project. Next, enable billing for your project, then enable the Cloud Speech API.

After enabling the Google Cloud Speech API, click the Go to Credentials button to set up your Cloud Speech API credentials.

See Set Up a Service Account for information on how to authorize to the Cloud Speech API service from your code.

You should end up with both a service account key file (in JSON) and a GOOGLE_APPLICATION_CREDENTIALS environment variable, which together allow you to authenticate to the Speech API.

Once that is done, download the raw audio file from Google, as well as speech-discovery_google_rest_v1.json.

Modify the previously downloaded JSON file to set your credentials key, then make sure that the GOOGLE_APPLICATION_CREDENTIALS environment variable is set to the full path of the .json file:

export GOOGLE_APPLICATION_CREDENTIALS=/path/to/service_account_file.json

Also make sure that the GCLOUD_PROJECT environment variable is set to the ID of your Google Cloud project:

export GCLOUD_PROJECT=your-project-id
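
Before going further, it can help to confirm that both variables are actually visible to Python. The following is a minimal sketch using only the standard library; the helper name `check_speech_env` is hypothetical and not part of any Google library. It only checks that the variables are set and that the key file exists and parses as JSON, not that the credentials are valid.

```python
import json
import os


def check_speech_env(environ=None):
    """Return a list of problems found; an empty list means the setup looks sane.

    check_speech_env is a hypothetical helper, not part of any Google library.
    """
    environ = os.environ if environ is None else environ
    problems = []

    key_path = environ.get('GOOGLE_APPLICATION_CREDENTIALS')
    if not key_path:
        problems.append('GOOGLE_APPLICATION_CREDENTIALS is not set')
    elif not os.path.isfile(key_path):
        problems.append('key file not found: ' + key_path)
    else:
        try:
            # The service account key is expected to be a JSON document.
            with open(key_path) as key_file:
                json.load(key_file)
        except ValueError:
            problems.append('key file is not valid JSON: ' + key_path)

    if not environ.get('GCLOUD_PROJECT'):
        problems.append('GCLOUD_PROJECT is not set')

    return problems


if __name__ == '__main__':
    for problem in check_speech_env():
        print(problem)
```

If the script prints nothing, both variables are set and the key file is readable JSON.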

Assuming all of that is done, create a tutorial.py file containing:

import argparse
import base64
import json

from googleapiclient import discovery
import httplib2
from oauth2client.client import GoogleCredentials


DISCOVERY_URL = ('https://{api}.googleapis.com/$discovery/rest?'
                 'version={apiVersion}')


def get_speech_service():
    credentials = GoogleCredentials.get_application_default().create_scoped(
        ['https://www.googleapis.com/auth/cloud-platform'])
    http = httplib2.Http()
    credentials.authorize(http)

    return discovery.build(
        'speech', 'v1beta1', http=http, discoveryServiceUrl=DISCOVERY_URL)


def main(speech_file):
    """Transcribe the given audio file.

    Args:
        speech_file: the name of the audio file.
    """
    with open(speech_file, 'rb') as speech:
        speech_content = base64.b64encode(speech.read())

    service = get_speech_service()
    service_request = service.speech().syncrecognize(
        body={
            'config': {
                'encoding': 'LINEAR16',  # raw 16-bit signed LE samples
                'sampleRate': 16000,  # 16 khz
                'languageCode': 'en-US',  # a BCP-47 language tag
            },
            'audio': {
                'content': speech_content.decode('UTF-8')
                }
            })
    response = service_request.execute()
    print(json.dumps(response))

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument(
        'speech_file', help='Full path of audio file to be recognized')
    args = parser.parse_args()
    main(args.speech_file)
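
The script above prints the whole JSON response. To get just the text, you could walk the response dictionary. This is a rough sketch that assumes the response has a top-level `results` list whose entries each hold an `alternatives` list with `transcript` and `confidence` fields; check the actual output of your own request, since that shape is an assumption here. The name `extract_transcripts` and the sample data are made up for illustration.

```python
def extract_transcripts(response):
    """Return (transcript, confidence) pairs for the first alternative of each result.

    Assumes a response shaped like {'results': [{'alternatives': [...]}]}.
    """
    pairs = []
    for result in response.get('results', []):
        alternatives = result.get('alternatives', [])
        if alternatives:
            best = alternatives[0]
            pairs.append((best.get('transcript', ''), best.get('confidence')))
    return pairs


# Hypothetical response used only to exercise the helper.
sample = {
    'results': [
        {'alternatives': [{'transcript': 'hello world', 'confidence': 0.98}]},
    ]
}

for transcript, confidence in extract_transcripts(sample):
    print(transcript, confidence)
```

An empty or missing `results` list simply yields no pairs, which is a reasonable way to treat audio in which nothing was recognized.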

Then run:

python tutorial.py audio.raw
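
Note that the config in tutorial.py declares raw 16-bit little-endian samples at 16 kHz, so an mp3 file like the one in the question has to be converted first with an external tool (not shown here). If you end up with a WAV file rather than headerless raw audio, the sketch below uses the standard-library `wave` module to check the format and strip the header. The function name `wav_to_raw` is hypothetical, and the mono requirement is an assumption on my part, not something the tutorial states.

```python
import wave


def wav_to_raw(wav_path, raw_path, expected_rate=16000):
    """Copy the PCM frames of a WAV file into a headerless raw file.

    Raises ValueError unless the WAV is mono, 16-bit, at expected_rate.
    Returns the number of bytes written.
    """
    with wave.open(wav_path, 'rb') as wav:
        if wav.getnchannels() != 1:
            # Assumption: the recognizer is given single-channel audio.
            raise ValueError('expected mono audio')
        if wav.getsampwidth() != 2:
            raise ValueError('expected 16-bit samples')
        if wav.getframerate() != expected_rate:
            raise ValueError('expected %d Hz audio' % expected_rate)
        frames = wav.readframes(wav.getnframes())

    with open(raw_path, 'wb') as raw:
        raw.write(frames)
    return len(frames)
```

For example, `wav_to_raw('audio.wav', 'audio.raw')` would produce a file that can be passed to tutorial.py, provided the WAV already matches the declared encoding and sample rate.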
