Google Cloud Text-to-Speech Interface Confusion (How do I download the mp3 files?)

Question

I'd like to preface this with the fact that I am not a programmer/developer - I am a multimedia designer. I use text-to-speech to generate placeholder audio files that can be used to time animations before we record the official audio narration.

Previously I was using Amazon Polly but I wanted to give Google Cloud a try. However, I'm having the hardest time actually figuring out how to generate the mp3 files and save them.

With Amazon Polly, you simply go to a website, enter your text into a field, and click a button and it will save your file as an mp3 file. With Google Cloud, it seems far more complicated than that. The "quick start" guide has me enabling APIs, downloading JSON files, setting environment credentials, initializing SDKs, and entering code into command prompt.

Every single one of the guides I've read on their documentation page seems to inevitably lead me to a step that I just simply don't understand. I hate to sound like a complete buffoon, but this seems to be a bit over my head. I'm not looking to create software or integrate machine learning into a website, I simply just want to enter a few lines of text and generate an mp3 file.

Is there any way to do that with Google Cloud? The launch page (https://cloud.google.com/text-to-speech/) offers exactly what I want, but there is no option to download the files, just preview them.

Thanks in advance for any help you can provide to this newbie.

Answer

All of Google's ML-related tools have a pretty poor user experience for general users and are designed very specifically for programmatic use. If you're just looking for some basic tools with a reasonably nice interface, GCP probably isn't it at the moment.

Given that, the samples aren't that difficult to turn into something more if you're willing to struggle a little at the beginning. I'd suggest using the command line described here.

I'm going to add some initial steps:
1) Download and set up the gcloud SDK tools.
2) In a terminal, run gcloud auth application-default login. This will open a browser; log in as you would to the GCP Console.
3) They provide a sample request to generate a file:

curl -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
  -H "Content-Type: application/json; charset=utf-8" \
  --data '{
    "input": {
      "text": "Android is a mobile operating system developed by Google, based on the Linux kernel and designed primarily for touchscreen mobile devices such as smartphones and tablets."
    },
    "voice": {
      "languageCode": "en-gb",
      "name": "en-GB-Standard-A",
      "ssmlGender": "FEMALE"
    },
    "audioConfig": {
      "audioEncoding": "MP3"
    }
  }' "https://texttospeech.googleapis.com/v1/text:synthesize" > synthesize-text.txt
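Typing the JSON body by hand for every new line of narration is where quoting mistakes tend to creep in. The following is a minimal sketch, assuming bash with the standard tr and sed tools: build_tts_request is a hypothetical helper (not part of gcloud or curl) that wraps any text in a request body using the same voice and audio settings as the sample above. It turns newlines into spaces and escapes backslashes and double quotes, but it does not handle tabs or other control characters.

```shell
#!/usr/bin/env bash
# build_tts_request is a hypothetical helper, not part of gcloud or curl.
# It prints a text:synthesize request body for the text given as $1,
# reusing the voice and audio settings from the sample request.
build_tts_request() {
  local escaped
  # Turn newlines into spaces (JSON strings cannot contain raw newlines),
  # then escape backslashes and double quotes so the text is a valid JSON string.
  # Tabs and other control characters are NOT handled.
  escaped=$(printf '%s' "$1" | tr '\n' ' ' | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
  printf '{"input":{"text":"%s"},"voice":{"languageCode":"en-gb","name":"en-GB-Standard-A","ssmlGender":"FEMALE"},"audioConfig":{"audioEncoding":"MP3"}}\n' "$escaped"
}

# Example usage (needs gcloud credentials and network access, so it is not run here);
# curl's --data @- reads the request body from standard input:
#   build_tts_request "$(cat narration.txt)" |
#     curl -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
#       -H "Content-Type: application/json; charset=utf-8" \
#       --data @- "https://texttospeech.googleapis.com/v1/text:synthesize" > synthesize-text.txt
```

Here narration.txt is just a placeholder name for whatever file holds your script text; the response still arrives in synthesize-text.txt exactly as with the hand-written request.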

This is what I meant about a poor experience. The final part of the command, > synthesize-text.txt, writes the result of the text-to-speech operation to synthesize-text.txt, and your mp3 is inside that txt file. But wait: they expect you to use it programmatically, so the MP3 isn't returned as a plain file (you might want to do something else with it). Instead it's returned in an encoding called Base64, which makes it easier to send binary data over HTTP, where text is the norm. So instead of an mp3 you get a JSON file, like:

{ "audioContent": "//NExAASCCIIAAhEAGAAEMW4kAYPnwwIKw/BBTpwTvB+IAxIfghUfW.." }

That text starting with // IS your audio. But because you're doing this manually, you need to copy everything inside the quotes (it will be a really long string of characters starting with //; keep the // characters) into a new file. Name it whatever you want; they named it synthesize-output-base64.txt. Then run:

base64 synthesize-output-base64.txt --decode > synthesized-audio.mp3
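Copying a very long string out of the response by hand is easy to get wrong. As a minimal sketch, assuming bash plus the standard sed and base64 tools (no JSON parser), the hypothetical helper decode_tts_response pulls the audioContent value out of the saved response and decodes it in one step. It relies on the base64 value sitting on a single line, which holds because JSON strings cannot contain raw newlines and base64 text contains no double quotes.

```shell
#!/usr/bin/env bash
# decode_tts_response is a hypothetical helper, not part of the Google Cloud tools.
# Usage: decode_tts_response RESPONSE_FILE OUTPUT_FILE
decode_tts_response() {
  local response_file=$1 output_file=$2 b64
  # Print only the captured value of the "audioContent" field.
  b64=$(sed -n 's/.*"audioContent": *"\([^"]*\)".*/\1/p' "$response_file")
  if [[ -z $b64 ]]; then
    # An error response (for example, bad credentials) has no audioContent field.
    echo "no audioContent in $response_file; response was:" >&2
    cat "$response_file" >&2
    return 1
  fi
  # Decode the base64 text back into the binary MP3 data.
  printf '%s' "$b64" | base64 --decode > "$output_file"
}

# Example: decode_tts_response synthesize-text.txt synthesized-audio.mp3
```

Failing loudly on a missing audioContent field is a deliberate choice: otherwise an error response would silently produce an empty mp3.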

And you're done. The original request lets you specify the text, voice, etc. But realistically, if you're looking for casual text-to-speech with a nice UI, GCP isn't there yet.
