Python Google Translate API error: How to translate a large amount of data


Problem description

My problem

I would like to use a data-augmentation method for NLP that consists of back-translating a dataset.

Basically, I have a large dataset (SNLI) consisting of 1,100,000 English sentences. What I need to do is translate these sentences into another language and then translate them back to English.
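The round trip described above can be sketched as follows. This is a minimal illustration, not tied to any particular service: the `translate` callable and its `(text, source, target)` signature are hypothetical, and any real backend would have to be wrapped to match it.

```python
from typing import Callable, List

# Hypothetical signature: (text, source_lang, target_lang) -> translated text.
TranslateFn = Callable[[str, str, str], str]


def back_translate(
    sentences: List[str],
    translate: TranslateFn,
    pivot: str,
    source: str = "en",
) -> List[str]:
    """Translate each sentence into `pivot`, then back into `source`."""
    augmented = []
    for sentence in sentences:
        pivot_text = translate(sentence, source, pivot)
        augmented.append(translate(pivot_text, pivot, source))
    return augmented


if __name__ == "__main__":
    # Toy stand-in for a real translation backend: it only tags the text.
    def fake_translate(text: str, src: str, dest: str) -> str:
        return f"[{src}->{dest}] {text}"

    print(back_translate(["A man is walking."], fake_translate, pivot="fr"))
```

Keeping the backend behind a plain callable means the same loop can be reused when switching services or pivot languages.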

I may have to do this for several languages, so I have a lot of translations to do.

I need a free solution.


What I did so far

I tried several Python modules for translation, but due to recent changes in the Google Translate API, most of them do not work. googletrans seems to work if we apply this solution.

However, it does not work for big datasets. Google imposes a limit of 15K characters (as pointed out by this, this and this). The first link shows a supposed workaround.


Where I am blocked

Even when I apply the workaround (initializing the Translator at every iteration), it does not work, and I get the following error:

json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

I tried using proxies and other Google Translate URLs:

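from googletrans import Translator

# Alternative Google Translate hosts to try
URLS = ['translate.google.com', 'translate.google.co.kr', 'translate.google.ac', 'translate.google.ad', 'translate.google.ae', ...]

# HTTP and HTTPS proxies to route the requests through
proxies = {'http': '1.243.64.63:48730', 'https': '59.11.98.253:42645'}

t = Translator(service_urls=URLS, proxies=proxies)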

But it doesn't change anything.


Note

My problem might come from the fact that I am using multi-threading: 100 workers translating the whole dataset. If they work in parallel, maybe together they use more than 15K characters.

But I have to use multi-threading. If I don't, it will take several weeks to translate the whole dataset...
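If the error really is caused by too many parallel requests (an assumption, not something confirmed here), one common mitigation is to use far fewer workers and retry failed calls with exponential backoff. The sketch below shows that pattern only. `translate_with_retry`, `translate_all` and the single-argument `translate` callable are hypothetical names, and the defaults are illustrative rather than tuned.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def translate_with_retry(
    text: str,
    translate: Callable[[str], str],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call translate(text); after a failure wait base_delay * 2**attempt and retry."""
    for attempt in range(max_attempts):
        try:
            return translate(text)
        except Exception:  # e.g. json.decoder.JSONDecodeError from a non-JSON reply
            if attempt == max_attempts - 1:
                raise  # give up after the last attempt
            sleep(base_delay * 2 ** attempt)


def translate_all(
    texts: List[str],
    translate: Callable[[str], str],
    workers: int = 4,
    **retry_kwargs,
) -> List[str]:
    """Translate texts with a small thread pool; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(
            pool.map(lambda t: translate_with_retry(t, translate, **retry_kwargs), texts)
        )
```

A call such as `translate_all(sentences, my_translate, workers=4)` would replace the 100-worker setup, where `my_translate` stands for whatever function wraps the actual translator.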


My question

How do I fix this error so I can translate all the sentences?

If it's not possible, is there any free alternative for getting machine translation in Python (it does not have to be Google Translate) for such a big dataset?

Solution

A dataset of 1,100,000 sentences is a lot of text to translate.

Currently, Google Cloud Translation V3 offers a free tier that you may want to use (the first 500,000 characters per month are free). Since that does not seem to be enough for your use case, you would probably need to create more than one billing account or wait a month to translate more text.
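To get a feel for how far the free tier goes, here is a rough back-of-the-envelope calculation. The average of 60 characters per sentence is purely an assumption for illustration (it is not a measured property of SNLI), and the helper name is hypothetical. The return trip is approximated as having the same length as the original text.

```python
import math


def months_on_free_tier(
    num_sentences: int,
    avg_chars: float,
    num_pivot_languages: int = 1,
    free_chars_per_month: int = 500_000,
) -> int:
    """Estimate the months needed to back-translate a dataset on the free tier."""
    # Each sentence is translated twice per pivot language: out and back.
    total_chars = num_sentences * avg_chars * 2 * num_pivot_languages
    return math.ceil(total_chars / free_chars_per_month)


if __name__ == "__main__":
    # 1,100,000 sentences, assumed 60 characters each, one pivot language.
    print(months_on_free_tier(1_100_000, 60))
```

Under that assumption the round trip is about 132 million characters, which is why the free quota alone is unlikely to cover the whole dataset in a reasonable time.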

Check this link to see how to perform a text translation with Python.
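As a rough sketch of what such a call can look like, the code below groups sentences into batches and sends each batch through the `google-cloud-translate` client library. It assumes that library is installed and that credentials and a project are already configured. `batch_by_chars`, `translate_batch`, the `project_id` argument and the batch limits are hypothetical choices for illustration, so check the actual request limits in the official documentation.

```python
from typing import Iterable, Iterator, List


def batch_by_chars(
    sentences: Iterable[str],
    max_chars: int = 5000,  # illustrative limit, not an official quota
    max_items: int = 100,   # illustrative limit, not an official quota
) -> Iterator[List[str]]:
    """Group sentences into batches bounded by total characters and item count."""
    batch: List[str] = []
    size = 0
    for sentence in sentences:
        if batch and (size + len(sentence) > max_chars or len(batch) >= max_items):
            yield batch
            batch, size = [], 0
        batch.append(sentence)
        size += len(sentence)
    if batch:
        yield batch


def translate_batch(client, batch: List[str], project_id: str,
                    target: str, source: str = "en") -> List[str]:
    """Translate one batch with a TranslationServiceClient (needs credentials)."""
    response = client.translate_text(
        request={
            "parent": f"projects/{project_id}/locations/global",
            "contents": batch,
            "mime_type": "text/plain",
            "source_language_code": source,
            "target_language_code": target,
        }
    )
    return [t.translated_text for t in response.translations]


# Usage sketch (not executed here; requires google-cloud-translate and credentials):
#   from google.cloud import translate_v3 as translate
#   client = translate.TranslationServiceClient()
#   for batch in batch_by_chars(sentences):
#       french = translate_batch(client, batch, "my-project-id", target="fr")
```

Batching reduces the number of requests, but every character still counts against the monthly quota.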
