Machine Translation using babelize_shell() in NLTK


Question

Hi, I am learning natural language processing using NLTK. I am trying to run the babelize_shell() example from the book. I execute babelize_shell(), enter my string, then enter german as stated in the book, and then enter run.

The error I get is:

Traceback (most recent call last):
  File "<pyshell#148>", line 1, in <module>
    babelize_shell()
  File "C:\Python27\lib\site-packages\nltk\misc\babelfish.py", line 175, in babelize_shell
    for count, new_phrase in enumerate(babelize(phrase, 'english', language)):
  File "C:\Python27\lib\site-packages\nltk\misc\babelfish.py", line 126, in babelize
    phrase = translate(phrase, next, flip[next])
  File "C:\Python27\lib\site-packages\nltk\misc\babelfish.py", line 106, in translate
    if not match: raise BabelfishChangedError("Can't recognize translated string.")
BabelfishChangedError: Can't recognize translated string.

Here is a sample session:

>>> babelize_shell()
NLTK Babelizer: type 'help' for a list of commands.
Babel> how long before the next flight to Alice Springs?
Babel> german
Babel> run
0> how long before the next flight to Alice Springs?
1> wie lang vor dem folgenden Flug zu Alice Springs?
2> how long before the following flight to Alice jump?
3> wie lang vor dem folgenden Flug zu Alice springen Sie?
4> how long before the following flight to Alice do you jump?
5> wie lang, bevor der folgende Flug zu Alice tun, Sie springen?
6> how long, before the following flight to Alice does, do you jump?
7> wie lang bevor der folgende Flug zu Alice tut, tun Sie springen?
8> how long before the following flight to Alice does, do you jump?
9> wie lang, bevor der folgende Flug zu Alice tut, tun Sie springen?
10> how long, before the following flight does to Alice, do do you jump?
11> wie lang bevor der folgende Flug zu Alice tut, Sie tun Sprung?
12> how long before the following flight does leap to Alice, does you?
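
The traceback above shows babelize() repeatedly calling translate(phrase, next, flip[next]), which bounces the phrase between English and the target language. The following is only a rough sketch of that round-trip idea, not NLTK's implementation: `toy_translate`, its lookup tables, the `round_trip` name, and the stopping rule (stop on a repeated phrase or after `limit` hops) are all assumptions made for illustration. It is written for Python 3.

```python
# Toy stand-in for the online translator: word-for-word lookup tables.
# These tables and names are illustrative assumptions, not NLTK code.
EN_TO_DE = {"good": "gut", "morning": "morgen", "tomorrow": "morgen"}
DE_TO_EN = {"gut": "good", "morgen": "tomorrow"}


def toy_translate(phrase, source, target):
    """Translate word by word; unknown words pass through unchanged."""
    table = EN_TO_DE if (source, target) == ("english", "german") else DE_TO_EN
    return " ".join(table.get(word, word) for word in phrase.split())


def round_trip(phrase, source, target, limit=12):
    """Yield the phrase after each hop until it repeats or `limit` hops pass."""
    flip = {source: target, target: source}  # maps each language to the other
    current = source
    seen = {phrase}
    yield phrase
    for _ in range(limit):
        phrase = toy_translate(phrase, current, flip[current])
        current = flip[current]
        if phrase in seen:  # the phrase has stabilised, so stop
            break
        seen.add(phrase)
        yield phrase


for count, new_phrase in enumerate(round_trip("good morning", "english", "german")):
    print("%d> %s" % (count, new_phrase))
```

Because "morgen" maps back to "tomorrow" in the toy tables, the phrase drifts once before it stabilises, which is the same kind of drift the sample session shows.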

Recommended answer

I'm having the same problem right now.

I found this: http://nltk.googlecode.com/svn/trunk/doc/api/nltk.misc.babelfish-module.html

It says: BabelfishChangedError is thrown when babelfish.yahoo.com changes some detail of their HTML layout, and babelizer no longer submits data in the correct form, or can no longer parse the results.
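
To illustrate why a layout change produces this kind of error, here is a minimal sketch of a regex-based scraper. It is not the code in babelfish.py: the `<div id="result">` markup, `RESULT_RE`, `extract_translation`, and `LayoutChangedError` are hypothetical names chosen for the example. Only the error message is taken from the traceback in the question. It is written for Python 3.

```python
import re


class LayoutChangedError(Exception):
    """Hypothetical analogue of BabelfishChangedError."""


# Assumed markup: the scraper expects the translation inside this <div>.
RESULT_RE = re.compile(r'<div id="result">(.*?)</div>', re.DOTALL)


def extract_translation(html):
    """Pull the translated string out of a results page, or fail loudly."""
    match = RESULT_RE.search(html)
    if not match:
        raise LayoutChangedError("Can't recognize translated string.")
    return match.group(1).strip()


# A page in the layout the scraper was written for, and one after a redesign.
old_page = '<html><div id="result"> wie lang? </div></html>'
new_page = '<html><span class="translation">wie lang?</span></html>'

print(extract_translation(old_page))
try:
    extract_translation(new_page)
except LayoutChangedError as err:
    print("scraper broke:", err)
```

The translation is still present in the redesigned page, but the pattern no longer matches it, so the only options are to update the pattern or switch to a documented API.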

I'm going to see if there's a way to fix this.

The solution I came up with uses the Microsoft Translator web service (SOAP). It's not an easy solution, but it was fun to code.

I followed the instructions at http://msdn.microsoft.com/en-us/library/hh454950 and then modified babelfish.py, which is found in nltk/misc/babelfish.py

  1. Subscribe to the Microsoft Translator API on Azure Marketplace. I chose the free subscription.

  2. Register your application with Azure DataMarket: visit datamarket.azure.com/developer/applications/ using the LiveID credentials from step 1, and click on "Register". Write down your client ID and your client secret for later use.

  3. Install suds for Python: fedorahosted.org/suds/

Modify babelfish.py (use your own client_id and secret):

# imports to add
from suds.client import Client
import httplib
import json
import urllib  # needed for urlencode; skip if babelfish.py already imports it

...

# added function
def soaped_babelfish(TextToTranslate, codeLangFrom, codeLangTo):

    # OAuth credentials (replace client_id and client_secret with your own)
    params = urllib.urlencode({
        'client_id': 'babelfish_soaped',
        'client_secret': '1IkIG3j0ujiSMkTueCZ46iAY4fB1Nzr+rHBciHDCdxw=',
        'scope': 'http://api.microsofttranslator.com',
        'grant_type': 'client_credentials'})

    # request an access token from the Azure DataMarket access control service
    headers = {"Content-type": "application/x-www-form-urlencoded"}
    conn = httplib.HTTPSConnection("datamarket.accesscontrol.windows.net")
    conn.request("POST", "/v2/OAuth2-13/", params, headers)
    response = conn.getresponse()
    #print response.status, response.reason
    data = response.read()
    conn.close()

    # the token endpoint replies with JSON; extract the access_token
    response_dict = json.loads(data)
    access_token = response_dict['access_token']

    # call the SOAP web service with the access token
    client = Client('http://api.microsofttranslator.com/V2/Soap.svc')
    result = client.service.Translate('Bearer ' + access_token, TextToTranslate,
                                      codeLangFrom, codeLangTo,
                                      'text/plain', 'general')

    return result

...

# modified translate function
def translate(phrase, source, target):
    phrase = clean(phrase)
    try:
        # map language names (e.g. 'german') to the service's language codes
        source_code = __languages[source]
        target_code = __languages[target]
    except KeyError as lang:
        raise ValueError("Language %s not available" % lang)

    return clean(soaped_babelfish(phrase, source_code, target_code))
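
The token step in the listing above can be tried without touching the network. The following is a minimal sketch, assuming Python 3 (`urllib.parse` rather than the Python 2 `urllib` used in the answer) and a made-up sample response. `build_token_request`, `bearer_header`, the credentials, and the payload are hypothetical and only illustrate the form-encoding and JSON-parsing steps.

```python
import json
from urllib.parse import urlencode, parse_qs


def build_token_request(client_id, client_secret):
    """Return the form-encoded POST body for a client-credentials grant."""
    return urlencode({
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": "http://api.microsofttranslator.com",
        "grant_type": "client_credentials",
    })


def bearer_header(response_body):
    """Parse a JSON token response and build the 'Bearer <token>' string."""
    token = json.loads(response_body)["access_token"]
    return "Bearer " + token


# Hypothetical credentials and response payload, for illustration only.
body = build_token_request("my_app", "s3cr3t+with/symbols=")
sample_response = '{"token_type": "bearer", "access_token": "abc123", "expires_in": "600"}'

print(body)
print(bearer_header(sample_response))
```

urlencode percent-encodes characters such as `+`, `/` and `=` in the secret, which matters because client secrets of the kind shown above typically contain them.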

And that's all for the SOAPed version! Some other day I'll try a web-only solution (similar to the current babelfish.py, but adapted to the changes).
