How to translate words in the NLTK swadesh corpus regardless of case - python

Question

I'm new to Python and natural language processing, and I'm trying to learn using the NLTK book. I'm doing the exercises at the end of Chapter 2, and there is one question I'm stuck on: "In the discussion of comparative wordlists, we created an object called translate which you could look up using words in both German and Italian in order to get corresponding words in English. What problem might arise with this approach? Can you suggest a way to avoid this problem?"

The book had me use the swadesh corpus to create a 'translator', as follows:

```python
from nltk.corpus import swadesh
fr2en = swadesh.entries(['fr', 'en'])
de2en = swadesh.entries(['de', 'en'])
es2en = swadesh.entries(['es', 'en'])
translate = dict(fr2en)
translate.update(dict(de2en))
translate.update(dict(es2en))
```

One problem I noticed is that when translating the German word for dog (Hund) into English, only the capitalized form works: translate['Hund'] returns 'dog', while translate['hund'] raises KeyError: 'hund'.

Is there a way to make the translator translate words regardless of case? I've tried things like translate.update(dict(de2en.lower)), to no avail. I feel like I'm missing something obvious. Could anyone help?

Thanks!
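
The attempt `translate.update(dict(de2en.lower))` cannot work because `de2en` is a list of `(German, English)` tuples and `lower` is a string method, so it has to be applied to each German word individually. A minimal sketch, using a small hand-written sample in place of `swadesh.entries(['de', 'en'])` (the sample words are illustrative assumptions, not copied from the corpus):

```python
# Hand-written sample shaped like swadesh.entries(['de', 'en']): a list of
# (German, English) tuples. The words here are illustrative, not the corpus.
de2en = [('ich', 'I'), ('Hund', 'dog'), ('Fisch', 'fish')]

# A list has no .lower attribute, which is why dict(de2en.lower) fails.
try:
    de2en.lower
except AttributeError as exc:
    error_message = str(exc)
print(error_message)  # 'list' object has no attribute 'lower'

# Lowercase the German word inside each tuple; keep the English word unchanged.
translate = dict((de.lower(), en) for de, en in de2en)
print(translate['hund'])  # dog
```

Lowercasing only the first element of each pair leaves English words such as 'I' intact.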

Answer

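Ah, the infamous capitalization of nouns in German (see http://german.about.com/library/weekly/aa020919a.htm): the swadesh corpus stores German nouns such as Hund capitalized, so a lowercase lookup misses them.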

You could use a list comprehension to lowercase each token in the swadesh entries:

```python
>>> from nltk.corpus import swadesh
>>> de2en = [(i.lower(),j.lower()) for i,j in swadesh.entries(['de','en'])]
>>> translate = dict(de2en)
>>> translate['hund']
u'dog'
>>> translate['Hund']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'Hund'
```
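
Lowercasing only the stored keys moves the problem: now translate['Hund'] fails instead of translate['hund']. Another option is to lowercase the query as well, so every lookup goes through the same form. A minimal sketch, assuming a hypothetical helper `lookup` and a small hand-written sample in place of the corpus entries:

```python
# Hand-written sample shaped like swadesh.entries(['de', 'en']); the words are illustrative.
de2en = [('ich', 'I'), ('Hund', 'dog'), ('Fisch', 'fish')]

# Store each German key once, in lowercase; leave the English values alone.
translate = {de.lower(): en for de, en in de2en}

def lookup(table, word):
    """Hypothetical helper: lowercase the query so any capitalization finds the key."""
    return table[word.lower()]

print(lookup(translate, 'Hund'))   # dog
print(lookup(translate, 'hund'))   # dog
print(lookup(translate, 'FISCH'))  # fish
```

With this design the table holds one key per word instead of two. For German, `str.casefold()` is a stricter alternative to `str.lower()`: it also folds `ß` to `ss`, so `'Straße'.casefold() == 'STRASSE'.casefold()`. If you switch, use it both when building the table and when looking up.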

However, the keys have now lost their capitalization, so translate['Hund'] fails. To fix that, update the translate dictionary again with the original swadesh entries, so it holds both the lowercased and the original keys:

```python
>>> from nltk.corpus import swadesh
>>> de2en = [(i.lower(),j.lower()) for i,j in swadesh.entries(['de','en'])]
>>> translate = dict(de2en)
>>> translate.update(swadesh.entries(['de','en']))
>>> translate['hund']
u'dog'
>>> translate['Hund']
u'dog'
```
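
The same idea extends to the multi-language translate dictionary from the question. A minimal sketch, assuming a hypothetical helper `build_translator` and small hand-written samples in place of the corpus entries: it stores each source word under both its original and its lowercased form in a single pass, and leaves the English side untouched (the comprehension in the answer also lowercases the English words).

```python
def build_translator(*entry_lists):
    """Hypothetical helper: map each source word, as given and lowercased, to its English word."""
    table = {}
    for entries in entry_lists:
        for source, english in entries:
            table[source.lower()] = english  # lowercase form, e.g. 'hund'
            table[source] = english          # original form, e.g. 'Hund'
    return table

# Hand-written samples shaped like swadesh.entries([...]); the words are illustrative.
de2en = [('ich', 'I'), ('Hund', 'dog')]
es2en = [('yo', 'I'), ('perro', 'dog')]

translate = build_translator(de2en, es2en)
print(translate['Hund'], translate['hund'], translate['perro'])  # dog dog dog
print(translate['ich'])  # I
```

As with the chained translate.update calls in the question, if two source languages share a spelling, the language passed last wins.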
