How to change NLTK default wordnet language to zsm?

Question

I'm new to NLTK and am working through Python 3 Text Processing with NLTK 3 Cookbook, Chapter 4. I've completed "Using WordNet for tagging", and it works fine with the default language, English. I've downloaded the Bahasa (zsm) language data into omw and want to try the same approach in Bahasa on other datasets. How can I change the default language from English to zsm?

The code I'm using:

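from nltk.corpus import wordnet
from nltk.probability import FreqDist
from nltk.tag import SequentialBackoffTagger

class WordNetTagger(SequentialBackoffTagger):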

    def __init__(self, *args, **kwargs):
        SequentialBackoffTagger.__init__(self, *args, **kwargs)

        self.wordnet_tag_map = {
            'n': 'NN',
            's': 'JJ',
            'a': 'JJ',
            'r': 'RB',
            'v': 'VB'
        }

    def choose_tag(self, tokens, index, history):
        word = tokens[index]
        fd = FreqDist()

        for synset in wordnet.synsets(word):
            fd[synset.pos()] += 1

        if not fd: return None
        return self.wordnet_tag_map.get(fd.max())

Thanks in advance.

Recommended answer

As you seem to have figured out, you don't change the default language; you explicitly specify the language you want whenever you don't want the default. If you find this onerous, you could wrap the wordnet object in your own custom class that provides its own defaults.

class MyWordNet:
    def __init__(self, wn):
        self._wordnet = wn

    def synsets(self, word, pos=None, lang="zsm"):
        return self._wordnet.synsets(word, pos=pos, lang=lang)

    # and similarly for any other methods you need
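
One possible way to cover "any other methods you need" without writing each one by hand is to forward unknown attributes to the wrapped reader. The sketch below is only an illustration: the class name DefaultLangWordNet and the _RecordingReader stand-in are hypothetical, and the stand-in exists solely so the example runs without any corpus data. It assumes the wrapped object accepts a lang keyword on synsets and lemmas, as NLTK's wordnet reader does.

```python
class DefaultLangWordNet:
    """Hypothetical wrapper that supplies a default language to a WordNet reader."""

    def __init__(self, reader, lang="zsm"):
        self._reader = reader
        self._lang = lang

    def synsets(self, word, pos=None, lang=None):
        # Use the wrapper's default unless the caller names a language.
        return self._reader.synsets(word, pos=pos, lang=lang or self._lang)

    def lemmas(self, word, pos=None, lang=None):
        return self._reader.lemmas(word, pos=pos, lang=lang or self._lang)

    def __getattr__(self, name):
        # Reached only when normal lookup fails: forward everything else
        # to the wrapped reader. Private names are not forwarded, which
        # avoids infinite recursion if _reader has not been set yet.
        if name.startswith("_"):
            raise AttributeError(name)
        return getattr(self._reader, name)


class _RecordingReader:
    """Stand-in for nltk.corpus.wordnet (an assumption for this demo only).

    It echoes its arguments back so the effective language is visible.
    """

    def synsets(self, word, pos=None, lang="eng"):
        return [("synsets", word, pos, lang)]

    def lemmas(self, word, pos=None, lang="eng"):
        return [("lemmas", word, pos, lang)]

    def langs(self):
        return ["eng", "zsm"]


wn = DefaultLangWordNet(_RecordingReader())
print(wn.synsets("makan"))            # lang falls back to the wrapper default
print(wn.synsets("eat", lang="eng"))  # an explicit lang still wins
print(wn.langs())                     # forwarded unchanged to the reader
```

In real use you would pass nltk.corpus.wordnet instead of the stand-in. Methods that take a lang argument but are not wrapped explicitly would still use the reader's own default, so each of those needs its own wrapper method.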

Then you initialize a wrapper object, passing it NLTK's wordnet reader object, and from then on you use the wrapper instead of the original:

wn = MyWordNet(wordnet)
...

for synset in wn.synsets(word):
   ...
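
To connect this back to the tagger in the question, here is a rough sketch of the same part-of-speech voting logic as choose_tag, written as a standalone function that takes the reader and the language as parameters. The function name choose_wordnet_tag, the fake classes, and the synset counts in the demo are all made up for illustration; the fakes only stand in for nltk.corpus.wordnet so the example runs without the omw data installed. collections.Counter is swapped in for FreqDist purely to keep the sketch free of NLTK imports.

```python
from collections import Counter

# Same mapping as in the question's WordNetTagger.
WORDNET_TAG_MAP = {"n": "NN", "s": "JJ", "a": "JJ", "r": "RB", "v": "VB"}


def choose_wordnet_tag(word, reader, lang="zsm"):
    """Return the tag for the most frequent synset POS of word, or None."""
    counts = Counter(synset.pos() for synset in reader.synsets(word, lang=lang))
    if not counts:
        return None
    most_common_pos, _ = counts.most_common(1)[0]
    return WORDNET_TAG_MAP.get(most_common_pos)


class _FakeSynset:
    """Minimal stand-in exposing only pos(), as the function requires."""

    def __init__(self, pos):
        self._pos = pos

    def pos(self):
        return self._pos


class _FakeReader:
    """Stand-in for nltk.corpus.wordnet backed by a hand-written table."""

    def __init__(self, table):
        self._table = table

    def synsets(self, word, pos=None, lang="eng"):
        return [_FakeSynset(p) for p in self._table.get((word, lang), [])]


# Invented data, not real WordNet counts: two verb senses and one noun sense.
reader = _FakeReader({("makan", "zsm"): ["v", "v", "n"]})
print(choose_wordnet_tag("makan", reader))
print(choose_wordnet_tag("makan", reader, lang="eng"))
```

Inside the original WordNetTagger, the equivalent change would be either to pass lang="zsm" in the wordnet.synsets call within choose_tag, or to have the tagger call a wrapper object such as wn above instead of wordnet directly.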
