How to call the ClassifierBasedTagger() in NLTK


Question

I followed the NLTK book (chapters 6 and 7) and other sources to train my own model for named entity recognition. I built a feature function and a ClassifierBasedTagger like this:

class NamedEntityChunker(ChunkParserI):
    def __init__(self, train_sents, feature_detector=features, **kwargs):
        assert isinstance(train_sents, Iterable)
        tagged_sents = [[((w,t),c) for (w,t,c) in
                         tree2conlltags(sent)]
                        for sent in train_sents]

        #other possible option: self.feature_detector = features
        self.tagger = ClassifierBasedTagger(tagged_sents, feature_detector=feature_detector, **kwargs)

    def parse(self, tagged_sent):
        chunks = self.tagger.tag(tagged_sent)

        iob_triplets = [(w, t, c) for ((w, t), c) in chunks]

        # Transform the list of triplets to nltk.Tree format
        return conlltags2tree(iob_triplets)

I run into problems when calling the classifier-based tagger from another script, where I load my training and test data. For testing purposes I build the chunker from a portion of my training data:

chunker = NamedEntityChunker(training_samples[:500])

No matter what I change in my classifier, I keep getting this error:

   self.tagger = ClassifierBasedTagger(tagged_sents, feature_detector=feature_detector, **kwargs)
TypeError: __init__() got multiple values for argument 'feature_detector'

What am I doing wrong here? I assumed the feature function was working fine and that I did not have to pass anything else when calling NamedEntityChunker().

My second question: is there a way to save the trained model and reuse it later? How can I approach this? This is a follow-up to my last question on training data.

Thanks for any suggestions.

Recommended answer

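I finally realised what I was missing: the first positional parameter of ClassifierBasedTagger is feature_detector, not the training data, so passing tagged_sents positionally collides with the feature_detector keyword. The tagged sentences have to be passed through the train keyword argument, like this: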

self.tagger = ClassifierBasedTagger(train=tagged_sents, feature_detector=feature_detector, **kwargs)
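
To see why the keyword matters, here is a minimal sketch that reproduces the error without NLTK. StandInTagger is a hypothetical class that copies only the order of the first two parameters of ClassifierBasedTagger (feature_detector, then train), and the features function and sample sentence are placeholders, not the ones from the question.

```python
# Hypothetical stand-in: mirrors only the order of the first two parameters
# of nltk.tag.ClassifierBasedTagger.__init__; the remaining ones are omitted.
class StandInTagger:
    def __init__(self, feature_detector=None, train=None):
        self.feature_detector = feature_detector
        self.train = train


def features(tokens, index, history):
    # Placeholder feature function; a real one would return a richer dict.
    return {"word": tokens[index][0]}


# One training sentence in the ((word, pos), iob) shape built in the question.
tagged_sents = [[(("Paris", "NNP"), "B-GPE")]]

# Positional call, as in the question: tagged_sents fills the first slot,
# which is feature_detector, and the keyword then supplies it a second time.
try:
    StandInTagger(tagged_sents, feature_detector=features)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Keyword call, as in the answer: each argument lands in its own slot.
tagger = StandInTagger(train=tagged_sents, feature_detector=features)
```

The positional call raises the same "got multiple values for argument 'feature_detector'" TypeError as in the question, while the keyword call binds both arguments as intended.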

Now when I call the chunker NamedEntityChunker(), everything works.
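
The answer above does not cover the second question about saving the trained model. One common approach, sketched here under assumptions, is Python's pickle module. The helper names save_model and load_model, the file name, and the dictionary used as a stand-in for the trained chunker are all hypothetical; they only keep the example runnable without NLTK or training data.

```python
import os
import pickle
import tempfile


def save_model(model, path):
    """Serialize a trained object to disk with pickle."""
    with open(path, "wb") as fh:
        pickle.dump(model, fh, protocol=pickle.HIGHEST_PROTOCOL)


def load_model(path):
    """Load a previously pickled object. Only unpickle files you trust."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


# Stand-in for a trained chunker so the sketch runs on its own;
# in practice this would be the NamedEntityChunker instance.
stand_in_model = {"Paris": "B-GPE", "in": "O"}

# Hypothetical file name inside a fresh temporary directory.
model_path = os.path.join(tempfile.mkdtemp(), "ner_chunker.pickle")
save_model(stand_in_model, model_path)
restored = load_model(model_path)
```

With a real chunker, pickle stores the feature function by reference rather than by value, so the module that defines features would presumably need to be importable under the same name in the script that loads the file.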
