How to call the ClassifierBasedTagger() in NLTK
Question
I followed the documentation in the NLTK book (chapters 6 and 7), along with other ideas, to train my own model for named entity recognition. I built a feature function and a ClassifierBasedTagger like this:
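from collections.abc import Iterable

from nltk.chunk import ChunkParserI, conlltags2tree, tree2conlltags
from nltk.tag import ClassifierBasedTagger

# `features` is the feature function, defined elsewhere (not shown here).
class NamedEntityChunker(ChunkParserI):
    def __init__(self, train_sents, feature_detector=features, **kwargs):
        assert isinstance(train_sents, Iterable)
        # Convert each tree into ((word, pos), iob_tag) pairs
        tagged_sents = [[((w, t), c) for (w, t, c) in tree2conlltags(sent)]
                        for sent in train_sents]
        # other possible option: self.feature_detector = features
        # This is the call that raises the TypeError quoted below
        self.tagger = ClassifierBasedTagger(tagged_sents, feature_detector=feature_detector, **kwargs)

    def parse(self, tagged_sent):
        chunks = self.tagger.tag(tagged_sent)
        iob_triplets = [(w, t, c) for ((w, t), c) in chunks]
        # Transform the list of triplets to nltk.Tree format
        return conlltags2tree(iob_triplets)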
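The reshaping done in `__init__` and undone in `parse` can be sketched without NLTK. The sample sentence and its tags below are made up for illustration; the point is only the shape of the data: `(word, pos, iob)` triplets become `((word, pos), iob)` pairs for training, and the reverse comprehension restores the triplets.

```python
# Hypothetical CoNLL-style triplets, as tree2conlltags would return for one sentence
iob_triplets = [
    ("Mark", "NNP", "B-PER"),
    ("lives", "VBZ", "O"),
    ("in", "IN", "O"),
    ("Paris", "NNP", "B-LOC"),
]

# Training shape: the "token" is the (word, pos) pair, the "tag" is the IOB label
training_pairs = [((w, t), c) for (w, t, c) in iob_triplets]

# Input shape for tagging: only the (word, pos) pairs, without labels
tagged_sent = [word_pos for (word_pos, _) in training_pairs]

# Shape expected by conlltags2tree: back to flat triplets
round_trip = [(w, t, c) for ((w, t), c) in training_pairs]
```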
I am having problems when calling the classifier-based tagger from another script, where I load my training and test data. For testing purposes, I build the chunker from a portion of my training data:
chunker = NamedEntityChunker(training_samples[:500])
No matter what I change in my classifier, I keep getting this error:
self.tagger = ClassifierBasedTagger(tagged_sents, feature_detector=feature_detector, **kwargs)
TypeError: __init__() got multiple values for argument 'feature_detector'
What am I doing wrong here? I assumed the feature function is working fine and that I don't have to pass anything else when calling NamedEntityChunker().
My second question: is there a way to save the trained model and reuse it later? How should I approach this? This is a follow-up to my last question on training data.
Thank you for your suggestions.
Recommended answer
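I finally realised what I was missing: the first positional parameter of ClassifierBasedTagger is feature_detector, not the training data, so passing tagged_sents positionally and feature_detector by keyword supplies feature_detector twice. The tagged sentences have to be passed by keyword, as the train argument, like this:
    self.tagger = ClassifierBasedTagger(train=tagged_sents, feature_detector=feature_detector, **kwargs)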
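The error can be reproduced without NLTK. The sketch below uses a hypothetical `StandInTagger` whose `__init__` mirrors only the leading parameter order of ClassifierBasedTagger (feature_detector first, then train); the class, the toy `features` function, and the sample data are illustrative assumptions, not NLTK code.

```python
class StandInTagger:
    """Hypothetical stand-in: feature_detector comes first, train second."""

    def __init__(self, feature_detector=None, train=None, **kwargs):
        self.feature_detector = feature_detector
        self.train = train


def features(tokens, index, history):
    """Toy feature function (assumed shape: tokens, index, history)."""
    return {"word": tokens[index][0]}


tagged_sents = [[(("Paris", "NNP"), "B-LOC")]]

# Positional data lands in the feature_detector slot, and the keyword
# then supplies feature_detector a second time, so Python raises TypeError.
try:
    StandInTagger(tagged_sents, feature_detector=features)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Passing the data by keyword binds each argument to the intended parameter.
tagger = StandInTagger(train=tagged_sents, feature_detector=features)
```

Naming every argument at the call site avoids depending on the parameter order at all.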
Now when I call the chunker NamedEntityChunker(), everything works.
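On the second question (saving the trained model for reuse), one common approach for Python objects is the standard-library pickle module. The sketch below is a minimal illustration under assumptions: `save_model`, `load_model`, and the file name are hypothetical helpers, and a plain dict stands in for the trained chunker so the code runs without NLTK. Whether a given chunker pickles cleanly depends on its contents; for instance, a feature function is pickled by reference, so it generally needs to be an importable module-level function when the model is loaded.

```python
import pickle
import tempfile
from pathlib import Path


def save_model(model, path):
    """Serialize a trained model object to disk with pickle."""
    with open(path, "wb") as f:
        pickle.dump(model, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_model(path):
    """Load a previously pickled model. Only unpickle files you trust."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Stand-in for a trained chunker so the sketch runs without NLTK;
# in practice this would be the NamedEntityChunker instance.
stand_in_model = {"name": "ner-chunker", "labels": ["B-PER", "I-PER", "O"]}

model_path = Path(tempfile.mkdtemp()) / "chunker.pickle"
save_model(stand_in_model, model_path)
restored = load_model(model_path)
```

Loading the file in a later session then replaces the training step, provided the class and feature function definitions are importable there.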