NLTK Named Entity Recognition with Custom Data
Question
I'm trying to extract named entities from my text using NLTK. I find that NLTK's NER is not accurate enough for my purpose, and I also want to add some tags of my own. I've been trying to find a way to train my own NER, but I can't seem to find the right resources. I have a few questions about NLTK:
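- Can I train a named entity recognizer in NLTK on my own data?
- If I can train on my own data, is named_entity.py the file to modify?
- Does the input file have to be in IOB format, e.g. Eric NNP B-PERSON?
- Are there any resources I can use apart from the NLTK cookbook and NLP with Python?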
I would really appreciate any help with this.
Recommended answer
Are you committed to using NLTK/Python? I ran into the same problems as you, and had much better results using Stanford's named-entity recognizer: http://nlp.stanford.edu/software/CRF-NER.shtml. The process for training the classifier using your own data is very well-documented in the FAQ.
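As a rough sketch of preparing data for that route: the snippet below assumes the trainer reads one token per line with its label in a second, tab-separated column, and uses O for tokens outside any entity. That layout is an assumption here, so check it against the FAQ before relying on it. The function name, the labels, and the sample sentence are made up for illustration.

```python
def to_token_label_lines(tagged_tokens, outside="O"):
    """Render (token, label) pairs as one 'token<TAB>label' line each.

    A label of None marks a token that is not part of any entity;
    it is written with the `outside` symbol (assumed to be "O").
    """
    lines = []
    for token, label in tagged_tokens:
        lines.append(f"{token}\t{label or outside}")
    return "\n".join(lines) + "\n"


# Hypothetical hand-labelled sentence, including a custom tag set.
sample = [
    ("Eric", "PERSON"),
    ("lives", None),
    ("in", None),
    ("Boston", "LOCATION"),
]

training_text = to_token_label_lines(sample)
print(training_text, end="")
```

The resulting text could then be written to whatever training file the classifier is pointed at.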
If you really need to use NLTK, I'd hit up the mailing list for some advice from other users: http://groups.google.com/group/nltk-users.
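On the IOB question, here is a minimal sketch of what lines like "Eric NNP B-PERSON" encode. It is plain Python with no NLTK calls, and it assumes each line holds a word, a POS tag, and an IOB tag separated by whitespace. The helper names and the sample sentence are hypothetical, not part of NLTK.

```python
def parse_iob_line(line):
    """Split 'word POS IOB-tag' into a (word, pos, iob) triple."""
    word, pos, iob = line.split()
    return word, pos, iob


def group_entities(rows):
    """Collect consecutive B-/I- tagged words into (text, label) pairs."""
    entities = []
    current_words, current_label = [], None
    for word, _pos, iob in rows:
        starts_new = iob.startswith("B-") or (
            iob.startswith("I-") and iob[2:] != current_label
        )
        if starts_new:
            # Close any open entity, then start a new one.
            if current_words:
                entities.append((" ".join(current_words), current_label))
            current_words, current_label = [word], iob[2:]
        elif iob.startswith("I-"):
            # Continuation of the entity that is currently open.
            current_words.append(word)
        else:
            # An "O" tag ends whatever entity was open.
            if current_words:
                entities.append((" ".join(current_words), current_label))
            current_words, current_label = [], None
    if current_words:
        entities.append((" ".join(current_words), current_label))
    return entities


sample = """Eric NNP B-PERSON
Smith NNP I-PERSON
visited VBD O
Paris NNP B-GPE"""

rows = [parse_iob_line(line) for line in sample.splitlines() if line.strip()]
entities = group_entities(rows)
print(entities)
```

B- opens an entity, I- continues it, and O marks a token outside any entity, so custom labels only require a different suffix after the prefix.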
Hope this helps!