Stanford NER with python NLTK fails with strings containing multiple "!!"s?


Question

Suppose this is my filecontent:

When they are over 45 years old!! It would definitely help Michael Jordan.

Below is my code for tagging sentences.

from nltk.tag.stanford import NERTagger
from nltk.tokenize import sent_tokenize, word_tokenize

st = NERTagger('stanford-ner/classifiers/english.all.3class.distsim.crf.ser.gz', 'stanford-ner/stanford-ner.jar')
tokenized_sents = [word_tokenize(sent) for sent in sent_tokenize(filecontent)]
taggedsents = st.tag_sents(tokenized_sents)

I would expect both tokenized_sents and taggedsents to contain the same number of sentences.

But here is what they contain:

for ts in tokenized_sents:
    print "tok   ", ts

for ts in taggedsents:
    print "tagged    ",ts

>> tok    ['When', 'they', 'are', 'over', '45', 'years', 'old', '!', '!']
>> tok    ['It', 'would', 'definitely', 'help', 'Michael', 'Jordan', '.']
>> tagged     [(u'When', u'O'), (u'they', u'O'), (u'are', u'O'), (u'over', u'O'), (u'45', u'O'), (u'years', u'O'), (u'old', u'O'), (u'!', u'O')]
>> tagged     [(u'!', u'O')]
>> tagged     [(u'It', u'O'), (u'would', u'O'), (u'definitely', u'O'), (u'help', u'O'), (u'Michael', u'PERSON'), (u'Jordan', u'PERSON'), (u'.', u'O')]

This is due to having a double "!" at the end of the supposed first sentence. Do I have to remove double "!"s before using st.tag_sents()?
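As a workaround (a sketch, not the accepted answer's approach): since the tagger only re-splits sentences but keeps every token, you can flatten its output and re-chunk it using the token counts of the original tokenized sentences. The `realign` helper below is hypothetical; the sample data mirrors the output shown above.

```python
def realign(tokenized_sents, taggedsents):
    """Re-chunk tagger output so it matches the original sentence split.

    Flattens all (token, tag) pairs, then slices them back into
    sentences using the length of each originally tokenized sentence.
    Assumes the tagger preserved every token, only the splits differ.
    """
    flat = [pair for sent in taggedsents for pair in sent]
    realigned = []
    i = 0
    for sent in tokenized_sents:
        realigned.append(flat[i:i + len(sent)])
        i += len(sent)
    return realigned

# Sample data with the token/tag counts from the question:
tokenized = [
    ['When', 'they', 'are', 'over', '45', 'years', 'old', '!', '!'],
    ['It', 'would', 'definitely', 'help', 'Michael', 'Jordan', '.'],
]
tagged = [
    [('When', 'O'), ('they', 'O'), ('are', 'O'), ('over', 'O'),
     ('45', 'O'), ('years', 'O'), ('old', 'O'), ('!', 'O')],
    [('!', 'O')],
    [('It', 'O'), ('would', 'O'), ('definitely', 'O'), ('help', 'O'),
     ('Michael', 'PERSON'), ('Jordan', 'PERSON'), ('.', 'O')],
]
fixed = realign(tokenized, tagged)
```

After realigning, `fixed` has two sentences again, matching `tokenized` one-to-one.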

How should I resolve this?

Answer

If you follow my solution from the other question instead of using nltk, you will get JSON that properly splits this text into two sentences.

Link to the previous question: How to speed up NE recognition with stanford NER with python nltk
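For illustration, the JSON returned by a Stanford CoreNLP server (annotators `tokenize,ssplit,ner`, `outputFormat=json`) groups tokens under a `sentences` array, so the sentence split comes from the server rather than NLTK. The field names below follow CoreNLP's documented output format, but the response itself is a hand-written sample, not real server output:

```python
import json

# Minimal sample in the shape of CoreNLP's JSON output; illustrative only.
response = json.loads("""
{"sentences": [
  {"index": 0, "tokens": [
     {"word": "When", "ner": "O"}, {"word": "they", "ner": "O"},
     {"word": "are", "ner": "O"}, {"word": "over", "ner": "O"},
     {"word": "45", "ner": "O"}, {"word": "years", "ner": "O"},
     {"word": "old", "ner": "O"}, {"word": "!", "ner": "O"},
     {"word": "!", "ner": "O"}]},
  {"index": 1, "tokens": [
     {"word": "It", "ner": "O"}, {"word": "would", "ner": "O"},
     {"word": "definitely", "ner": "O"}, {"word": "help", "ner": "O"},
     {"word": "Michael", "ner": "PERSON"}, {"word": "Jordan", "ner": "PERSON"},
     {"word": ".", "ner": "O"}]}
]}
""")

# Rebuild the same (token, tag) structure the NLTK code produced,
# but with the server's (correct) two-sentence split.
tagged_sents = [[(t["word"], t["ner"]) for t in s["tokens"]]
                for s in response["sentences"]]
```

Here the "!!" stays attached to the first sentence, so no realignment is needed.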
