Chunking Stanford Named Entity Recognizer (NER) outputs from NLTK format

Published: 2020/5/18 0:33:20 · Tags: python nlp nltk stanford-nlp named-entity-recognition

Question

I am using NER in NLTK to find persons, locations, and organizations in sentences. I am able to produce results like this:

[(u'Remaking', u'O'), (u'The', u'O'), (u'Republican', u'ORGANIZATION'), (u'Party', u'ORGANIZATION')]

Is it possible to chunk these together using it? What I want is something like this:

u'Remaking'/ u'O', u'The'/u'O', (u'Republican', u'Party')/u'ORGANIZATION'

Thanks!

Answer

You can use the standard NLTK way of representing chunks using nltk.Tree. This might mean that you have to change your representation a bit.

What I usually do is represent NER-tagged sentences as lists of triplets:

sentence = [('Andrew', 'NNP', 'PERSON'), ('is', 'VBZ', 'O'), ('part', 'NN', 'O'), ('of', 'IN', 'O'), ('the', 'DT', 'O'), ('Republican', 'NNP', 'ORGANIZATION'), ('Party', 'NNP', 'ORGANIZATION'), ('in', 'IN', 'O'), ('Dallas', 'NNP', 'LOCATION')]
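One way to build these triples, assuming you already have two parallel lists for the same tokens (the `(word, POS)` pairs from a POS tagger such as `nltk.pos_tag`, and the `(word, NER)` pairs from the Stanford tagger), is to zip them together. The helper name `merge_pos_and_ner` is hypothetical, and the input lists below are hard-coded for illustration rather than produced by a real tagger:

```python
def merge_pos_and_ner(pos_tagged, ner_tagged):
    """Combine parallel (word, POS) and (word, NER) lists into (word, POS, NER) triples.

    Hypothetical helper: assumes both taggers tokenized the sentence identically.
    """
    if len(pos_tagged) != len(ner_tagged):
        raise ValueError("taggers produced different numbers of tokens")
    triples = []
    for (word, pos), (ner_word, ner) in zip(pos_tagged, ner_tagged):
        # Guard against the two taggers disagreeing on tokenization.
        if word != ner_word:
            raise ValueError(f"token mismatch: {word!r} vs {ner_word!r}")
        triples.append((word, pos, ner))
    return triples


# Hard-coded tagger outputs, for illustration only.
pos_tagged = [('Andrew', 'NNP'), ('is', 'VBZ'), ('in', 'IN'), ('Dallas', 'NNP')]
ner_tagged = [('Andrew', 'PERSON'), ('is', 'O'), ('in', 'O'), ('Dallas', 'LOCATION')]
print(merge_pos_and_ner(pos_tagged, ner_tagged))
# [('Andrew', 'NNP', 'PERSON'), ('is', 'VBZ', 'O'), ('in', 'IN', 'O'), ('Dallas', 'NNP', 'LOCATION')]
```

Raising on a mismatch is a deliberate choice: silently zipping misaligned token lists would attach the wrong labels to words.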

I use this representation when I tag a sentence with an external NER tool. You can then transform the sentence into the NLTK representation:

from nltk import Tree


def IOB_to_tree(iob_tagged):
    """Group consecutive tokens that share the same non-'O' label into subtrees."""
    root = Tree('S', [])
    for word, pos, ner in iob_tagged:
        if ner == 'O':
            # Tokens outside any entity stay as plain (word, POS) leaves.
            root.append((word, pos))
        elif len(root) > 0 and isinstance(root[-1], Tree) and root[-1].label() == ner:
            # Same label as the chunk just before: extend that chunk.
            root[-1].append((word, pos))
        else:
            # Different label (or nothing to extend): start a new chunk.
            root.append(Tree(ner, [(word, pos)]))
    return root


sentence = [('Andrew', 'NNP', 'PERSON'), ('is', 'VBZ', 'O'), ('part', 'NN', 'O'), ('of', 'IN', 'O'), ('the', 'DT', 'O'), ('Republican', 'NNP', 'ORGANIZATION'), ('Party', 'NNP', 'ORGANIZATION'), ('in', 'IN', 'O'), ('Dallas', 'NNP', 'LOCATION')]
print(IOB_to_tree(sentence))

This change in representation makes sense, because POS tags are usually needed alongside the NER tags anyway.
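NLTK also ships `conlltags2tree` (in `nltk.chunk`), which builds the same kind of tree from `(word, POS, chunk_tag)` triples. It expects IOB chunk tags (`B-ORGANIZATION`, `I-ORGANIZATION`, `O`) rather than the bare labels the Stanford tagger emits, so the labels have to be converted first. A minimal sketch, assuming (as `IOB_to_tree` does) that adjacent tokens with the same label belong to one entity; the helper `to_iob` is hypothetical:

```python
from nltk.chunk import conlltags2tree


def to_iob(tagged):
    """Convert (word, POS, label) triples with bare labels into IOB chunk tags.

    Assumes consecutive tokens sharing a non-'O' label form a single entity.
    """
    iob = []
    prev = 'O'
    for word, pos, label in tagged:
        if label == 'O':
            iob.append((word, pos, 'O'))
        elif label == prev:
            # Continuation of the entity started on the previous token.
            iob.append((word, pos, 'I-' + label))
        else:
            # First token of a new entity.
            iob.append((word, pos, 'B-' + label))
        prev = label
    return iob


sentence = [('Andrew', 'NNP', 'PERSON'), ('is', 'VBZ', 'O'), ('part', 'NN', 'O'),
            ('of', 'IN', 'O'), ('the', 'DT', 'O'), ('Republican', 'NNP', 'ORGANIZATION'),
            ('Party', 'NNP', 'ORGANIZATION'), ('in', 'IN', 'O'), ('Dallas', 'NNP', 'LOCATION')]
tree = conlltags2tree(to_iob(sentence))
print(tree)
```

With genuine IOB input, `conlltags2tree` can keep two adjacent entities of the same type apart (a `B-` tag always starts a new chunk). Converting from bare labels, as `to_iob` does, cannot recover that distinction.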

The output of IOB_to_tree(sentence) should look like this:

(S
  (PERSON Andrew/NNP)
  is/VBZ
  part/NN
  of/IN
  the/DT
  (ORGANIZATION Republican/NNP Party/NNP)
  in/IN
  (LOCATION Dallas/NNP))
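To get from this tree to the grouping asked for in the question (the words of each entity together, with a single label), you can walk the top level of the tree: plain `(word, POS)` tuples are tokens outside any entity, and `Tree` children are entity chunks. A sketch; the function name `tree_to_chunks` and the choice to drop the POS tags are assumptions, and the tree is a shortened version of the one above, built by hand so the example is self-contained:

```python
from nltk import Tree


def tree_to_chunks(tree):
    """Flatten a one-level chunk tree into (words, label) pairs.

    Entity subtrees become (tuple_of_words, entity_label); tokens outside any
    entity become (word, 'O'). POS tags are dropped.
    """
    chunks = []
    for child in tree:
        if isinstance(child, Tree):
            words = tuple(word for word, _pos in child.leaves())
            chunks.append((words, child.label()))
        else:
            word, _pos = child
            chunks.append((word, 'O'))
    return chunks


tree = Tree('S', [
    Tree('PERSON', [('Andrew', 'NNP')]),
    ('is', 'VBZ'),
    Tree('ORGANIZATION', [('Republican', 'NNP'), ('Party', 'NNP')]),
    ('in', 'IN'),
    Tree('LOCATION', [('Dallas', 'NNP')]),
])
print(tree_to_chunks(tree))
# [(('Andrew',), 'PERSON'), ('is', 'O'), (('Republican', 'Party'), 'ORGANIZATION'), ('in', 'O'), (('Dallas',), 'LOCATION')]
```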
