Chunking Stanford Named Entity Recognizer (NER) outputs from NLTK format

Views: 25  Published: 2022/1/2 17:31:16  Tags: python nlp nltk stanford-nlp named-entity-recognition

Question

I am using NER in NLTK to find persons, locations, and organizations in sentences. I am able to produce results like this:

[(u'Remaking', u'O'), (u'The', u'O'), (u'Republican', u'ORGANIZATION'), (u'Party', u'ORGANIZATION')]

Is it possible to chunk things together using this output? What I want is something like this:

u'Remaking'/ u'O', u'The'/u'O', (u'Republican', u'Party')/u'ORGANIZATION'

Thanks!

Recommended answer

You can use the standard NLTK way of representing chunks, which is nltk.Tree. This might mean that you have to change your representation a bit.

What I usually do is represent NER-tagged sentences as lists of (word, POS tag, NER tag) triplets:

sentence = [('Andrew', 'NNP', 'PERSON'), ('is', 'VBZ', 'O'), ('part', 'NN', 'O'), ('of', 'IN', 'O'), ('the', 'DT', 'O'), ('Republican', 'NNP', 'ORGANIZATION'), ('Party', 'NNP', 'ORGANIZATION'), ('in', 'IN', 'O'), ('Dallas', 'NNP', 'LOCATION')]
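Triplets like these can be assembled from two separate tagger outputs. Below is a minimal sketch, assuming you already have a list of (word, POS) pairs and a list of (word, NER) pairs produced over the same tokenization. The names `merge_tags`, `pos_tagged`, and `ner_tagged` are hypothetical, and the example inputs are written by hand rather than produced by a real tagger.

```python
def merge_tags(pos_tagged, ner_tagged):
    """Combine (word, POS) and (word, NER) pairs into (word, POS, NER) triplets."""
    if len(pos_tagged) != len(ner_tagged):
        raise ValueError("both taggers must be run on the same tokens")
    triplets = []
    for (word, pos), (ner_word, ner) in zip(pos_tagged, ner_tagged):
        # Guard against the two tools tokenizing the sentence differently.
        if word != ner_word:
            raise ValueError("token mismatch: %r vs %r" % (word, ner_word))
        triplets.append((word, pos, ner))
    return triplets


# Hand-written example inputs (assumed, not real tagger output).
pos_tagged = [('Republican', 'NNP'), ('Party', 'NNP'), ('in', 'IN'), ('Dallas', 'NNP')]
ner_tagged = [('Republican', 'ORGANIZATION'), ('Party', 'ORGANIZATION'),
              ('in', 'O'), ('Dallas', 'LOCATION')]
print(merge_tags(pos_tagged, ner_tagged))
```

The explicit token check is a design choice: if the POS tagger and the NER tool split the sentence differently, failing early is safer than silently pairing the wrong tags.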

I do this when I use an external tool to NER-tag a sentence. Now you can transform this sentence into the NLTK representation:

from nltk import Tree


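def IOB_to_tree(iob_tagged):
    # Group consecutive tokens that share an entity label into one subtree.
    root = Tree('S', [])
    for word, pos, label in iob_tagged:
        if label == 'O':
            # Tokens outside any entity stay as plain (word, POS) leaves.
            root.append((word, pos))
        elif root and isinstance(root[-1], Tree) and root[-1].label() == label:
            # Same label as the chunk just before: extend that chunk.
            root[-1].append((word, pos))
        else:
            # First token, or a different label: start a new chunk.
            root.append(Tree(label, [(word, pos)]))

    return root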


sentence = [('Andrew', 'NNP', 'PERSON'), ('is', 'VBZ', 'O'), ('part', 'NN', 'O'), ('of', 'IN', 'O'), ('the', 'DT', 'O'), ('Republican', 'NNP', 'ORGANIZATION'), ('Party', 'NNP', 'ORGANIZATION'), ('in', 'IN', 'O'), ('Dallas', 'NNP', 'LOCATION')]
print(IOB_to_tree(sentence))

The change in representation kind of makes sense, because you certainly need POS tags for NER tagging.

The end result should look like:

(S
  (PERSON Andrew/NNP)
  is/VBZ
  part/NN
  of/IN
  the/DT
  (ORGANIZATION Republican/NNP Party/NNP)
  in/IN
  (LOCATION Dallas/NNP))
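If you would rather keep the original (word, tag) pairs from the question, the grouping can also be sketched without nltk.Tree. This is an alternative to the answer above, not part of it: it uses `itertools.groupby` from the standard library, and `chunk_pairs` is a hypothetical helper name. Like IOB_to_tree, it merges any run of adjacent tokens with the same label, so two distinct entities of the same type that sit next to each other end up in one chunk.

```python
from itertools import groupby


def chunk_pairs(tagged):
    """Group consecutive (word, tag) pairs that share a non-'O' tag."""
    chunks = []
    for tag, group in groupby(tagged, key=lambda pair: pair[1]):
        words = [word for word, _ in group]
        if tag == 'O':
            # Tokens outside any entity stay as individual (word, tag) items.
            chunks.extend((word, tag) for word in words)
        else:
            # A run of entity tokens becomes one (tuple_of_words, tag) item.
            chunks.append((tuple(words), tag))
    return chunks


tagged = [('Remaking', 'O'), ('The', 'O'),
          ('Republican', 'ORGANIZATION'), ('Party', 'ORGANIZATION')]
print(chunk_pairs(tagged))
```

For the sentence from the question, this yields `[('Remaking', 'O'), ('The', 'O'), (('Republican', 'Party'), 'ORGANIZATION')]`, which matches the shape the question asks for.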
