Merging tags into my file using named entity annotation

Question

While learning the basics of text mining I ran into the following problem: I must use named entity annotation to find and locate named entities. Once an entity is found, its tag must be written into the document itself. For example, "Hello I am Koen" must become "Hello I am <PERSON>Koen</PERSON>".

I figured out how to find and label the named entities, but I am stuck on getting the tags into the file correctly. I've tried checking whether ent.orth_ occurs in the file and then replacing it with the opening tag + ent.orth_ + the closing tag.

print([(X, X.ent_iob_, X.ent_type_) for X in doc])

I used this to see which tokens belong to an entity and where each entity starts.

entities = []
for ent in doc.ents:
    entities.append(ent.orth_ + ", " + ent.label_)

I used this to build a list holding both the original text of each entity and its label.

So right now I have a variable with all the original forms and labels, and I know where the entities start and end. However, when I try to replace them in the text my knowledge runs short, and I can't find any similar examples.

Recommended answer

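Try this. It uses the character offsets spaCy stores on each entity (ent.start_char and ent.end_char) and walks the entities in reverse, so inserting a tag never shifts the offsets of entities that have not been processed yet: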

import spacy

nlp = spacy.load("en_core_web_sm")
s = "Apple is looking at buying U.K. startup for $1 billion"
doc = nlp(s)

def replaceSubstring(s, replacement, position, length_of_replaced):
    # Swap the slice that starts at `position` for `replacement`.
    return s[:position] + replacement + s[position + length_of_replaced:]

# Process the entities from last to first.
for ent in reversed(doc.ents):
    replacement = "<{}>{}</{}>".format(ent.label_, ent.text, ent.label_)
    position = ent.start_char
    length_of_replaced = ent.end_char - ent.start_char
    s = replaceSubstring(s, replacement, position, length_of_replaced)

print(s)
# <ORG>Apple</ORG> is looking at buying <GPE>U.K.</GPE> startup for <MONEY>$1 billion</MONEY>
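
If you would rather not rewrite the string once per entity, the same offsets can be used to build the tagged text in a single left-to-right pass. The following is only a sketch: tag_spans is a hypothetical helper name, and the spans are hard-coded here so the example runs without a spaCy model. With spaCy you would presumably build them as [(ent.start_char, ent.end_char, ent.label_) for ent in doc.ents], assuming the spans do not overlap (which holds for doc.ents).

```python
def tag_spans(text, spans):
    """Wrap each (start, end, label) span of `text` in <LABEL>...</LABEL>.

    Hypothetical helper: `spans` holds non-overlapping character offsets.
    """
    pieces = []
    cursor = 0  # position in `text` up to which output has been emitted
    for start, end, label in sorted(spans):
        pieces.append(text[cursor:start])  # untouched text before the entity
        pieces.append("<{0}>{1}</{0}>".format(label, text[start:end]))
        cursor = end
    pieces.append(text[cursor:])  # remainder after the last entity
    return "".join(pieces)


text = "Hello I am Koen"
spans = [(11, 15, "PERSON")]  # assumed offsets of "Koen", hard-coded for the demo
tagged = tag_spans(text, spans)
print(tagged)
```

Because the output is assembled from slices of the original string, the offsets never need adjusting, and the result can be written to a file in one go.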

