How to reconstruct text entities with Hugging Face's transformers pipelines without IOB tags?


Problem description

I've been looking into using Hugging Face's pipelines for NER (named entity recognition). However, the pipeline returns the entity labels in inside-outside-beginning (IOB) format but without the IOB labels, so I'm not able to map its output back to my original text. Moreover, the output is split into BERT word-piece tokens such as `##gging` (the default model is BERT-large).

For example:

from transformers import pipeline
nlp_bert_lg = pipeline('ner')
print(nlp_bert_lg('Hugging Face is a French company based in New York.'))

The output is:

[{'word': 'Hu', 'score': 0.9968873858451843, 'entity': 'I-ORG'},
{'word': '##gging', 'score': 0.9329522848129272, 'entity': 'I-ORG'},
{'word': 'Face', 'score': 0.9781811237335205, 'entity': 'I-ORG'},
{'word': 'French', 'score': 0.9981815814971924, 'entity': 'I-MISC'},
{'word': 'New', 'score': 0.9987512826919556, 'entity': 'I-LOC'},
{'word': 'York', 'score': 0.9976728558540344, 'entity': 'I-LOC'}]

As you can see, New York is broken up into two tags.

How can I map the output of Hugging Face's NER pipeline back to my original text?

Transformers version: 2.7

Recommended answer

On May 17, a pull request (https://github.com/huggingface/transformers/pull/3957) that does what you are asking for was merged, so life is now much easier. You can use it in the pipeline like this:

ner = pipeline('ner', grouped_entities=True)

Your output will then be as expected. At the time of writing, you have to install from the master branch because no release includes this change yet. You can do so with:

pip install git+https://github.com/huggingface/transformers.git@48c3a70b4eaedab1dd9ad49990cfaa4d6cb8f6a0
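
If installing from source is not an option, the grouping can be approximated by hand on the token-level output from the question. The following is only a minimal sketch, not the pipeline's own implementation: `group_entities` and the keys of its result are hypothetical names chosen for illustration. It assumes that neighbouring list entries with the same label are also neighbours in the original text, which the output shown in the question cannot confirm because it carries no position information.

```python
def group_entities(tokens):
    """Merge word pieces and neighbouring same-label tokens into entities.

    Assumption: consecutive entries with the same label are adjacent in the
    source text. Two distinct entities of the same type that directly follow
    each other would be merged by mistake.
    """
    groups = []
    for tok in tokens:
        word = tok["word"]
        # Drop the I-/B- prefix so "I-ORG" becomes "ORG".
        label = tok["entity"].split("-", 1)[-1]
        if word.startswith("##") and groups:
            # Word-piece continuation: glue onto the previous word, no space.
            groups[-1]["word"] += word[2:]
            groups[-1]["scores"].append(tok["score"])
        elif groups and groups[-1]["label"] == label:
            # Same label as the previous token: extend the current entity.
            groups[-1]["word"] += " " + word
            groups[-1]["scores"].append(tok["score"])
        else:
            # Different label (or first token): start a new entity.
            groups.append({"label": label, "word": word, "scores": [tok["score"]]})
    # Report the mean token score per entity.
    return [
        {"label": g["label"], "word": g["word"],
         "score": sum(g["scores"]) / len(g["scores"])}
        for g in groups
    ]


# Token-level output copied from the question.
raw = [
    {"word": "Hu", "score": 0.9968873858451843, "entity": "I-ORG"},
    {"word": "##gging", "score": 0.9329522848129272, "entity": "I-ORG"},
    {"word": "Face", "score": 0.9781811237335205, "entity": "I-ORG"},
    {"word": "French", "score": 0.9981815814971924, "entity": "I-MISC"},
    {"word": "New", "score": 0.9987512826919556, "entity": "I-LOC"},
    {"word": "York", "score": 0.9976728558540344, "entity": "I-LOC"},
]

entities = group_entities(raw)
print([(e["word"], e["label"]) for e in entities])
# [('Hugging Face', 'ORG'), ('French', 'MISC'), ('New York', 'LOC')]
```

Averaging the token scores is just one possible choice; taking the minimum would be a more conservative estimate of the entity's confidence.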
