How can one resolve synonyms in named-entity recognition?

Question

In natural language processing, named-entity recognition is the challenge of, well, recognizing named entities such as organizations, places, and most importantly names.

There is a major challenge here, though, which I call synonymy: "The Count" and "Dracula" in fact refer to the same person, but it is possible that this is never stated directly in the text.

What would be the best algorithm to resolve these synonyms?

If there is a feature for this in any Python-based library, I'm eager to be educated. I'm using NLTK.

Answer

You are describing the problems of coreference resolution and named-entity linking. I'm providing separate pointers for each, as I am not entirely sure which one you meant.

  • Coreference: Stanford CoreNLP currently has one of the best implementations, but it is in Java. I have used the Python bindings and wasn't too happy with them; I ended up running all my data through the Stanford pipeline just once, and then loading the processed XML files in Python. Obviously, that doesn't work if you have to process text in real time.
  • Named entity linking: Check out Apache Stanbol and the links in the following Stackoverflow post.
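
For the batch workflow described in the coreference bullet (run CoreNLP once, then load the XML in Python), a minimal sketch of the Python side might look like the following. It assumes the layout written by CoreNLP's `-outputFormat xml` option with a coreference annotator enabled: a `<root><document>` element whose outer `<coreference>` wraps one inner `<coreference>` per chain, each holding `<mention>` elements with `<sentence>`, `<start>`, `<end>` and `<text>` children, and `representative="true"` on the chain's representative mention. The function names `coref_chains` and `canonical_names` and the `SAMPLE` document are hypothetical and hand-written for illustration, not real CoreNLP output; check the element names against the files your CoreNLP version actually produces.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written input in the ASSUMED CoreNLP XML layout
# (not real CoreNLP output): one chain linking three mentions.
SAMPLE = """<root><document>
<coreference>
  <coreference>
    <mention representative="true">
      <sentence>1</sentence><start>1</start><end>3</end><head>2</head>
      <text>Count Dracula</text>
    </mention>
    <mention>
      <sentence>2</sentence><start>1</start><end>3</end><head>2</head>
      <text>The Count</text>
    </mention>
    <mention>
      <sentence>3</sentence><start>1</start><end>2</end><head>1</head>
      <text>He</text>
    </mention>
  </coreference>
</coreference>
</document></root>"""


def coref_chains(xml_text):
    """Parse coreference chains; each chain is a list of mention dicts.

    Assumes the layout root/document/coreference/coreference/mention.
    """
    root = ET.fromstring(xml_text)
    chains = []
    # The outer <coreference> wraps one inner <coreference> element per chain.
    for chain in root.findall("./document/coreference/coreference"):
        mentions = []
        for m in chain.findall("mention"):
            mentions.append({
                "sentence": int(m.findtext("sentence")),
                "start": int(m.findtext("start")),
                "end": int(m.findtext("end")),
                "text": m.findtext("text"),
                "representative": m.get("representative") == "true",
            })
        if mentions:
            chains.append(mentions)
    return chains


def canonical_names(chains):
    """Map each mention position (sentence, start, end) to the text of its
    chain's representative mention, falling back to the first mention."""
    mapping = {}
    for mentions in chains:
        rep = next((m for m in mentions if m["representative"]), mentions[0])
        for m in mentions:
            mapping[(m["sentence"], m["start"], m["end"])] = rep["text"]
    return mapping


if __name__ == "__main__":
    for (sent, start, end), name in canonical_names(coref_chains(SAMPLE)).items():
        print(f"sentence {sent}, token span ({start}, {end}) -> {name}")
```

Keying the result by mention position rather than by surface text avoids conflating identical strings, such as two different "He" mentions, that belong to different chains.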
