How to find people's names in text (Java)


Question

I have some input text that contains one or more people's names. I do not have a dictionary of these names. Which Java library can help me detect names in my input text? I looked through OpenNLP, but did not find any example, guide, or even a description of how to apply it in my code. (I saw the javadoc, but it is pretty poor documentation for such a project.)

I want to find names in arbitrary text. If the input text is "My friend Joe Smith went to the store.", then I want to get "Joe Smith". I assume some of the smarter engines ship with large enough built-in dictionaries, or are built on smaller ones, to recognize human names.

Recommended answer

OpenNLP has named entity recognition. Check the "English Name Finding" section in the docs. In my experience, though, it identifies entities but does not attach reliable tags to them. (To be precise, I found the tags to be ambiguously assigned.) So, if you have the sentence "My friend Joe Smith went to the Walmart store", OpenNLP identifies two named entities, "Joe Smith" and "Walmart". I couldn't get it to tag "Joe Smith" as Person and "Walmart" as Organization.

As suggested by Matt, you can try LingPipe, though it's a commercial tool. Some open source alternatives are MorphAdorner and Stanford NER.
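To make the expected input and output concrete, here is a minimal sketch of a dictionary-free baseline using only the Java standard library. It is not how OpenNLP, LingPipe, MorphAdorner, or Stanford NER work; it is a naive heuristic that treats any run of two or more capitalized words as a candidate name. The class name `NameCandidates` and the method `extract` are hypothetical names chosen for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only: a naive, dictionary-free baseline.
final class NameCandidates {

    // Two or more consecutive words, each an uppercase letter followed by
    // lowercase letters, separated by whitespace (e.g. "Joe Smith").
    private static final Pattern CAPITALIZED_RUN =
            Pattern.compile("\\b\\p{Lu}\\p{Ll}+(?:\\s+\\p{Lu}\\p{Ll}+)+\\b");

    // Returns every capitalized run found in the text, in order of appearance.
    static List<String> extract(String text) {
        List<String> candidates = new ArrayList<>();
        Matcher matcher = CAPITALIZED_RUN.matcher(text);
        while (matcher.find()) {
            candidates.add(matcher.group());
        }
        return candidates;
    }

    public static void main(String[] args) {
        // Prints [Joe Smith]
        System.out.println(extract("My friend Joe Smith went to the store."));
    }
}
```

This heuristic misses single-word names, and it will also pick up multi-word places or organizations, so it cannot tell a Person from an Organization either. A trained named entity recognizer from one of the libraries above is the better choice for real text; the sketch is only useful as a quick sanity check or fallback.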
