Entity Extraction/Recognition with free tools while feeding Lucene Index

Views: 100 Published: 2020/5/4 7:26:46 Tags: lucene nlp semantic-web mahout opennlp

Problem description

I'm currently investigating options for extracting person names, locations, technical terms, and categories from text (a lot of articles from the web), which will then be fed into a Lucene/ElasticSearch index. The extracted information is added as metadata and should increase the precision of the search.
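
To make the intended pipeline concrete, here is a minimal sketch. It assumes a hypothetical hand-written gazetteer (`GAZETTEER`) in place of a real NER tool and a plain dict in place of a Lucene/ElasticSearch document; the helper names `extract_entities` and `build_document` are made up for illustration only.

```python
import re

# Hypothetical gazetteer: a stand-in for the output of a real NER tool.
GAZETTEER = {
    "person": {"Ada Lovelace"},
    "location": {"London"},
    "tech": {"Lucene", "Java"},
}


def extract_entities(text):
    """Return {entity_type: sorted list of gazetteer entries found in text}."""
    found = {}
    for entity_type, names in GAZETTEER.items():
        hits = sorted(
            name for name in names
            if re.search(r"\b" + re.escape(name) + r"\b", text)
        )
        if hits:
            found[entity_type] = hits
    return found


def build_document(doc_id, text):
    """Combine the raw text with one metadata field per entity type."""
    doc = {"id": doc_id, "body": text}
    doc.update(extract_entities(text))
    return doc


article = "Ada Lovelace gave a talk in London about Lucene and Java."
doc = build_document("a1", article)
print(doc)
```

In a real setup, each entity type would presumably become its own field on the indexed document, so that a query can match or filter on it separately from the body text.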

For example, when someone queries 'wicket', they should be able to decide whether they mean the sport of cricket or the Apache project. I tried to implement this on my own, with only minor success so far. I have now found a lot of tools, but I'm not sure whether they are suited to this task, which of them integrate well with Lucene, or whether the precision of their entity extraction is high enough.
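
The 'wicket' case can be sketched as a category facet over the hits, which the user then narrows down. This is only an illustration under stated assumptions: the documents and their `category` labels are invented, the in-memory list `DOCS` stands in for the index, and `matching_docs`, `search`, and `facet_counts` are hypothetical helpers rather than Lucene or ElasticSearch calls.

```python
import re
from collections import Counter

# Hypothetical, hand-labelled documents; in practice the "category" field
# would come from the entity-extraction/classification step.
DOCS = [
    {"id": 1, "body": "The bowler hit the wicket in the final over.",
     "category": "cricket"},
    {"id": 2, "body": "Apache Wicket is a component-based Java web framework.",
     "category": "software"},
    {"id": 3, "body": "A sticky wicket slowed the batsmen down.",
     "category": "cricket"},
    {"id": 4, "body": "Lucene builds an inverted index over documents.",
     "category": "software"},
]


def matching_docs(term):
    """Documents whose body contains the term as a whole token (case-insensitive)."""
    term = term.lower()
    return [d for d in DOCS if term in re.findall(r"\w+", d["body"].lower())]


def search(term, category=None):
    """Ids of matching documents, optionally restricted to one category."""
    return [
        d["id"] for d in matching_docs(term)
        if category is None or d["category"] == category
    ]


def facet_counts(term):
    """Number of matching documents per category, to offer as a choice."""
    return dict(Counter(d["category"] for d in matching_docs(term)))


print(facet_counts("wicket"))
print(search("wicket", category="software"))
```

The facet counts give the user the choice between the two senses, and the category filter then restricts the result list to the sense they picked.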

DBpedia Spotlight: the demo looks very promising
OpenNLP: requires training. Which training data should be used?
OpenNLP tools
Stanbol
NLTK
balie
UIMA
GATE -> example code
Apache Mahout
Stanford CRF-NER
maui-indexer
Mallet
Illinois Named Entity Tagger (not open source, but free)
wikipedianer data

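My questions:

Does anyone have experience with some of the tools listed above and their precision/recall? Or with whether training data is required and available?
Are there articles or tutorials that would help me get started with entity extraction (NER) for each tool?
How can they be integrated with Lucene?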

Here are some questions related to that subject:

Does an algorithm exist to help detect the "primary topic" of an English sentence?
Named Entity Recognition Libraries for Java
Named entity recognition with Java
