Entity Extraction/Recognition with free tools while feeding Lucene Index

Views: 18 Published: 2022/1/15 12:40:55 Tags: lucene nlp semantic-web mahout opennlp

Problem description

I'm currently investigating options for extracting person names, locations, technical terms and categories from text (a lot of articles from the web), which will then be fed into a Lucene/ElasticSearch index. The additional information is added as metadata and should increase the precision of the search.

E.g. when someone queries 'wicket', they should be able to decide whether they mean the sport of cricket or the Apache project. I tried to implement this on my own, with only minor success so far. I have now found a lot of tools, but I'm not sure whether they are suited to this task, which of them integrate well with Lucene, or whether the precision of their entity extraction is high enough.

DBpedia Spotlight: the demo looks very promising
OpenNLP: requires training. Which training data should be used?
OpenNLP tools
Stanbol
NLTK
balie
UIMA
GATE (with example code)
Apache Mahout
Stanford CRF-NER
maui-indexer
Mallet
Illinois Named Entity Tagger (not open source, but free)
wikipedianer data

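My questions:

Does anyone have experience with some of the tools listed above and their precision/recall? Or with training data, if it is required and available?
Are there articles or tutorials that would get me started with entity extraction (NER) for each tool?
How can they be integrated with Lucene?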
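
The pipeline described above (extract entities, store them as metadata, filter on them at query time) can be illustrated with a minimal sketch. It is not tied to any of the tools listed and does not use Lucene: `ToyIndex` is a hypothetical in-memory inverted index, and the hand-written `CATEGORY_CUES` dictionary is only a stand-in for a real NER or classification tool.

```python
from collections import defaultdict

# Hypothetical cue list standing in for a real NER/classification tool:
# a cue word found in a document assigns that document a category.
CATEGORY_CUES = {
    "apache": "software",
    "java": "software",
    "framework": "software",
    "cricket": "sport",
    "batsman": "sport",
    "bowler": "sport",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in text)
    return cleaned.split()


def extract_categories(text):
    """Return the set of categories whose cue words occur in the text."""
    return {CATEGORY_CUES[tok] for tok in tokenize(text) if tok in CATEGORY_CUES}


class ToyIndex:
    """In-memory inverted index with a separate 'category' metadata field."""

    def __init__(self):
        self.body = defaultdict(set)      # token -> ids of documents containing it
        self.category = defaultdict(set)  # category -> ids of documents tagged with it

    def add(self, doc_id, text):
        # Index the plain tokens and, separately, the extracted metadata.
        for tok in tokenize(text):
            self.body[tok].add(doc_id)
        for cat in extract_categories(text):
            self.category[cat].add(doc_id)

    def search(self, term, category=None):
        # Match on the body field, then optionally narrow by category.
        hits = set(self.body.get(term.lower(), set()))
        if category is not None:
            hits &= self.category.get(category, set())
        return sorted(hits)


index = ToyIndex()
index.add(1, "Apache Wicket is a component-based Java web framework.")
index.add(2, "The bowler hit the wicket and the batsman was out.")

print(index.search("wicket"))                       # both documents match
print(index.search("wicket", category="software"))  # only the Apache project
print(index.search("wicket", category="sport"))     # only the cricket article
```

In a real Lucene or ElasticSearch setup, the category would presumably be stored as an additional field on each document and applied as a filter at query time. The quality of the result then depends on the extraction step that this sketch fakes with a dictionary.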

Here are some questions related to that subject:
