Lucene Entity Extraction

Views: 64 | Published: 2020/5/4 7:40:31 | Tags: lucene text-mining information-extraction lucene-highlighter


Question

Given a finite dictionary of entity terms, I'm looking for a way to do entity extraction with intelligent tagging using Lucene. So far I've been able to use Lucene for:
- Searching for complex phrases with some fuzziness
- Highlighting results

However, I don't know how to:
- Get accurate offsets of the matched phrases
- Do entity-specific annotations per match (not just tags for every single hit)

I have tried the explain() method, but it only reports which terms in the query produced the hit, not the offsets of the hit within the original text.

Has anybody faced a similar problem and is willing to share a potential solution?

Thanks in advance for your help!

Recommended answer

For the offsets, see this question: How get the offset of term in Lucene?

I don't quite understand your second question. It sounds to me like you want to get the data from a stored field, though. To get the data from a stored field:

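// Run the query and collect the top `num` hits.
TopDocs results = searcher.Search(query, filter, num);
foreach (ScoreDoc result in results.scoreDocs)
{
    // Load the stored document for this hit by its internal doc id.
    Document resultDoc = searcher.Doc(result.doc);
    // Read the stored value of the field named "My Field".
    string valOfField = resultDoc.Get("My Field");
}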
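
To make the two things the question asks for a bit more concrete (character offsets, plus an entity-specific annotation on each match), here is a minimal sketch that does not use Lucene at all. `EntityMatch`, `DictionaryTagger`, and the entity labels are hypothetical names made up for illustration, and the code assumes C# 9 or later for `record`. It only does plain case-insensitive substring matching, so it has none of Lucene's analysis, phrase handling, or fuzziness; it just shows the shape of the output you might want to build from the offsets Lucene gives you.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical result type: one annotated occurrence of a dictionary phrase.
// Start is inclusive, End is exclusive (both are character offsets into the text).
public sealed record EntityMatch(int Start, int End, string Text, string EntityType);

public static class DictionaryTagger
{
    // Scans `text` for every phrase in `dictionary` (phrase -> entity type),
    // ignoring case, and returns one EntityMatch per occurrence,
    // ordered by start offset.
    // Assumption: plain substring matching, no tokenization or word boundaries.
    public static List<EntityMatch> Tag(
        string text, IReadOnlyDictionary<string, string> dictionary)
    {
        var matches = new List<EntityMatch>();
        foreach (var entry in dictionary)
        {
            if (entry.Key.Length == 0) continue; // skip empty phrases

            int from = 0;
            while (true)
            {
                int start = text.IndexOf(
                    entry.Key, from, StringComparison.OrdinalIgnoreCase);
                if (start < 0) break;

                int end = start + entry.Key.Length;
                matches.Add(new EntityMatch(
                    start, end, text.Substring(start, entry.Key.Length), entry.Value));
                from = end; // continue searching after this occurrence
            }
        }
        return matches.OrderBy(m => m.Start).ToList();
    }
}
```

Calling `DictionaryTagger.Tag(text, dictionary)` with a dictionary such as `{ "new york" -> "CITY", "acme corp" -> "ORG" }` returns each occurrence together with its offsets and its own entity type, which is the per-match annotation the question describes.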
