Lucene Entity Extraction

Question

Given a finite dictionary of entity terms, I'm looking for a way to do Entity Extraction with intelligent tagging using Lucene. Currently I've been able to use Lucene for:
- Searching for complex phrases with some fuzziness
- Highlighting results

However, I'm not aware of how to:
- Get accurate offsets of the matched phrases
- Do entity-specific annotations per match (not just tags for every single hit)

I have tried using the explain() method - but this only gives the terms in the query which got the hit - not the offsets of the hit within the original text.

Has anybody faced a similar problem and is willing to share a potential solution?

Thanks in advance for your help!

Recommended answer

For the offset, see this question: How get the offset of term in Lucene?

I don't quite understand your second question. It sounds to me like you want to get the data from a stored field though. To get the data from a stored field:

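// Lucene.NET (C#). Assumes searcher, query, filter and num are already defined.
// Member casing varies by Lucene.NET version; newer releases expose
// these as results.ScoreDocs and result.Doc.
TopDocs results = searcher.Search(query, filter, num);
foreach (ScoreDoc result in results.scoreDocs)
{
    // Load the stored document for this hit by its internal document id.
    Document resultDoc = searcher.Doc(result.doc);
    // Returns the stored value of the field, or null if it was not stored.
    string valOfField = resultDoc.Get("My Field");
}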
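
To make the two goals from the question concrete (exact offsets, plus an entity-specific label on each match), here is a minimal sketch that does not use Lucene at all. It scans the text for every phrase in a small dictionary and records where each occurrence starts and ends, together with the entity type from the dictionary entry. The names EntityMatch and DictionaryTagger are hypothetical, invented for this illustration. The matching is exact and case-insensitive only: it has no fuzziness and no word-boundary checks, so treat it as a picture of the desired output rather than a replacement for a Lucene-based approach.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical types for illustration only; none of this is Lucene API.
public sealed class EntityMatch
{
    public int Start;          // inclusive character offset in the original text
    public int End;            // exclusive character offset
    public string Text;        // the matched span exactly as it appears in the text
    public string EntityType;  // label taken from the dictionary entry
}

public static class DictionaryTagger
{
    // Finds every occurrence of every dictionary phrase (case-insensitive)
    // and returns one annotated match per occurrence, sorted by start offset.
    public static List<EntityMatch> Tag(string text, IDictionary<string, string> dictionary)
    {
        var matches = new List<EntityMatch>();
        foreach (var entry in dictionary)
        {
            string phrase = entry.Key;
            if (phrase.Length == 0) continue; // an empty phrase would match everywhere

            int from = 0;
            while (true)
            {
                int at = text.IndexOf(phrase, from, StringComparison.OrdinalIgnoreCase);
                if (at < 0) break;

                matches.Add(new EntityMatch
                {
                    Start = at,
                    End = at + phrase.Length,
                    Text = text.Substring(at, phrase.Length),
                    EntityType = entry.Value
                });
                from = at + 1; // continue scanning just past this match's start
            }
        }
        matches.Sort((a, b) => a.Start.CompareTo(b.Start));
        return matches;
    }
}
```

For example, with the dictionary { "Lucene" -> "SOFTWARE", "Doug Cutting" -> "PERSON" } and the text "Apache Lucene was created by Doug Cutting.", Tag returns two matches: offsets 7 to 13 labelled SOFTWARE and offsets 29 to 41 labelled PERSON. With Lucene itself, offsets would instead come from the analysis chain or from term vectors stored with offsets, as the linked question discusses.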
