Lucene Entity Extraction
Question
Given a finite dictionary of entity terms, I'm looking for a way to do Entity Extraction with intelligent tagging using Lucene. Currently I've been able to use Lucene for:
- Searching for complex phrases with some fuzziness
- Highlighting results
However, I don't know how to:
- Get accurate offsets of the matched phrases
- Do entity-specific annotations per match (not just tags for every single hit)
I have tried using the explain() method - but this only gives the terms in the query which got the hit - not the offsets of the hit within the original text.
Has anybody faced a similar problem and is willing to share a potential solution?
Thanks in advance for your help!
Recommended answer
For the offset, see this question: How get the offset of term in Lucene?
I don't quite understand your second question. It sounds to me like you want to get the data from a stored field though. To get the data from a stored field:
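// Run the query, keeping the top `num` hits that pass the filter.
TopDocs results = searcher.Search(query, filter, num);
foreach (ScoreDoc result in results.scoreDocs)
{
    // Load the stored document for this hit and read one stored field.
    Document resultDoc = searcher.Doc(result.doc);
    string valOfField = resultDoc.Get("My Field");
}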
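If the second question is really about attaching an entity type to each individual match, it may help to see the desired output shape independently of Lucene. The following is only a minimal sketch in plain Java (Lucene's native language), not Lucene API: the `DictionaryTagger` class, the `Match` type, and the sample dictionary are all hypothetical names made up for illustration, and the matching is a naive case-insensitive substring scan with no fuzziness.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Lucene: tags dictionary terms found in a text.
final class DictionaryTagger {

    /** One dictionary hit: character offsets into the original text plus its entity type. */
    static final class Match {
        final int start;          // inclusive offset in the original text
        final int end;            // exclusive offset in the original text
        final String text;        // the matched span, as written in the original text
        final String entityType;  // the annotation taken from the dictionary

        Match(int start, int end, String text, String entityType) {
            this.start = start;
            this.end = end;
            this.text = text;
            this.entityType = entityType;
        }

        @Override
        public String toString() {
            return entityType + "[" + start + "," + end + ") \"" + text + "\"";
        }
    }

    /**
     * Scans the text for every dictionary term (exact, case-insensitive match)
     * and returns all hits ordered by start offset.
     */
    static List<Match> tag(String text, Map<String, String> dictionary) {
        List<Match> matches = new ArrayList<>();
        for (Map.Entry<String, String> entry : dictionary.entrySet()) {
            String term = entry.getKey();
            if (term.isEmpty()) {
                continue; // an empty term would match at every position
            }
            int i = 0;
            while (i + term.length() <= text.length()) {
                if (text.regionMatches(true, i, term, 0, term.length())) {
                    int end = i + term.length();
                    matches.add(new Match(i, end, text.substring(i, end), entry.getValue()));
                    i = end; // continue after this hit, so hits of one term never overlap
                } else {
                    i++;
                }
            }
        }
        matches.sort((a, b) -> Integer.compare(a.start, b.start));
        return matches;
    }

    public static void main(String[] args) {
        // Sample dictionary: term -> entity type (assumed data, for illustration only).
        Map<String, String> dictionary = new LinkedHashMap<>();
        dictionary.put("New York", "LOCATION");
        dictionary.put("Lucene", "SOFTWARE");
        for (Match m : tag("I used lucene in New York.", dictionary)) {
            System.out.println(m);
        }
    }
}
```

Each `Match` carries its own offsets and entity type, which is the per-hit annotation the question describes. In a real Lucene setup the offsets would come from the index or the analyzer rather than from a substring scan, and fuzzy matching would need a different approach than this sketch.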