Querying Lucene tokens without indexing

Question

I am using Lucene (more specifically, Compass) to log threads in a forum, and I need a way to extract the keywords behind a discussion. I don't want to index every entry someone makes. Instead, I'd keep a list of keywords that are relevant to a certain context, and add an entry to the index only if it matches a keyword and is above a threshold.

I want to use the power of an analyser to strip things out and do its magic, but then get the tokens back from the analyser so that I can match them against the keywords and count how many times certain words are mentioned.

Is there a way to get the tokens from an analyser without the overhead of indexing every entry made?

I was thinking I'd have to maintain a RAMDirectory to hold all entries, search it using my list of keywords, and then merge the relevant Documents into the persistence manager to actually store the relevant entries.

Recommended answer

You should be able to skip the RAMDirectory entirely. You can call the StandardAnalyzer directly and have it pass back a list of tokens (aka keywords).

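import java.io.StringReader;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;

// Old-style TokenStream API; later Lucene versions use incrementToken() with attributes instead.
StandardAnalyzer analyzer = new StandardAnalyzer();
// The field name is arbitrary here because nothing is being indexed.
TokenStream stream = analyzer.tokenStream("meaningless", new StringReader("<text>"));
while (true) {
    Token token = stream.next();   // returns null once the stream is exhausted
    if (token == null) break;

    System.out.println(token.termText());
}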

Better yet, write your own Analyzer (they're not hard; have a look at the source code for the existing ones) that uses your own filter to watch for your keywords.
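Once the tokens are out of the analyzer, the keyword matching and threshold check described in the question need nothing from Lucene. Below is a minimal sketch in plain Java, assuming the tokens have already been collected into a List<String>. The class name KeywordGate, its method names, and the rule "index when the total number of keyword hits reaches the threshold" are all hypothetical, chosen only for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper for illustration; not part of Lucene or Compass.
public class KeywordGate {
    private final Set<String> keywords;
    private final int threshold;

    public KeywordGate(Set<String> keywords, int threshold) {
        this.keywords = new HashSet<>(keywords);
        this.threshold = threshold;
    }

    // Count how often each watched keyword appears among the analyzed tokens.
    public Map<String, Integer> countHits(List<String> tokens) {
        Map<String, Integer> hits = new HashMap<>();
        for (String token : tokens) {
            if (keywords.contains(token)) {
                hits.merge(token, 1, Integer::sum);
            }
        }
        return hits;
    }

    // Assumed rule: the entry is worth indexing when the total number of
    // keyword hits is at least the threshold.
    public boolean shouldIndex(List<String> tokens) {
        int total = 0;
        for (int count : countHits(tokens).values()) {
            total += count;
        }
        return total >= threshold;
    }

    public static void main(String[] args) {
        Set<String> watched = new HashSet<>(Arrays.asList("lucene", "index"));
        KeywordGate gate = new KeywordGate(watched, 2);
        // Stand-in for tokens returned by the analyzer.
        List<String> tokens = Arrays.asList("lucene", "makes", "index", "lucene");
        System.out.println(gate.countHits(tokens));
        System.out.println(gate.shouldIndex(tokens));
    }
}
```

The keywords should be stored in the same form the analyzer emits (for example, lowercased) so that the set lookup matches. The same counting logic could also live inside a custom filter, as the answer suggests, but keeping it separate makes it easy to test without Lucene on the classpath.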
