Finding the total number of matches to a query with Lucene

Question

I'm new to Lucene, so I don't know whether this is possible. I have an index, and I would like to get the total number of occurrences of a phrase in a subset of the index (the subset is defined by a filter). I can use a FilteredQuery with my Filter and a PhraseQuery to search for the phrase, and so count the documents in which the phrase occurs, but I can't find a way to also count the number of matches within each document.

Answer

You can do this; see LUCENE-2590 for details.

I've copied the relevant code for phrase searches below.

This is the collector:

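// Imports needed by the enclosing test class (Lucene 3.x package layout).
import java.io.IOException;
import java.util.EnumSet;
import java.util.HashMap;
import java.util.Map;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.BooleanClause.Occur;
import org.apache.lucene.search.Collector;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Scorer;
import org.apache.lucene.search.Scorer.ScorerVisitor;

// Nested class: wraps another Collector and records, per document,
// how often each visited sub-query matched.
private static class CountingCollector extends Collector {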
  private final Collector other;
  private int docBase;

  public final Map<Integer, Map<Query, Float>> docCounts = new HashMap<Integer, Map<Query, Float>>();

  private final Map<Query, Scorer> subScorers = new HashMap<Query, Scorer>();
  private final ScorerVisitor<Query, Query, Scorer> visitor = new MockScorerVisitor();
  private final EnumSet<Occur> collect;

  private class MockScorerVisitor extends ScorerVisitor<Query, Query, Scorer> {

    @Override
    public void visitOptional(Query parent, Query child, Scorer scorer) {
      if (collect.contains(Occur.SHOULD))
        subScorers.put(child, scorer);
    }

    @Override
    public void visitProhibited(Query parent, Query child, Scorer scorer) {
      if (collect.contains(Occur.MUST_NOT))
        subScorers.put(child, scorer);
    }

    @Override
    public void visitRequired(Query parent, Query child, Scorer scorer) {
      if (collect.contains(Occur.MUST))
        subScorers.put(child, scorer);
    }

  }

  public CountingCollector(Collector other) {
    this(other, EnumSet.allOf(Occur.class));
  }

  public CountingCollector(Collector other, EnumSet<Occur> collect) {
    this.other = other;
    this.collect = collect;
  }

  @Override
  public void setScorer(Scorer scorer) throws IOException {
    other.setScorer(scorer);
    scorer.visitScorers(visitor);
  }

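  @Override
  public void collect(int doc) throws IOException {
    // Frequency of each tracked sub-query in the current document.
    final Map<Query, Float> freqs = new HashMap<Query, Float>();
    for (Map.Entry<Query, Scorer> ent : subScorers.entrySet()) {
      Scorer value = ent.getValue();
      int matchId = value.docID();
      // A sub-scorer that is not positioned on this document did not match it.
      freqs.put(ent.getKey(), matchId == doc ? value.freq() : 0.0f);
    }
    // doc is segment-relative; adding docBase gives the index-wide document id.
    docCounts.put(doc + docBase, freqs);
    other.collect(doc);
  }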

  @Override
  public void setNextReader(IndexReader reader, int docBase)
      throws IOException {
    this.docBase = docBase;
    other.setNextReader(reader, docBase);
  }

  @Override
  public boolean acceptsDocsOutOfOrder() {
    return other.acceptsDocsOutOfOrder();
  }
}

The unit test is:

@Test
public void testPhraseQuery() throws Exception {
  PhraseQuery q = new PhraseQuery();
  q.add(new Term("f", "b"));
  q.add(new Term("f", "c"));
  CountingCollector c = new CountingCollector(TopScoreDocCollector.create(10,
      true));
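  // s is the test's searcher, defined elsewhere in the test class.
  // The second argument is the Filter; null means no filtering.
  s.search(q, null, c);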
  final int maxDocs = s.maxDoc();
  assertEquals(maxDocs, c.docCounts.size());
  for (int i = 0; i < maxDocs; i++) {
    Map<Query, Float> doc0 = c.docCounts.get(i);
    assertEquals(1, doc0.size());
    assertEquals(2.0F, doc0.get(q), FLOAT_TOLERANCE);

    Map<Query, Float> doc1 = c.docCounts.get(++i);
    assertEquals(1, doc1.size());
    assertEquals(1.0F, doc1.get(q), FLOAT_TOLERANCE);
  }

}
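
The collector records a frequency per document, while the question asks for a total. One way to get there, as a minimal sketch, is to sum the values stored in docCounts once the search has finished. The names MatchTotals and totalMatches are hypothetical and not part of Lucene, and a String stands in for the Query key so that the example runs without Lucene on the classpath.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Lucene: adds up per-document frequencies
// stored in a map shaped like CountingCollector.docCounts.
final class MatchTotals {

  /** Sums the frequencies recorded for one query key across all documents. */
  static <Q> double totalMatches(Map<Integer, Map<Q, Float>> docCounts, Q query) {
    double total = 0.0;
    for (Map<Q, Float> freqs : docCounts.values()) {
      Float freq = freqs.get(query);
      if (freq != null) { // documents without an entry for this key add nothing
        total += freq;
      }
    }
    return total;
  }

  public static void main(String[] args) {
    // Made-up data: document 0 matched twice, document 1 matched once.
    Map<Integer, Map<String, Float>> docCounts = new HashMap<>();
    Map<String, Float> doc0 = new HashMap<>();
    doc0.put("phrase", 2.0f);
    Map<String, Float> doc1 = new HashMap<>();
    doc1.put("phrase", 1.0f);
    docCounts.put(0, doc0);
    docCounts.put(1, doc1);

    System.out.println(totalMatches(docCounts, "phrase")); // prints 3.0
  }
}
```

With the real collector, the key would be the PhraseQuery instance passed to the search, since that is what the collector stores in each per-document map. Passing a Filter as the second argument of the search call should restrict the counts to the desired subset.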
