Solr / Lucene: Get all field names sorted by number of occurrences in index

Question

I want to get the list of all fields (i.e. field names) sorted by the number of times they occur in the Solr index, i.e.: most frequently occurring field, second most frequently occurring field and so on.

Alternatively, getting all fields in the index and the number of times they occur would also be sufficient.

How do I accomplish this, either with a single Solr query or through the Solr/Lucene Java API?

The set of fields is not fixed and ranges in the hundreds. Almost all fields are dynamic, except for id and perhaps a couple more.

Recommended answer

As stated in "Solr: Retrieve field names from a solr index?", you can do this with the LukeRequestHandler.

To do this, you need to register the handler in solrconfig.xml:

<requestHandler name="/admin/luke" class="org.apache.solr.handler.admin.LukeRequestHandler" />

and then call it like this:

http://solr:8983/solr/admin/luke?numTerms=0
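If you are not using SolrJ, the same URL can be requested with the JDK's own HTTP client. The following is a minimal sketch, assuming Java 11+ (for `java.net.http`) and the base URL `http://solr:8983/solr` used in this answer; the class `LukeHttp` and its methods `lukeUrl` and `fetch` are hypothetical names, and `wt=json` simply asks Solr for a JSON response. It stops at the raw response body: pulling the per-field counts out of that JSON still needs a parser, which is the work SolrJ does for you below.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper: calls the Luke handler over plain HTTP, without SolrJ.
// Assumes the handler is registered at /admin/luke as in the solrconfig.xml entry above.
class LukeHttp {

  // Builds the Luke URL for a Solr base URL; wt=json requests JSON output.
  static String lukeUrl(String solrBase, int numTerms) {
    return solrBase + "/admin/luke?numTerms=" + numTerms + "&wt=json";
  }

  // Sends a GET request and returns the raw response body (parsing is left to the caller).
  static String fetch(String solrBase, int numTerms) throws IOException, InterruptedException {
    HttpClient client = HttpClient.newHttpClient();
    HttpRequest request = HttpRequest.newBuilder(URI.create(lukeUrl(solrBase, numTerms)))
        .GET()
        .build();
    HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
    if (response.statusCode() != 200) {
      throw new IOException("Luke request failed with HTTP " + response.statusCode());
    }
    return response.body();
  }
}
```

Usage would look like `String json = LukeHttp.fetch("http://solr:8983/solr", 0);`, after which the `fields` section of the JSON can be handed to whichever JSON library the project already uses.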

If you want the fields sorted by some criterion, you have to do the sorting yourself. If you are working in Java, I would suggest using SolrJ.

Fetch fields using SolrJ

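import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.LukeRequest;
import org.apache.solr.client.solrj.response.LukeResponse;
import org.apache.solr.client.solrj.response.LukeResponse.FieldInfo;
import org.junit.Test;

@Test
public void lukeRequest() throws SolrServerException, IOException {
  // SolrServer/HttpSolrServer are the pre-5.0 SolrJ names (later SolrClient/HttpSolrClient)
  SolrServer solrServer = new HttpSolrServer("http://solr:8983/solr");

  LukeRequest lukeRequest = new LukeRequest();
  lukeRequest.setNumTerms(1); // number of top terms Luke reports per field
  LukeResponse lukeResponse = lukeRequest.process(solrServer);

  // Copy the per-field info into a list and sort it, most documents first
  List<FieldInfo> sorted = new ArrayList<FieldInfo>(lukeResponse.getFieldInfo().values());
  Collections.sort(sorted, new FieldInfoComparator());
  for (FieldInfo infoEntry : sorted) {
    System.out.println("name: " + infoEntry.getName());
    System.out.println("docs: " + infoEntry.getDocs());
  }
}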

The comparator used in the example

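import java.util.Comparator;
import org.apache.solr.client.solrj.response.LukeResponse.FieldInfo;

// Orders fields by document count (descending); ties are broken by field name (ascending)
public class FieldInfoComparator implements Comparator<FieldInfo> {
  @Override
  public int compare(FieldInfo fieldInfo1, FieldInfo fieldInfo2) {
    if (fieldInfo1.getDocs() > fieldInfo2.getDocs()) {
      return -1;
    }
    if (fieldInfo1.getDocs() < fieldInfo2.getDocs()) {
      return 1;
    }
    return fieldInfo1.getName().compareTo(fieldInfo2.getName());
  }
}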
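On Java 8 and later, the same ordering (most documents first, then by name) can be written with comparator combinators instead of a hand-written `compare` method. The sketch below is only an illustration: `FieldCount` is a hypothetical stand-in for SolrJ's `FieldInfo` (just a name and a document count) so that it compiles without SolrJ, and `FieldOrdering.sortedNames` is a hypothetical helper that turns a field-name to document-count map (the "all fields and the number of times they occur" form from the question) into a sorted list of names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for LukeResponse.FieldInfo: a field name and its document count.
final class FieldCount {
  private final String name;
  private final int docs;

  FieldCount(String name, int docs) {
    this.name = name;
    this.docs = docs;
  }

  String getName() { return name; }
  int getDocs() { return docs; }
}

class FieldOrdering {
  // Most documents first; ties broken by field name, mirroring FieldInfoComparator.
  static final Comparator<FieldCount> BY_DOCS_DESC_THEN_NAME =
      Comparator.comparingInt(FieldCount::getDocs).reversed()
          .thenComparing(FieldCount::getName);

  // Converts a name -> doc-count map into field names sorted by the ordering above.
  static List<String> sortedNames(Map<String, Integer> docCounts) {
    List<FieldCount> fields = new ArrayList<>();
    docCounts.forEach((name, docs) -> fields.add(new FieldCount(name, docs)));
    fields.sort(BY_DOCS_DESC_THEN_NAME);
    List<String> names = new ArrayList<>();
    for (FieldCount field : fields) {
      names.add(field.getName());
    }
    return names;
  }
}
```

With SolrJ on the classpath, the analogous chain `Comparator.comparingInt(FieldInfo::getDocs).reversed().thenComparing(FieldInfo::getName)` could replace `FieldInfoComparator`; the explicit class above is equally correct and works on older Java versions.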