Trying to get Java to read a Lucene index created with Solr


Problem description

I've got a Lucene index that I created with Solr. The Lucene version is 3.6.1.

I found a Java program on the web that reads a Lucene index:

http://www.javacodegeeks.com/2010/05/introduction-to-apache-lucene-for-full.html

I modified the program for my local environment, but it always tells me that no hits are found for a query that does have results in the index. After having no luck with the program, I modified the code to use StandardAnalyzer instead of SimpleAnalyzer. Still no luck.

Here is the code:

package com.javacodegeeks.lucene;

import java.io.File;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class StandardSearcher {

    public static void main(String[] args) throws Exception {

        File indexDir = new File("/path/to/solr/data/index/");
        String query = "science";
        int hits = 100;

        StandardSearcher searcher = new StandardSearcher();
        searcher.searchIndex(indexDir, query, hits);

    }

    private void searchIndex(File indexDir, String queryStr, int maxHits)
        throws Exception {

        StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_36);

        // Open the index directory that Solr wrote to
        Directory directory = FSDirectory.open(indexDir);

        IndexSearcher searcher = new IndexSearcher(directory);
        // Parse the query against the "title" field
        Query query = new QueryParser(Version.LUCENE_36, "title", analyzer).parse(queryStr);

        TopDocs topDocs = searcher.search(query, maxHits);

        ScoreDoc[] hits = topDocs.scoreDocs;
        for (int i = 0; i < hits.length; i++) {
            int docId = hits[i].doc;
            Document d = searcher.doc(docId);
            System.out.println(d.get("filename"));
        }

        System.out.println("Found " + hits.length);

        searcher.close();
    }

}

What am I doing wrong? Looking through solrconfig.xml, I can't tell which analyzer Solr uses by default. That's why I tried both SimpleAnalyzer and StandardAnalyzer.

Suggestions on how to debug this would be greatly appreciated.

Update: Here are the fields in my schema:

<field name="metaDataUrl" type="string" indexed="true" stored="true" required="true"/>
<field name="title" type="text" stored="true" indexed="true"/>
<field name="snippet" type="text" indexed="true" stored="true"/>
<field name="rest" type="string" stored="true" indexed="false" multiValued="true"/>
<field name="date_indexed" type="date" indexed="true" stored="true" default="NOW" multiValued="false"/>
<field name="all" type="text" stored="false" indexed="true" multiValued="true"/>

And here's the XML for the text fieldType from schema.xml:

<!-- A text field that uses WordDelimiterFilter to enable splitting and matching of
    words on case-change, alpha numeric boundaries, and non-alphanumeric chars,
    so that a query of "wifi" or "wi fi" could match a document containing "Wi-Fi".
    Synonyms and stopwords are customized by external files, and stemming is enabled.
    -->
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <!-- Case insensitive stop word removal.
      add enablePositionIncrements=true in both the index and query
      analyzers to leave a 'gap' for more accurate phrase queries.
    -->
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords.txt"
            enablePositionIncrements="true"
            />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords.txt"
            enablePositionIncrements="true"
            />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
  </analyzer>
</fieldType>

Answer

You need to build a custom analyzer from the same tokenizer and filters that were used at index time (as defined in the index section of the fieldType XML). Pass that custom analyzer as a parameter to the searcher, and then the search should work fine. Does the SnowballPorterFilter stem "science"? Maybe.
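One quick way to answer the stemming question is to run "science" through a Snowball-based analyzer and print the tokens it produces. A minimal sketch against the Lucene 3.6 contrib SnowballAnalyzer (the field name "title" is arbitrary here; this analyzer is only an approximation of the schema's chain):

```java
import java.io.StringReader;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.snowball.SnowballAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

public class ShowStems {
    public static void main(String[] args) throws Exception {
        // SnowballAnalyzer is roughly StandardTokenizer + LowerCaseFilter + SnowballFilter
        TokenStream ts = new SnowballAnalyzer(Version.LUCENE_36, "English")
                .tokenStream("title", new StringReader("science"));
        CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
        ts.reset();
        while (ts.incrementToken()) {
            // If this prints "scienc", a plain query for "science" cannot match the index
            System.out.println(term.toString());
        }
        ts.close();
    }
}
```

If the stemmed form differs from the literal query term, that alone explains the zero hits.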

Refer to http://whiteboardjunkie.wordpress.com/tag/custom-analyzer/ for details on building a custom analyzer. You just need to call one filter after another in tokenStream().
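Sketching that idea against the Lucene 3.6 API, an analyzer that chains the filters one after another to mirror the schema's index-time chain might look like this (hedged: the built-in English stopword set is only a stand-in for Solr's stopwords.txt, and WordDelimiterFilter ships in the Solr jar in 3.x, so it is omitted here):

```java
import java.io.Reader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.StopAnalyzer;
import org.apache.lucene.analysis.StopFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceTokenizer;
import org.apache.lucene.analysis.snowball.SnowballFilter;
import org.apache.lucene.util.Version;

// Mirrors the index-time chain from schema.xml:
// WhitespaceTokenizer -> StopFilter -> (WordDelimiterFilter) -> LowerCaseFilter -> Snowball
public class SolrTextAnalyzer extends Analyzer {
    @Override
    public TokenStream tokenStream(String fieldName, Reader reader) {
        TokenStream stream = new WhitespaceTokenizer(Version.LUCENE_36, reader);
        // Stand-in for stopwords.txt; load the real file to match the index exactly
        stream = new StopFilter(Version.LUCENE_36, stream,
                StopAnalyzer.ENGLISH_STOP_WORDS_SET);
        // WordDelimiterFilter lives in the Solr jar in 3.x; insert it here if available
        stream = new LowerCaseFilter(Version.LUCENE_36, stream);
        // Equivalent of solr.SnowballPorterFilterFactory with language="English"
        stream = new SnowballFilter(stream, "English");
        return stream;
    }
}
```

Then pass `new SolrTextAnalyzer()` to the QueryParser in place of StandardAnalyzer.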

Also, you can examine the index using Luke (http://code.google.com/p/luke/) and see whether there are any documents containing "science" in the title field at all.
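Without Luke, the same check can be done programmatically with the Lucene 3.x TermEnum API. A hedged sketch (the index path is a placeholder) that prints every indexed term in the title field, so you can see whether the stemmer stored "scienc" rather than "science":

```java
import java.io.File;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermEnum;
import org.apache.lucene.store.FSDirectory;

public class DumpTitleTerms {
    public static void main(String[] args) throws Exception {
        IndexReader reader = IndexReader.open(
                FSDirectory.open(new File("/path/to/solr/data/index/")));
        try {
            // Position the enum at the first term of the "title" field
            TermEnum terms = reader.terms(new Term("title", ""));
            try {
                do {
                    Term t = terms.term();
                    if (t == null || !"title".equals(t.field())) {
                        break; // ran past the end of the title field
                    }
                    System.out.println(t.text() + " (docFreq=" + terms.docFreq() + ")");
                } while (terms.next());
            } finally {
                terms.close();
            }
        } finally {
            reader.close();
        }
    }
}
```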
