How to implement auto suggest using Lucene's new AnalyzingInfixSuggester API?

Question

I am new to Lucene, and I want to implement auto suggest like Google's: when I type a character such as 'G', it gives me a list of suggestions. You can try it yourself.

I have searched the whole net and found nobody who has done this, even though Lucene gives us some new tools in the suggest package.

But I need an example that shows me how to do it.

Can anyone help?

Recommended answer

I'll give you a pretty complete example that shows you how to use AnalyzingInfixSuggester. In this example we'll pretend that we're Amazon, and we want to autocomplete a product search field. We'll take advantage of features of the Lucene suggestion system to implement the following:


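  1. Ranked results: we will suggest the most popular matching products first.
  2. Region-restricted results: we will only suggest products that we sell in the customer's country.
  3. Product photos: we will store product image URLs in the suggestion index, so we can display them in search results without an additional database lookup.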

First I'll define a simple class to hold information about a product in Product.java:

// Simple holder for product data. It is Serializable so that instances
// can be stored as suggester payloads.
class Product implements java.io.Serializable
{
    String name;
    String image;
    String[] regions;
    int numberSold;

    public Product(String name, String image, String[] regions,
                   int numberSold) {
        this.name = name;
        this.image = image;
        this.regions = regions;
        this.numberSold = numberSold;
    }
}

To index records with AnalyzingInfixSuggester's build method, you need to pass it an object that implements the org.apache.lucene.search.suggest.InputIterator interface. An InputIterator gives access to the key, contexts, payload and weight for each record.

The key is the text you actually want to search on and autocomplete against. In our example, it will be the name of the product.

The contexts are a set of additional, arbitrary data that you can use to filter records. In our example, the contexts are the set of ISO codes for the countries we will ship a particular product to.

The payload is additional arbitrary data you want to store in the index for the record. In this example, we will serialize each Product instance and store the resulting bytes as the payload. Then, when we later do lookups, we can deserialize the payload and access information in the Product instance, such as the image URL.
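The payload idea can be tried in isolation from Lucene. What follows is a minimal sketch of the serialize and deserialize round trip, assuming a hypothetical Item class standing in for Product; the names PayloadRoundTrip, toBytes and fromBytes are illustrative and not part of Lucene. The reader takes an offset and a length because a payload may occupy only a slice of its backing array.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class PayloadRoundTrip {
    // Hypothetical stand-in for the Product class.
    static class Item implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        final String image;

        Item(String name, String image) {
            this.name = name;
            this.image = image;
        }
    }

    // Serialize an object into the bytes that would be stored as a payload.
    static byte[] toBytes(Serializable value) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
            out.writeObject(value);
        }
        return bos.toByteArray();
    }

    // Rebuild the object from the slice [offset, offset + length) of a byte array.
    static Object fromBytes(byte[] bytes, int offset, int length)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes, offset, length))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] payload = toBytes(
            new Item("Electric Guitar", "http://images.example/electric-guitar.jpg"));
        Item copy = (Item) fromBytes(payload, 0, payload.length);
        System.out.println(copy.name + " -> " + copy.image);
    }
}
```

Java serialization is convenient for a demo; for a long-lived index a more compact or version-tolerant encoding may be a better fit, but that choice is outside the scope of this answer.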

The weight is used to order suggestion results; results with a higher weight are returned first. We'll use the number of sales for a given product as its weight.
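To see how key, contexts and weight interact, here is a toy model in plain Java. It is not how Lucene implements lookups: the real suggester runs an analyzer and queries an index, whereas this sketch simply lowercases and splits on whitespace. The names SuggestModel, Entry, matches and lookup are hypothetical. A record is a hit when it carries the requested context and every query word matches a word of the key, with the last query word only needing to be a prefix; hits are then ordered by descending weight.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class SuggestModel {
    // One indexed record: key text, filter contexts, and ranking weight.
    static class Entry {
        final String key;
        final Set<String> contexts;
        final long weight;

        Entry(String key, long weight, String... contexts) {
            this.key = key;
            this.weight = weight;
            this.contexts = new HashSet<>(Arrays.asList(contexts));
        }
    }

    // Every query word must match a key word; the last one may be a prefix.
    static boolean matches(String key, String query) {
        String[] keyWords = key.toLowerCase(Locale.ROOT).split("\\s+");
        String[] queryWords = query.trim().toLowerCase(Locale.ROOT).split("\\s+");
        for (int i = 0; i < queryWords.length; i++) {
            boolean last = (i == queryWords.length - 1);
            boolean found = false;
            for (String word : keyWords) {
                if (last ? word.startsWith(queryWords[i]) : word.equals(queryWords[i])) {
                    found = true;
                    break;
                }
            }
            if (!found) {
                return false;
            }
        }
        return true;
    }

    // Filter by context and match, sort by descending weight, keep the top num keys.
    static List<String> lookup(List<Entry> entries, String query, String context, int num) {
        List<Entry> hits = new ArrayList<>();
        for (Entry e : entries) {
            if (e.contexts.contains(context) && matches(e.key, query)) {
                hits.add(e);
            }
        }
        hits.sort((a, b) -> Long.compare(b.weight, a.weight));
        List<String> keys = new ArrayList<>();
        for (Entry e : hits.subList(0, Math.min(num, hits.size()))) {
            keys.add(e.key);
        }
        return keys;
    }

    public static void main(String[] args) {
        List<Entry> entries = Arrays.asList(
            new Entry("Electric Guitar", 100, "US", "CA"),
            new Entry("Acoustic Guitar", 80, "US", "ZA"),
            new Entry("Guarana Soda", 130, "ZA", "IE"));
        System.out.println(lookup(entries, "Gu", "US", 2)); // [Electric Guitar, Acoustic Guitar]
        System.out.println(lookup(entries, "Gu", "ZA", 2)); // [Guarana Soda, Acoustic Guitar]
    }
}
```

"Guarana Soda" has the highest weight but is never suggested for "US", because the context filter is applied before ranking.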

Here are the contents of ProductIterator.java:

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UnsupportedEncodingException;
import java.util.Comparator;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;
import org.apache.lucene.search.suggest.InputIterator;
import org.apache.lucene.util.BytesRef;


class ProductIterator implements InputIterator
{
    private Iterator<Product> productIterator;
    private Product currentProduct;

    ProductIterator(Iterator<Product> productIterator) {
        this.productIterator = productIterator;
    }

    public boolean hasContexts() {
        return true;
    }

    public boolean hasPayloads() {
        return true;
    }

    public Comparator<BytesRef> getComparator() {
        return null;
    }

    // This method needs to return the key for the record; this is the
    // text we'll be autocompleting against.
    public BytesRef next() {
        if (productIterator.hasNext()) {
            currentProduct = productIterator.next();
            try {
                return new BytesRef(currentProduct.name.getBytes("UTF8"));
            } catch (UnsupportedEncodingException e) {
                throw new Error("Couldn't convert to UTF-8");
            }
        } else {
            return null;
        }
    }

    // This method returns the payload for the record, which is
    // additional data that can be associated with a record and
    // returned when we do suggestion lookups.  In this example the
    // payload is a serialized Java object representing our product.
    public BytesRef payload() {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            ObjectOutputStream out = new ObjectOutputStream(bos);
            out.writeObject(currentProduct);
            out.close();
            return new BytesRef(bos.toByteArray());
        } catch (IOException e) {
            throw new Error("Well that's unfortunate.");
        }
    }

    // This method returns the contexts for the record, which we can
    // use to restrict suggestions.  In this example we use the
    // regions in which a product is sold.
    public Set<BytesRef> contexts() {
        try {
            Set<BytesRef> regions = new HashSet<BytesRef>();
            for (String region : currentProduct.regions) {
                regions.add(new BytesRef(region.getBytes("UTF8")));
            }
            return regions;
        } catch (UnsupportedEncodingException e) {
            throw new Error("Couldn't convert to UTF-8");
        }
    }

    // This method helps us order our suggestions.  In this example we
    // use the number of products of this type that we've sold.
    public long weight() {
        return currentProduct.numberSold;
    }
}
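The try/catch blocks around getBytes("UTF8") in ProductIterator exist only because the name-based overload declares a checked UnsupportedEncodingException. On Java 7 or later, a possible simplification is the Charset-based overload, which declares no checked exception. The sketch below shows just the encoding step; the class Utf8Keys and the helper utf8 are hypothetical names, and in the iterator the resulting array would still be wrapped in new BytesRef(...).

```java
import java.nio.charset.StandardCharsets;

class Utf8Keys {
    // Encode text as UTF-8 bytes; this overload throws no checked exception.
    static byte[] utf8(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] key = utf8("Electric Guitar");
        // Decode again to show that the bytes round-trip cleanly.
        System.out.println(new String(key, StandardCharsets.UTF_8));
    }
}
```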

In our driver program, we will do the following things:


  1. Create an index directory in RAM.
  2. Create a StandardAnalyzer.
  3. Create an AnalyzingInfixSuggester using the RAM directory and analyzer.
  4. Index a number of products using ProductIterator.
  5. Print the results of some sample lookups.

Here's the driver program, SuggestProducts.java:

import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.UnsupportedEncodingException;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.search.suggest.analyzing.AnalyzingInfixSuggester;
import org.apache.lucene.search.suggest.Lookup;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.BytesRef;
import org.apache.lucene.util.Version;

public class SuggestProducts
{
    // Get suggestions given a prefix and a region.
    private static void lookup(AnalyzingInfixSuggester suggester, String name,
                               String region) {
        try {
            List<Lookup.LookupResult> results;
            HashSet<BytesRef> contexts = new HashSet<BytesRef>();
            contexts.add(new BytesRef(region.getBytes("UTF8")));
            // Do the actual lookup.  We ask for the top 2 results.
            results = suggester.lookup(name, contexts, 2, true, false);
            System.out.println("-- \"" + name + "\" (" + region + "):");
            for (Lookup.LookupResult result : results) {
                System.out.println(result.key);
                Product p = getProduct(result);
                if (p != null) {
                    System.out.println("  image: " + p.image);
                    System.out.println("  # sold: " + p.numberSold);
                }
            }
        } catch (IOException e) {
            System.err.println("Error");
        }
    }

    // Deserialize a Product from a LookupResult payload.
    private static Product getProduct(Lookup.LookupResult result)
    {
        try {
            BytesRef payload = result.payload;
            if (payload != null) {
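                // A BytesRef may cover only a slice of its backing array,
                // so read exactly the bytes from offset to offset + length.
                ByteArrayInputStream bis = new ByteArrayInputStream(
                    payload.bytes, payload.offset, payload.length);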
                ObjectInputStream in = new ObjectInputStream(bis);
                Product p = (Product) in.readObject();
                return p;
            } else {
                return null;
            }
        } catch (IOException|ClassNotFoundException e) {
            throw new Error("Could not decode payload :(");
        }
    }

    public static void main(String[] args) {
        try {
            RAMDirectory index_dir = new RAMDirectory();
            StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_48);
            AnalyzingInfixSuggester suggester = new AnalyzingInfixSuggester(
                Version.LUCENE_48, index_dir, analyzer);

            // Create our list of products.
            ArrayList<Product> products = new ArrayList<Product>();
            products.add(
                new Product(
                    "Electric Guitar",
                    "http://images.example/electric-guitar.jpg",
                    new String[]{"US", "CA"},
                    100));
            products.add(
                new Product(
                    "Electric Train",
                    "http://images.example/train.jpg",
                    new String[]{"US", "CA"},
                    100));
            products.add(
                new Product(
                    "Acoustic Guitar",
                    "http://images.example/acoustic-guitar.jpg",
                    new String[]{"US", "ZA"},
                    80));
            products.add(
                new Product(
                    "Guarana Soda",
                    "http://images.example/soda.jpg",
                    new String[]{"ZA", "IE"},
                    130));

            // Index the products with the suggester.
            suggester.build(new ProductIterator(products.iterator()));

            // Do some example lookups.
            lookup(suggester, "Gu", "US");
            lookup(suggester, "Gu", "ZA");
            lookup(suggester, "Gui", "CA");
            lookup(suggester, "Electric guit", "US");
        } catch (IOException e) {
            System.err.println("Error!");
        }
    }
}

And here is the output from the driver program:

-- "Gu" (US):
Electric Guitar
  image: http://images.example/electric-guitar.jpg
  # sold: 100
Acoustic Guitar
  image: http://images.example/acoustic-guitar.jpg
  # sold: 80
-- "Gu" (ZA):
Guarana Soda
  image: http://images.example/soda.jpg
  # sold: 130
Acoustic Guitar
  image: http://images.example/acoustic-guitar.jpg
  # sold: 80
-- "Gui" (CA):
Electric Guitar
  image: http://images.example/electric-guitar.jpg
  # sold: 100
-- "Electric guit" (US):
Electric Guitar
  image: http://images.example/electric-guitar.jpg
  # sold: 100

Appendix

There's a way to avoid writing a full InputIterator that you might find easier. You can write a stub InputIterator that returns null from its next, payload and contexts methods, and pass an instance of it to AnalyzingInfixSuggester's build method. A ProductIterator over an empty list serves the same purpose, since its next method immediately returns null:

suggester.build(new ProductIterator(new ArrayList<Product>().iterator()));

Then, for each item you want to index, call AnalyzingInfixSuggester's add method (http://lucene.apache.org/core/4_8_0/suggest/org/apache/lucene/search/suggest/analyzing/AnalyzingInfixSuggester.html#add(org.apache.lucene.util.BytesRef,%20java.util.Set,%20long,%20org.apache.lucene.util.BytesRef)):

suggester.add(text, contexts, weight, payload)

After you've indexed everything, call refresh:

suggester.refresh();

If you're indexing large amounts of data, it's possible to significantly speed up indexing using this method with multiple threads: call build, then use multiple threads to add items, then finally call refresh.
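The multi-threaded step can be sketched with a plain thread pool. This is only an illustration under stated assumptions: the Adder interface is a hypothetical stand-in for a call such as suggester.add(text, contexts, weight, payload), and the class and method names are made up for this example. In a real program you would call suggester.build(...) before addAll and suggester.refresh() after it returns.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

class ParallelIndexing {
    // Hypothetical stand-in for suggester.add(text, contexts, weight, payload).
    interface Adder {
        void add(String item) throws Exception;
    }

    // Run one add task per item on a fixed-size pool and wait for all of them.
    static void addAll(List<String> items, Adder adder, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Void>> tasks = new ArrayList<>();
            for (String item : items) {
                tasks.add(() -> {
                    adder.add(item);
                    return null;
                });
            }
            // invokeAll blocks until every task has completed.
            for (Future<Void> done : pool.invokeAll(tasks)) {
                done.get(); // rethrows a failure from a worker thread, if any
            }
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        AtomicInteger added = new AtomicInteger();
        addAll(Arrays.asList("Electric Guitar", "Acoustic Guitar", "Guarana Soda"),
               item -> added.incrementAndGet(), 2);
        System.out.println("added " + added.get() + " items");
    }
}
```

Waiting for every task before moving on matters here, because refresh should only be called once all add calls have finished.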
