How to implement auto suggest using Lucene's new AnalyzingInfixSuggester API?

Question

I'm new to Lucene, and I want to implement auto suggest like Google's: when I type a character such as 'G', it should give me a list of suggestions. You can try it yourself.

I have searched all over the web. Nobody seems to have done this, although Lucene gives us some new tools in the suggest package.

But I need an example that shows me how to do that.

Can anyone help?

Recommended answer

I'll give you a pretty complete example that shows you how to use AnalyzingInfixSuggester. In this example we'll pretend that we're Amazon, and we want to autocomplete a product search field. We'll take advantage of features of the Lucene suggestion system to implement the following:


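  1. Ranked results: we will suggest the most popular matching products first.
  2. Region-restricted results: we will only suggest products that we sell in the customer's country.
  3. Product photos: we will store product photo URLs in the suggestion index so we can display them in the search results without an additional database lookup.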

First I'll define a simple class to hold information about a product in Product.java:

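// Simple value class holding what we know about a product.  It is
// Serializable because serialized instances are stored as payloads.
class Product implements java.io.Serializable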
{
    String name;
    String image;
    String[] regions;
    int numberSold;

    public Product(String name, String image, String[] regions,
                   int numberSold) {
        this.name = name;
        this.image = image;
        this.regions = regions;
        this.numberSold = numberSold;
    }
}

To index records with the AnalyzingInfixSuggester's build method, you need to pass it an object that implements the org.apache.lucene.search.suggest.InputIterator interface. An InputIterator gives access to the key, contexts, payload and weight for each record.

The key is the text you actually want to search on and autocomplete against. In our example, it will be the name of the product.

The contexts are a set of additional, arbitrary data that you can use to filter records against. In our example, the contexts are the set of ISO codes for the countries we will ship a particular product to.

The payload is additional arbitrary data you want to store in the index for the record. In this example, we will actually serialize each Product instance and store the resulting bytes as the payload. Then when we later do lookups, we can deserialize the payload and access information in the product instance like the image URL.

The weight is used to order suggestion results; results with a higher weight are returned first. We'll use the number of sales for a given product as its weight.
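The payload round trip described above can be tried in isolation before Lucene is involved. The following is a minimal sketch using only the JDK; `PayloadDemoItem` and `PayloadRoundTrip` are hypothetical names introduced for illustration, not part of the original example or of Lucene. The offset and length arguments are there because a Lucene BytesRef carries `offset` and `length` fields next to its `bytes` array, so the payload may be a slice of a larger array.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical stand-in for Product; only two fields are kept.
class PayloadDemoItem implements Serializable {
    private static final long serialVersionUID = 1L;
    final String name;
    final String image;

    PayloadDemoItem(String name, String image) {
        this.name = name;
        this.image = image;
    }
}

class PayloadRoundTrip {
    // Serialize an object into the bytes that would be stored as a payload.
    static byte[] toBytes(Serializable value) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    // Deserialize from a slice of a byte array: only the bytes in
    // [offset, offset + length) are read.
    static Object fromBytes(byte[] bytes, int offset, int length) {
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes, offset, length))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("Unknown payload class", e);
        }
    }

    public static void main(String[] args) {
        byte[] payload = toBytes(new PayloadDemoItem(
            "Electric Guitar", "http://images.example/electric-guitar.jpg"));
        PayloadDemoItem back =
            (PayloadDemoItem) fromBytes(payload, 0, payload.length);
        System.out.println(back.name + " -> " + back.image);
    }
}
```

Java serialization keeps the example short, but it ties the index contents to the class definition; a compact text or binary encoding of just the fields you need would work the same way.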

Here's the contents of ProductIterator.java:

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UnsupportedEncodingException;
import java.util.Comparator;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;
import org.apache.lucene.search.suggest.InputIterator;
import org.apache.lucene.util.BytesRef;


class ProductIterator implements InputIterator
{
    private Iterator<Product> productIterator;
    private Product currentProduct;

    ProductIterator(Iterator<Product> productIterator) {
        this.productIterator = productIterator;
    }

    public boolean hasContexts() {
        return true;
    }

    public boolean hasPayloads() {
        return true;
    }

    public Comparator<BytesRef> getComparator() {
        return null;
    }

    // This method needs to return the key for the record; this is the
    // text we'll be autocompleting against.
    public BytesRef next() {
        if (productIterator.hasNext()) {
            currentProduct = productIterator.next();
            try {
                return new BytesRef(currentProduct.name.getBytes("UTF8"));
            } catch (UnsupportedEncodingException e) {
                throw new Error("Couldn't convert to UTF-8");
            }
        } else {
            return null;
        }
    }

    // This method returns the payload for the record, which is
    // additional data that can be associated with a record and
    // returned when we do suggestion lookups.  In this example the
    // payload is a serialized Java object representing our product.
    public BytesRef payload() {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            ObjectOutputStream out = new ObjectOutputStream(bos);
            out.writeObject(currentProduct);
            out.close();
            return new BytesRef(bos.toByteArray());
        } catch (IOException e) {
            throw new Error("Well that's unfortunate.");
        }
    }

    // This method returns the contexts for the record, which we can
    // use to restrict suggestions.  In this example we use the
    // regions in which a product is sold.
    public Set<BytesRef> contexts() {
        try {
            Set<BytesRef> regions = new HashSet<BytesRef>();
            for (String region : currentProduct.regions) {
                regions.add(new BytesRef(region.getBytes("UTF8")));
            }
            return regions;
        } catch (UnsupportedEncodingException e) {
            throw new Error("Couldn't convert to UTF-8");
        }
    }

    // This method helps us order our suggestions.  In this example we
    // use the number of products of this type that we've sold.
    public long weight() {
        return currentProduct.numberSold;
    }
}

In our driver program, we will do the following things:


  1. Create an index directory in RAM.
  2. Create a StandardAnalyzer.
  3. Create an AnalyzingInfixSuggester using the RAM directory and analyzer.
  4. Index a number of products using ProductIterator.
  5. Print the results of some sample lookups.

Here's the driver program, SuggestProducts.java:

import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.UnsupportedEncodingException;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.search.suggest.analyzing.AnalyzingInfixSuggester;
import org.apache.lucene.search.suggest.Lookup;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.BytesRef;
import org.apache.lucene.util.Version;

public class SuggestProducts
{
    // Get suggestions given a prefix and a region.
    private static void lookup(AnalyzingInfixSuggester suggester, String name,
                               String region) {
        try {
            List<Lookup.LookupResult> results;
            HashSet<BytesRef> contexts = new HashSet<BytesRef>();
            contexts.add(new BytesRef(region.getBytes("UTF8")));
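            // Do the actual lookup.  We ask for the top 2 results.  The
            // two flags are allTermsRequired (true: every query term must
            // match) and doHighlight (false: no highlighted result text).
            results = suggester.lookup(name, contexts, 2, true, false);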
            System.out.println("-- \"" + name + "\" (" + region + "):");
            for (Lookup.LookupResult result : results) {
                System.out.println(result.key);
                Product p = getProduct(result);
                if (p != null) {
                    System.out.println("  image: " + p.image);
                    System.out.println("  # sold: " + p.numberSold);
                }
            }
        } catch (IOException e) {
            System.err.println("Error");
        }
    }

    // Deserialize a Product from a LookupResult payload.
    private static Product getProduct(Lookup.LookupResult result)
    {
        try {
            BytesRef payload = result.payload;
            if (payload != null) {
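                // Honor the BytesRef's offset and length: the payload may
                // be a slice of a larger backing array.
                ByteArrayInputStream bis = new ByteArrayInputStream(
                    payload.bytes, payload.offset, payload.length);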
                ObjectInputStream in = new ObjectInputStream(bis);
                Product p = (Product) in.readObject();
                return p;
            } else {
                return null;
            }
        } catch (IOException|ClassNotFoundException e) {
            throw new Error("Could not decode payload :(");
        }
    }

    public static void main(String[] args) {
        try {
            RAMDirectory index_dir = new RAMDirectory();
            StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_48);
            AnalyzingInfixSuggester suggester = new AnalyzingInfixSuggester(
                Version.LUCENE_48, index_dir, analyzer);

            // Create our list of products.
            ArrayList<Product> products = new ArrayList<Product>();
            products.add(
                new Product(
                    "Electric Guitar",
                    "http://images.example/electric-guitar.jpg",
                    new String[]{"US", "CA"},
                    100));
            products.add(
                new Product(
                    "Electric Train",
                    "http://images.example/train.jpg",
                    new String[]{"US", "CA"},
                    100));
            products.add(
                new Product(
                    "Acoustic Guitar",
                    "http://images.example/acoustic-guitar.jpg",
                    new String[]{"US", "ZA"},
                    80));
            products.add(
                new Product(
                    "Guarana Soda",
                    "http://images.example/soda.jpg",
                    new String[]{"ZA", "IE"},
                    130));

            // Index the products with the suggester.
            suggester.build(new ProductIterator(products.iterator()));

            // Do some example lookups.
            lookup(suggester, "Gu", "US");
            lookup(suggester, "Gu", "ZA");
            lookup(suggester, "Gui", "CA");
            lookup(suggester, "Electric guit", "US");
        } catch (IOException e) {
            System.err.println("Error!");
        }
    }
}

And here is the output from the driver program:

-- "Gu" (US):
Electric Guitar
  image: http://images.example/electric-guitar.jpg
  # sold: 100
Acoustic Guitar
  image: http://images.example/acoustic-guitar.jpg
  # sold: 80
-- "Gu" (ZA):
Guarana Soda
  image: http://images.example/soda.jpg
  # sold: 130
Acoustic Guitar
  image: http://images.example/acoustic-guitar.jpg
  # sold: 80
-- "Gui" (CA):
Electric Guitar
  image: http://images.example/electric-guitar.jpg
  # sold: 100
-- "Electric guit" (US):
Electric Guitar
  image: http://images.example/electric-guitar.jpg
  # sold: 100
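To see why the output above comes out in this order, it can help to model the lookup rules in plain Java. The sketch below is not Lucene code and not how the suggester is implemented; `SuggestModel` and its members are hypothetical names. It assumes a simplified rule set: a record must carry the requested region as a context, every query word except the last must equal a word of the key, the last query word must be a prefix of some word of the key, and hits are ordered by descending weight.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class SuggestModel {
    // Hypothetical record mirroring key, weight and contexts.
    static final class Entry {
        final String key;
        final long weight;
        final Set<String> contexts;

        Entry(String key, long weight, String... contexts) {
            this.key = key;
            this.weight = weight;
            this.contexts = new HashSet<>(Arrays.asList(contexts));
        }
    }

    // All query words but the last must match a key word exactly
    // (case-insensitively); the last one only needs to be a prefix.
    static boolean matches(String key, String query) {
        List<String> keyWords =
            Arrays.asList(key.toLowerCase(Locale.ROOT).split("\\s+"));
        String[] q = query.trim().toLowerCase(Locale.ROOT).split("\\s+");
        for (int i = 0; i < q.length - 1; i++) {
            if (!keyWords.contains(q[i])) {
                return false;
            }
        }
        String last = q[q.length - 1];
        for (String word : keyWords) {
            if (word.startsWith(last)) {
                return true;
            }
        }
        return false;
    }

    // Filter by region and query, then return the top `num` keys by weight.
    static List<String> lookup(List<Entry> entries, String query,
                               String region, int num) {
        List<Entry> hits = new ArrayList<>();
        for (Entry e : entries) {
            if (e.contexts.contains(region) && matches(e.key, query)) {
                hits.add(e);
            }
        }
        hits.sort((a, b) -> Long.compare(b.weight, a.weight));
        List<String> keys = new ArrayList<>();
        for (Entry e : hits.subList(0, Math.min(num, hits.size()))) {
            keys.add(e.key);
        }
        return keys;
    }

    // The same four products as in the driver program.
    static List<Entry> sampleEntries() {
        return Arrays.asList(
            new Entry("Electric Guitar", 100, "US", "CA"),
            new Entry("Electric Train", 100, "US", "CA"),
            new Entry("Acoustic Guitar", 80, "US", "ZA"),
            new Entry("Guarana Soda", 130, "ZA", "IE"));
    }

    public static void main(String[] args) {
        System.out.println(lookup(sampleEntries(), "Gu", "ZA", 2));
    }
}
```

Under these assumptions, "Gu" in ZA ranks Guarana Soda (130 sold) ahead of Acoustic Guitar (80 sold), and Electric Train never appears for "Gu" because none of its words starts with that prefix.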



Appendix

There's a way to avoid writing a full InputIterator that you might find easier. You can write a stub InputIterator that returns null from its next, payload and contexts methods, or simply reuse ProductIterator over an empty list as in the line below. Pass an instance of it to AnalyzingInfixSuggester's build method:

suggester.build(new ProductIterator(new ArrayList<Product>().iterator()));

Then for each item you want to index, call AnalyzingInfixSuggester's add method:

suggester.add(text, contexts, weight, payload);

After you've indexed everything, call refresh:

suggester.refresh();

If you're indexing large amounts of data, you can significantly speed up indexing by using this method with multiple threads: call build, then use multiple threads to add items, and finally call refresh.
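The threading pattern can be sketched without Lucene on the classpath. In the sketch below, `Sink` is a hypothetical interface standing in for the suggester's add and refresh calls, and `CountingSink` is a dummy implementation used only to make the example runnable; with the real suggester, build is assumed to have been called first, and add declares IOException, which the worker tasks would need to handle.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class ParallelIndexing {
    // Hypothetical stand-in for the two suggester calls used here.
    interface Sink {
        void add(String text);
        void refresh();
    }

    // Add every item from a pool of worker threads, wait for all of
    // them to finish, then refresh exactly once.
    static void indexAll(Sink sink, List<String> items, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (String item : items) {
            pool.execute(() -> sink.add(item));
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.HOURS)) {
                throw new IllegalStateException("Indexing did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while indexing", e);
        }
        sink.refresh();
    }

    // Dummy sink that only counts calls, so the sketch runs on its own.
    static final class CountingSink implements Sink {
        final AtomicInteger added = new AtomicInteger();
        volatile int addedAtRefresh = -1;

        public void add(String text) {
            added.incrementAndGet();
        }

        public void refresh() {
            addedAtRefresh = added.get();
        }
    }

    public static void main(String[] args) {
        CountingSink sink = new CountingSink();
        indexAll(sink, Arrays.asList("Electric Guitar", "Electric Train",
                                     "Acoustic Guitar", "Guarana Soda"), 2);
        System.out.println("added before refresh: " + sink.addedAtRefresh);
    }
}
```

Waiting for the pool to terminate before calling refresh is the important part: it ensures every add has completed before the refresh happens.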
