How to implement auto suggest using Lucene's new AnalyzingInfixSuggester API?


Question

I am new to Lucene, and I want to implement auto suggest like Google's: when I type a character such as 'G', it gives me a list of suggestions. You can try it yourself.

I have searched the whole web and found nobody who has done this, even though Lucene gives us some new tools in the suggest package.

I need an example that shows me how to do it. Can anyone help?

Recommended answer

I'll give you a pretty complete example that shows you how to use AnalyzingInfixSuggester. In this example we'll pretend that we're Amazon, and we want to autocomplete a product search field. We'll take advantage of features of the Lucene suggestion system to implement the following:

  1. Ranked results: We will suggest the most popular matching products first.
  2. Region-restricted results: We will only suggest products that we sell in the customer's country.
  3. Product photos: We will store product photo URLs in the suggestion index so we can display them in the search results without an additional database lookup.

First I'll define a simple class to hold information about a product in Product.java:

import java.io.Serializable;

class Product implements Serializable
{
    String name;
    String image;
    String[] regions;
    int numberSold;

    public Product(String name, String image, String[] regions,
                   int numberSold) {
        this.name = name;
        this.image = image;
        this.regions = regions;
        this.numberSold = numberSold;
    }
}

To index records with AnalyzingInfixSuggester's build method, you need to pass it an object that implements the org.apache.lucene.search.suggest.InputIterator interface. An InputIterator gives access to the key, contexts, payload and weight for each record.

The key is the text you actually want to search on and autocomplete against. In our example, it will be the name of the product.

The contexts are a set of additional, arbitrary data that you can use to filter records. In our example, the contexts are the ISO codes of the countries we will ship a particular product to.

The payload is additional arbitrary data you want to store in the index with the record. In this example, we will serialize each Product instance and store the resulting bytes as the payload. When we later do lookups, we can deserialize the payload and access information in the Product instance, such as the image URL.
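The payload round trip can be tried on its own, without Lucene. The following is a minimal sketch using only the JDK; PayloadDemo and Item are hypothetical names standing in for the driver code and the Product class. It reads the bytes back through an explicit offset and length, which is how a BytesRef (with its bytes, offset and length fields) describes its content.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

class PayloadDemo {
    // Hypothetical stand-in for Product, trimmed to two fields.
    static class Item implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        final String image;

        Item(String name, String image) {
            this.name = name;
            this.image = image;
        }
    }

    // Serialize an object into the bytes that would be stored as the payload.
    static byte[] toBytes(Serializable value) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    // Deserialize from a slice of a (possibly larger) byte array.
    static Object fromBytes(byte[] bytes, int offset, int length) {
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes, offset, length))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] payload = toBytes(
            new Item("Electric Guitar", "http://images.example/electric-guitar.jpg"));
        Item copy = (Item) fromBytes(payload, 0, payload.length);
        System.out.println(copy.name + " -> " + copy.image);
    }
}
```

Java serialization keeps the example short, but the payload is just bytes, so any encoding you can read back at lookup time would work.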

The weight is used to order suggestion results; results with a higher weight are returned first. We'll use the number of sales for a given product as its weight.
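To make the roles of key, contexts and weight concrete, here is a toy in-memory model. It is not how Lucene implements AnalyzingInfixSuggester; ToySuggester and its methods are hypothetical names, and the matching rule (a single case-insensitive prefix of any whitespace-separated word) is a simplifying assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class ToySuggester {
    static class Entry {
        final String key;            // text to autocomplete against
        final Set<String> contexts;  // values used to filter entries
        final long weight;           // higher weight is returned first

        Entry(String key, long weight, String... contexts) {
            this.key = key;
            this.weight = weight;
            this.contexts = new HashSet<>(Arrays.asList(contexts));
        }
    }

    private final List<Entry> entries = new ArrayList<>();

    void add(String key, long weight, String... contexts) {
        entries.add(new Entry(key, weight, contexts));
    }

    // Return up to num keys that carry the context and contain a word
    // starting with the prefix, ordered by descending weight.
    List<String> lookup(String prefix, String context, int num) {
        String needle = prefix.toLowerCase(Locale.ROOT);
        List<Entry> hits = new ArrayList<>();
        for (Entry e : entries) {
            if (!e.contexts.contains(context)) {
                continue;
            }
            for (String word : e.key.toLowerCase(Locale.ROOT).split("\\s+")) {
                if (word.startsWith(needle)) {
                    hits.add(e);
                    break;
                }
            }
        }
        hits.sort(Comparator.comparingLong((Entry e) -> e.weight).reversed());
        List<String> keys = new ArrayList<>();
        for (Entry e : hits.subList(0, Math.min(num, hits.size()))) {
            keys.add(e.key);
        }
        return keys;
    }

    public static void main(String[] args) {
        ToySuggester s = new ToySuggester();
        s.add("Electric Guitar", 100, "US", "CA");
        s.add("Electric Train", 100, "US", "CA");
        s.add("Acoustic Guitar", 80, "US", "ZA");
        s.add("Guarana Soda", 130, "ZA", "IE");
        System.out.println(s.lookup("Gu", "US", 2)); // [Electric Guitar, Acoustic Guitar]
        System.out.println(s.lookup("Gu", "ZA", 2)); // [Guarana Soda, Acoustic Guitar]
    }
}
```

The point of the sketch is the order of operations: filter by context, match on the key, then rank by weight and truncate.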

Here are the contents of ProductIterator.java:

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UnsupportedEncodingException;
import java.util.Comparator;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;
import org.apache.lucene.search.suggest.InputIterator;
import org.apache.lucene.util.BytesRef;


class ProductIterator implements InputIterator
{
    private Iterator<Product> productIterator;
    private Product currentProduct;

    ProductIterator(Iterator<Product> productIterator) {
        this.productIterator = productIterator;
    }

    public boolean hasContexts() {
        return true;
    }

    public boolean hasPayloads() {
        return true;
    }

    public Comparator<BytesRef> getComparator() {
        return null;
    }

    // This method needs to return the key for the record; this is the
    // text we'll be autocompleting against.
    public BytesRef next() {
        if (productIterator.hasNext()) {
            currentProduct = productIterator.next();
            try {
                return new BytesRef(currentProduct.name.getBytes("UTF8"));
            } catch (UnsupportedEncodingException e) {
                throw new Error("Couldn't convert to UTF-8");
            }
        } else {
            return null;
        }
    }

    // This method returns the payload for the record, which is
    // additional data that can be associated with a record and
    // returned when we do suggestion lookups.  In this example the
    // payload is a serialized Java object representing our product.
    public BytesRef payload() {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            ObjectOutputStream out = new ObjectOutputStream(bos);
            out.writeObject(currentProduct);
            out.close();
            return new BytesRef(bos.toByteArray());
        } catch (IOException e) {
            throw new Error("Well that's unfortunate.");
        }
    }

    // This method returns the contexts for the record, which we can
    // use to restrict suggestions.  In this example we use the
    // regions in which a product is sold.
    public Set<BytesRef> contexts() {
        try {
            Set<BytesRef> regions = new HashSet<BytesRef>();
            for (String region : currentProduct.regions) {
                regions.add(new BytesRef(region.getBytes("UTF8")));
            }
            return regions;
        } catch (UnsupportedEncodingException e) {
            throw new Error("Couldn't convert to UTF-8");
        }
    }

    // This method helps us order our suggestions.  In this example we
    // use the number of products of this type that we've sold.
    public long weight() {
        return currentProduct.numberSold;
    }
}

In our driver program, we will do the following things:

  1. Create an index directory in RAM.
  2. Create a StandardAnalyzer.
  3. Create an AnalyzingInfixSuggester using the RAM directory and the analyzer.
  4. Index a number of products using ProductIterator.
  5. Print the results of some sample lookups.

Here's the driver program, SuggestProducts.java:

import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.UnsupportedEncodingException;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.search.suggest.analyzing.AnalyzingInfixSuggester;
import org.apache.lucene.search.suggest.Lookup;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.BytesRef;
import org.apache.lucene.util.Version;

public class SuggestProducts
{
    // Get suggestions given a prefix and a region.
    private static void lookup(AnalyzingInfixSuggester suggester, String name,
                               String region) {
        try {
            List<Lookup.LookupResult> results;
            HashSet<BytesRef> contexts = new HashSet<BytesRef>();
            contexts.add(new BytesRef(region.getBytes("UTF8")));
            // Do the actual lookup.  We ask for the top 2 results.
            results = suggester.lookup(name, contexts, 2, true, false);
            System.out.println("-- \"" + name + "\" (" + region + "):");
            for (Lookup.LookupResult result : results) {
                System.out.println(result.key);
                Product p = getProduct(result);
                if (p != null) {
                    System.out.println("  image: " + p.image);
                    System.out.println("  # sold: " + p.numberSold);
                }
            }
        } catch (IOException e) {
            System.err.println("Error");
        }
    }

    // Deserialize a Product from a LookupResult payload.
    private static Product getProduct(Lookup.LookupResult result)
    {
        try {
            BytesRef payload = result.payload;
            if (payload != null) {
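                // Respect offset and length: a BytesRef may be a slice of a larger array.
                ByteArrayInputStream bis = new ByteArrayInputStream(
                    payload.bytes, payload.offset, payload.length);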
                ObjectInputStream in = new ObjectInputStream(bis);
                Product p = (Product) in.readObject();
                return p;
            } else {
                return null;
            }
        } catch (IOException|ClassNotFoundException e) {
            throw new Error("Could not decode payload :(");
        }
    }

    public static void main(String[] args) {
        try {
            RAMDirectory index_dir = new RAMDirectory();
            StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_48);
            AnalyzingInfixSuggester suggester = new AnalyzingInfixSuggester(
                Version.LUCENE_48, index_dir, analyzer);

            // Create our list of products.
            ArrayList<Product> products = new ArrayList<Product>();
            products.add(
                new Product(
                    "Electric Guitar",
                    "http://images.example/electric-guitar.jpg",
                    new String[]{"US", "CA"},
                    100));
            products.add(
                new Product(
                    "Electric Train",
                    "http://images.example/train.jpg",
                    new String[]{"US", "CA"},
                    100));
            products.add(
                new Product(
                    "Acoustic Guitar",
                    "http://images.example/acoustic-guitar.jpg",
                    new String[]{"US", "ZA"},
                    80));
            products.add(
                new Product(
                    "Guarana Soda",
                    "http://images.example/soda.jpg",
                    new String[]{"ZA", "IE"},
                    130));

            // Index the products with the suggester.
            suggester.build(new ProductIterator(products.iterator()));

            // Do some example lookups.
            lookup(suggester, "Gu", "US");
            lookup(suggester, "Gu", "ZA");
            lookup(suggester, "Gui", "CA");
            lookup(suggester, "Electric guit", "US");
        } catch (IOException e) {
            System.err.println("Error!");
        }
    }
}

Here is the output of the driver program:

-- "Gu" (US):
Electric Guitar
  image: http://images.example/electric-guitar.jpg
  # sold: 100
Acoustic Guitar
  image: http://images.example/acoustic-guitar.jpg
  # sold: 80
-- "Gu" (ZA):
Guarana Soda
  image: http://images.example/soda.jpg
  # sold: 130
Acoustic Guitar
  image: http://images.example/acoustic-guitar.jpg
  # sold: 80
-- "Gui" (CA):
Electric Guitar
  image: http://images.example/electric-guitar.jpg
  # sold: 100
-- "Electric guit" (US):
Electric Guitar
  image: http://images.example/electric-guitar.jpg
  # sold: 100

Appendix

There's a way to avoid writing a full InputIterator that you might find easier. You can write a stub InputIterator that returns null from its next, payload and contexts methods, and pass an instance of it to AnalyzingInfixSuggester's build method. The line below gets the same effect by reusing ProductIterator over an empty list, so that next returns null immediately:

suggester.build(new ProductIterator(new ArrayList<Product>().iterator()));

Then, for each item you want to index, call AnalyzingInfixSuggester's add method:

suggester.add(text, contexts, weight, payload);

After you've indexed everything, call refresh:

suggester.refresh();

If you're indexing large amounts of data, you can significantly speed up indexing by using this method with multiple threads: call build, then use multiple threads to add items, and finally call refresh.
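The threading pattern might look roughly like the sketch below. To keep it runnable without Lucene on the classpath, the suggester is hidden behind a hypothetical Sink interface; ParallelIndexing, Sink and indexAll are names invented for this illustration, and the fixed weight of 1 is a placeholder. In real code, the add and refresh implementations would delegate to the suggester and handle the IOException those calls can throw.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class ParallelIndexing {
    // Hypothetical stand-in for the suggester's add and refresh calls.
    interface Sink {
        void add(String text, long weight);
        void refresh();
    }

    // Add every text from a pool of worker threads, wait for all of them
    // to finish, then refresh once so the new entries become searchable.
    static void indexAll(List<String> texts, Sink sink, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (String text : texts) {
            pool.execute(() -> sink.add(text, 1L));
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.HOURS)) {
                throw new IllegalStateException("indexing timed out");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        sink.refresh();
    }

    public static void main(String[] args) {
        AtomicInteger added = new AtomicInteger();
        Sink counting = new Sink() {
            public void add(String text, long weight) {
                added.incrementAndGet();
            }
            public void refresh() {
                System.out.println("refreshed after " + added.get() + " adds");
            }
        };
        indexAll(Arrays.asList("Electric Guitar", "Electric Train", "Acoustic Guitar"),
                 counting, 2);
    }
}
```

Calling refresh only once, after the pool has drained, follows the order given above: build, add from several threads, then refresh.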
