What's a good alternative for the output field in Elasticsearch 5.1 Completion Suggestions?


Question




The first error I encountered when indexing my data in ES 5.1 was my Completion Suggestion mapping which contained an output field.

message [MapperParsingException[failed to parse]; nested: IllegalArgumentException[unknown field name [output], must be one of [input, weight, contexts]];]

So I removed it, but now a lot of my autocompletions are incorrect because they return the input they matched instead of the single output string.

After some googling I found this article from ES which mentioned the following:

As suggestions are document-oriented, suggestion metadata (e.g. output) should now be specified as a field in the document. The support for specifying output when indexing suggestion entries has been removed. Now suggestion result entry’s text is always the un-analyzed value of the suggestion’s input (same as not specifying output while indexing suggestions in pre-5.0 indices).

I've found that the original value is within the _source field that is returned with the suggestion, but it's not really a solution for me because the key and structure change based on the original object it comes from.

I could add an extra 'output' field on the original object, but this isn't a solution for me either because in some cases I have a structure like this:

{
    "id": "c2358e0c-7399-4665-ac2c-0bdd44597ac0",
    "synonyms": ["All available colours", "Colors"],
    "autoComplete": [{
        "input": ["colours available all", "available colours all", "available all colours", "colours all available", "all available colours", "all colours available"]
    }, {
        "input": ["colors"]
    }]
}

In ES 2.4 the structure was like this:

{
    "id": "c2358e0c-7399-4665-ac2c-0bdd44597ac0",
    "synonyms": ["All available colours", "Colors"],
    "SmartSynonym": [{
        "input": ["colours available all", "available colours all", "available all colours", "colours all available", "all available colours", "all colours available"],
        "output": ["All available colours"]
    }, {
        "input": ["colors"],
        "output": ["Colors"]
    }]
}

This wasn't any problem when the 'output' field was present in every Autocomplete object.

How can I return the original value in ES 5.1 (e.g. "All available colours") when asking for "colours available all", in an easy way, without doing too many manual lookups?

Related Question from other user: Output field in autocomplete suggestion

Solution

After some research I ended up creating a new Elasticsearch 5.1.1 plugin.

Create a Lucene filter

import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;
import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;

import java.io.IOException;
import java.util.*;

/**
 * Created by glenn on 13.01.17.
 */
public class PermutationTokenFilter extends TokenFilter {
    private final CharTermAttribute charTermAtt;
    private final PositionIncrementAttribute posIncrAtt;
    private final OffsetAttribute offsetAtt;
    private Iterator<String> permutations;
    private int origOffset;

    /**
     * Construct a token stream filtering the given input.
     *
     * @param input
     */
    protected PermutationTokenFilter(TokenStream input) {
        super(input);
        this.charTermAtt = addAttribute(CharTermAttribute.class);
        this.posIncrAtt = addAttribute(PositionIncrementAttribute.class);
        this.offsetAtt = addAttribute(OffsetAttribute.class);
    }

    @Override
    public final boolean incrementToken() throws IOException {
        while (true) {
            //see if permutations have been created already
            if (permutations == null) {
                //see if more tokens are available
                if (!input.incrementToken()) {
                    return false;
                } else {
                    //Get value
                    String value = String.valueOf(charTermAtt);
                    //permute over buffer value and create iterator
                    permutations = permutation(value).iterator();
                    origOffset = posIncrAtt.getPositionIncrement();
                }
            }
            //see if there are remaining permutations
            if (permutations.hasNext()) {
                //Reset the attribute to starting point
                clearAttributes();
                //use the next permutation
                String permutation = permutations.next();
                //add the permutation to the attributes and remove the old ones
                charTermAtt.setEmpty().append(permutation);
                posIncrAtt.setPositionIncrement(origOffset);
                offsetAtt.setOffset(0,permutation.length());
                //remove permutation from iterator
                permutations.remove();
                origOffset = 0;
                return true;
            }
            permutations = null;
        }
    }

    /**
     * Changes the order of a multi value keyword so the completion suggester still knows the original value without
     * tokenizing it if the users asks the words in a different order.
     *
     * @param value unpermuted value ex: Yellow Crazy Banana
     * @return Permuted values ex:
     * Yellow Crazy Banana,
     * Yellow Banana Crazy,
     * Crazy Yellow Banana,
     * Crazy Banana Yellow,
     * Banana Crazy Yellow,
     * Banana Yellow Crazy
     */
    private Set<String> permutation(String value) {
        value = value.trim().replaceAll(" +", " ");
        // Use sets to eliminate semantic duplicates (a a b is still a a b even if you switch the two 'a's in case one word occurs multiple times in a single value)
        // Switch to HashSet for better performance
        Set<String> set = new HashSet<String>();
        String[] words = value.split(" ");
        // Termination condition: only 1 permutation for an array of 1 word
        if (words.length == 1) {
            set.add(value);
        } else if (words.length <= 6) {
            // Give each word a chance to be the first in the permuted array
            for (int i = 0; i < words.length; i++) {
                // Remove the word at index i from the array
                String pre = "";
                for (int j = 0; j < i; j++) {
                    pre += words[j] + " ";
                }

                String post = " ";
                for (int j = i + 1; j < words.length; j++) {
                    post += words[j] + " ";
                }
                String remaining = (pre + post).trim();

                // Recurse to find all the permutations of the remaining words
                for (String permutation : permutation(remaining)) {
                    // Concatenate the first word with the permutations of the remaining words
                    set.add(words[i] + " " + permutation);
                }
            }
        } else {
            Collections.addAll(set, words);
            set.add(value);
        }
        return set;
    }
}

This filter will take the original input token "All available colours" and permute it into all possible combinations (see the original question).
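For intuition, the recursive permutation step can be exercised on its own. Below is a standalone sketch of the same algorithm (the class name `PermutationDemo` is mine), runnable without Lucene on the classpath:

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Standalone sketch of the permutation logic used by PermutationTokenFilter.
public class PermutationDemo {

    static Set<String> permutation(String value) {
        // Normalize whitespace so splitting yields clean words
        value = value.trim().replaceAll(" +", " ");
        Set<String> set = new HashSet<>();
        String[] words = value.split(" ");
        if (words.length == 1) {
            // A single word has only one permutation
            set.add(value);
        } else if (words.length <= 6) {
            // Let each word take the first position, then recurse on the rest
            for (int i = 0; i < words.length; i++) {
                StringBuilder rest = new StringBuilder();
                for (int j = 0; j < words.length; j++) {
                    if (j != i) rest.append(words[j]).append(' ');
                }
                for (String p : permutation(rest.toString().trim())) {
                    set.add(words[i] + " " + p);
                }
            }
        } else {
            // Too many words (n! blowup): fall back to single words plus the full phrase
            Collections.addAll(set, words);
            set.add(value);
        }
        return set;
    }

    public static void main(String[] args) {
        // 3 distinct words -> 3! = 6 orderings
        System.out.println(permutation("All available colours").size()); // prints 6
    }
}
```

Note the `words.length <= 6` guard: beyond six words the number of permutations (n!) grows too fast to index, so the filter degrades to indexing the individual words and the original phrase.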

Create the factory

import org.apache.lucene.analysis.TokenStream;
import org.elasticsearch.index.analysis.AbstractTokenFilterFactory;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.env.Environment;
import org.elasticsearch.index.IndexSettings;


/**
 * Created by glenn on 16.01.17.
 */
public class PermutationTokenFilterFactory extends AbstractTokenFilterFactory {

    public PermutationTokenFilterFactory(IndexSettings indexSettings, Environment environment, String name, Settings settings) {
        super(indexSettings, name, settings);
    }

    public PermutationTokenFilter create(TokenStream input) {
        return new PermutationTokenFilter(input);
    }
}

This class is needed to provide the filter to the Elasticsearch plugin.

Create the Elasticsearch plugin

Follow this guide to set up the needed configuration for the Elasticsearch plugin.

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>be.smartspoken</groupId>
    <artifactId>permutation-plugin</artifactId>
    <version>5.1.1-SNAPSHOT</version>
    <packaging>jar</packaging>
    <name>Plugin: Permutation</name>
    <description>Permutation plugin for elasticsearch</description>
    <properties>
        <lucene.version>6.3.0</lucene.version>
        <elasticsearch.version>5.1.1</elasticsearch.version>
        <java.version>1.8</java.version>
        <log4j2.version>2.7</log4j2.version>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.apache.logging.log4j</groupId>
            <artifactId>log4j-api</artifactId>
            <version>${log4j2.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.logging.log4j</groupId>
            <artifactId>log4j-core</artifactId>
            <version>${log4j2.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.lucene</groupId>
            <artifactId>lucene-test-framework</artifactId>
            <version>${lucene.version}</version>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.lucene</groupId>
            <artifactId>lucene-core</artifactId>
            <version>${lucene.version}</version>
            <scope>provided</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.lucene</groupId>
            <artifactId>lucene-analyzers-common</artifactId>
            <version>${lucene.version}</version>
            <scope>provided</scope>
        </dependency>
        <dependency>
            <groupId>org.elasticsearch</groupId>
            <artifactId>elasticsearch</artifactId>
            <version>${elasticsearch.version}</version>
            <scope>provided</scope>
        </dependency>
    </dependencies>

    <build>
        <resources>
            <resource>
                <directory>src/main/resources</directory>
                <filtering>false</filtering>
                <excludes>
                    <exclude>*.properties</exclude>
                </excludes>
            </resource>
        </resources>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-assembly-plugin</artifactId>
                <version>2.6</version>
                <configuration>
                    <appendAssemblyId>false</appendAssemblyId>
                    <outputDirectory>${project.build.directory}/releases/</outputDirectory>
                    <descriptors>
                        <descriptor>${basedir}/src/main/assemblies/plugin.xml</descriptor>
                    </descriptors>
                </configuration>
                <executions>
                    <execution>
                        <phase>package</phase>
                        <goals>
                            <goal>single</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.3</version>
                <configuration>
                    <source>${java.version}</source>
                    <target>${java.version}</target>
                </configuration>
            </plugin>
        </plugins>
    </build>

</project>

Make sure you use the correct Elasticsearch, Lucene and Log4j 2 versions in your pom.xml file, and provide the correct configuration files.

import be.smartspoken.plugin.permutation.filter.PermutationTokenFilterFactory;
import org.elasticsearch.index.analysis.TokenFilterFactory;
import org.elasticsearch.indices.analysis.AnalysisModule;
import org.elasticsearch.plugins.AnalysisPlugin;
import org.elasticsearch.plugins.Plugin;

import java.util.HashMap;
import java.util.Map;

/**
 * Created by glenn on 13.01.17.
 */
public class PermutationPlugin extends Plugin implements AnalysisPlugin {

    @Override
    public Map<String, AnalysisModule.AnalysisProvider<TokenFilterFactory>> getTokenFilters() {
        Map<String, AnalysisModule.AnalysisProvider<TokenFilterFactory>> extra = new HashMap<>();
        extra.put("permutation", PermutationTokenFilterFactory::new);
        return extra;
    }
}

This provides the factory to the plugin.

After installing the new plugin you need to restart Elasticsearch.
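Building and installing typically look something like the following (the zip path is an assumption based on the maven-assembly-plugin configuration above; adjust it to your actual build output):

```shell
# Build the plugin zip; the assembly plugin writes it to target/releases/
mvn clean package

# Install it into a local Elasticsearch 5.1.1, then restart the node
./bin/elasticsearch-plugin install \
    file:///path/to/permutation-plugin/target/releases/permutation-plugin-5.1.1-SNAPSHOT.zip
```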

Use the plugin

Add a new custom analyzer that mimics the 2.x functionality:

            Settings.builder()
                .put("number_of_shards", 2)
                .loadFromSource(jsonBuilder()
                        .startObject()
                            .startObject("analysis")
                                .startObject("analyzer")
                                    .startObject("permutation_analyzer")
                                        .field("tokenizer", "keyword")
                                        .field("filter", new String[]{"permutation","lowercase"})
                                    .endObject()
                                .endObject()
                            .endObject()
                        .endObject().string())
                .loadFromSource(jsonBuilder()
                        .startObject()
                            .startObject("analysis")
                                .startObject("analyzer")
                                    .startObject("lowercase_keyword_analyzer")
                                        .field("tokenizer", "keyword")
                                        .field("filter", new String[]{"lowercase"})
                                    .endObject()
                                .endObject()
                            .endObject()
                        .endObject().string())
                .build();

Now the only thing left to do is provide the custom analyzers in your object mapping:

{
    "my_object": {
        "dynamic_templates": [{
            "autocomplete": {
                "path_match": "my.autocomplete.object.path",
                "match_mapping_type": "*",
                "mapping": {
                    "type": "completion",
                    "analyzer": "permutation_analyzer", /* custom analyzer */
                    "search_analyzer": "lowercase_keyword_analyzer" /* custom analyzer */
                }
            }
        }],
        "properties": {
            /*your other properties*/
        }
    }
}

This will also improve performance, because you no longer have to wait for the permutations to be built at query time.
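To sanity-check the end result, a completion suggest query along these lines (the index name and suggestion field path are hypothetical; substitute your own) should now return the un-analyzed original value "All available colours" regardless of word order:

```shell
curl -XPOST 'localhost:9200/my_index/_search?pretty' -d '{
  "suggest": {
    "my-suggestion": {
      "prefix": "colours available all",
      "completion": { "field": "my.autocomplete.object.path" }
    }
  }
}'
```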
