ElasticSearch C# Nest: Getting top words with 5.1


Question

I have an ElasticSearch object with these fields:

[Keyword]
public List<string> Tags { get; set; }
[Text]
public string Title { get; set; }

Previously, I got the top Tags across all documents using this code:

var Match = Driver.Search<Metadata>(_ => _
                  .Query(Q => Q
                  .Term(P => P.Category, (int)Category)
                     && Q.Term(P => P.Type, (int)Type))
                  .FielddataFields(F => F.Fields(F1 => F1.Tags, F2 => F2.Title))
                  .Aggregations(A => A.Terms("Tags", T => T.Field(F => F.Tags)
                  .Size(Limit))));

But with Elastic 5.1, I get an error 400 with this hint:


Fielddata is disabled on text fields by default. Set fielddata=true on [Tags] in order to load fielddata in memory by uninverting the inverted index.

Then the ES documentation about parameter mapping tells you "It usually doesn’t make sense to do so" and to "have a text field for full text searches, and an unanalyzed keyword field with doc_values enabled for aggregations".

But the only doc with this is for 5.0, and the same page for 5.1 does not seem to exist.

Now, 5.1 has a page about Term Aggregation that seems to cover what I need, but there is absolutely nothing to be found in C# / Nest that I can use.

So, I'm trying to figure out how I can just get the top words, across all documents, from the Tags (where each tag is its own word; for example "New York" is not "New" and "York") and the title (where each word is its own thing) in C#.

I need to edit this post because there seems to be a deeper problem. I wrote some test code that illustrates the issue:

Let's create a simple object:

public class MyObject
{
    [Keyword]
    public string Id { get; set; }
    [Text]
    public string Category { get; set; }
    [Text(Fielddata = true)]
    public string Keywords { get; set; }
}

Create the index:

var Uri = new Uri(Constants.ELASTIC_CONNECTIONSTRING);
var Settings = new ConnectionSettings(Uri)
.DefaultIndex("test")
.DefaultFieldNameInferrer(_ => _)
.InferMappingFor<MyObject>(_ => _.IdProperty(P => P.Id));   
var D = new ElasticClient(Settings);

Fill the index with random data:

for (var i = 0; i < 10; i++)
{
    var O = new MyObject
    {
        Id = i.ToString(),
        Category = (i % 2) == 0 ? "a" : "b",
        Keywords = (i % 3).ToString()
    };

    D.Index(O);
}

And run the query:

var m = D.Search<MyObject>(s => s
    .Query(q => q.Term(P => P.Category, "a"))
    .Source(f => f.Includes(si => si.Fields(ff => ff.Keywords)))
    .Aggregations(a => a
        .Terms("Keywords", t => t
            .Field(f => f.Keywords)
            .Size(Limit)
        )
    )
);

It fails the same way as before, with a 400 and:


Fielddata is disabled on text fields by default. Set fielddata=true on [Keywords] in order to load fielddata in memory by uninverting the inverted index.

But Fielddata is set to true on [Keywords], and it still keeps complaining about it.

So, let's get crazy and modify the class this way:

public class MyObject
{
    [Text(Fielddata = true)]
    public string Id { get; set; }
    [Text(Fielddata = true)]
    public string Category { get; set; }
    [Text(Fielddata = true)]
    public string Keywords { get; set; }
}

That way everything is a Text and everything has Fielddata = true. Well, same result.

So, either I am really not understanding something simple, or it's broken or not documented :)

Recommended answer

It's less common that you want Fielddata; for your particular search here, where you want to return just the tags and the title fields from the search query, take a look at using Source Filtering:

var Match = client.Search<Metadata>(s => s
    .Query(q => q
        .Term(P => P.Category, (int)Category) && q
        .Term(P => P.Type, (int)Type)
    )
    .Source(f => f
        .Includes(si => si
            .Fields(
                ff => ff.Tags, 
                ff => ff.Title
            )
        )
    )
    .Aggregations(a => a
        .Terms("Tags", t => t
            .Field(f => f.Tags)
            .Size(Limit)
        )
    )
);

Fielddata needs to uninvert the inverted index into an in memory structure for aggregations and sorting. Whilst accessing this data can be very fast, it can also consume a lot of memory for a large data set.

Edit:

Within your edit, I don't see anywhere where you create the index and explicitly map your MyObject POCO; without explicitly creating the index and mapping the POCO, Elasticsearch will automatically create the index and infer the mapping for MyObject based on the first json document that it receives, meaning Keywords will be mapped as a text field with a keyword multi_field and Fielddata will not be enabled on the text field mapping.

Here's an example to demonstrate it all working:

void Main()
{
    var pool = new SingleNodeConnectionPool(new Uri("http://localhost:9200"));
    var defaultIndex = "test";
    var connectionSettings = new ConnectionSettings(pool)
            .DefaultIndex(defaultIndex)
            .DefaultFieldNameInferrer(s => s)
            .InferMappingFor<MyObject>(m => m
                .IdProperty(p => p.Id)
            );

    var client = new ElasticClient(connectionSettings);

    if (client.IndexExists(defaultIndex).Exists)
        client.DeleteIndex(defaultIndex);

    client.CreateIndex(defaultIndex, c => c
        .Mappings(m => m
            .Map<MyObject>(mm => mm
                .AutoMap()
            )
        )
    );

    var objs = Enumerable.Range(0, 10).Select(i =>
        new MyObject
        {
            Id = i.ToString(),
            Category = (i % 2) == 0 ? "a" : "b",
            Keywords = (i % 3).ToString()
        });

    client.IndexMany(objs);

    client.Refresh(defaultIndex);

    var searchResponse = client.Search<MyObject>(s => s
        .Query(q => q.Term(P => P.Category, "a"))
        .Source(f => f.Includes(si => si.Fields(ff => ff.Keywords)))
        .Aggregations(a => a
            .Terms("Keywords", t => t
                .Field(f => f.Keywords)
                .Size(10)
            )
        )
    );

}

public class MyObject
{
    [Keyword]
    public string Id { get; set; }
    [Text]
    public string Category { get; set; }
    [Text(Fielddata = true)]
    public string Keywords { get; set; }
}

This returns:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 5,
    "max_score" : 0.9808292,
    "hits" : [
      {
        "_index" : "test",
        "_type" : "myobject",
        "_id" : "8",
        "_score" : 0.9808292,
        "_source" : {
          "Keywords" : "2"
        }
      },
      {
        "_index" : "test",
        "_type" : "myobject",
        "_id" : "0",
        "_score" : 0.2876821,
        "_source" : {
          "Keywords" : "0"
        }
      },
      {
        "_index" : "test",
        "_type" : "myobject",
        "_id" : "2",
        "_score" : 0.13353139,
        "_source" : {
          "Keywords" : "2"
        }
      },
      {
        "_index" : "test",
        "_type" : "myobject",
        "_id" : "4",
        "_score" : 0.13353139,
        "_source" : {
          "Keywords" : "1"
        }
      },
      {
        "_index" : "test",
        "_type" : "myobject",
        "_id" : "6",
        "_score" : 0.13353139,
        "_source" : {
          "Keywords" : "0"
        }
      }
    ]
  },
  "aggregations" : {
    "Keywords" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "0",
          "doc_count" : 2
        },
        {
          "key" : "2",
          "doc_count" : 2
        },
        {
          "key" : "1",
          "doc_count" : 1
        }
      ]
    }
  }
}
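
The "top words" are the buckets of the terms aggregation in the response above. As a minimal sketch of reading them from the typed response, assuming NEST 5.x (where the aggregation helper is exposed as `Aggs`); the class and method names here are hypothetical and not part of NEST:

```csharp
using System;
using Nest;

public static class TopTermsPrinter
{
    // Hypothetical helper: prints "key: count" for every bucket of a
    // named terms aggregation. Assumes NEST 5.x, where ISearchResponse<T>
    // exposes the typed aggregation helper as `Aggs`.
    public static void Print<T>(ISearchResponse<T> response, string aggregationName)
        where T : class
    {
        // Terms(...) returns null when no aggregation with that name came back.
        var terms = response.Aggs.Terms(aggregationName);
        if (terms == null)
        {
            Console.WriteLine($"No terms aggregation named '{aggregationName}' in the response.");
            return;
        }

        // Buckets arrive ordered by document count, highest first, by default.
        foreach (var bucket in terms.Buckets)
        {
            Console.WriteLine($"{bucket.Key}: {bucket.DocCount}");
        }
    }
}
```

With the example above, this would be called as `TopTermsPrinter.Print(searchResponse, "Keywords");`.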

You might also consider mapping Keywords as a text field with a keyword multi_field, using the text field for unstructured search and the keyword for sorting, aggregations and structured search. This way, you get the best of both worlds and don't need to enable Fielddata:

client.CreateIndex(defaultIndex, c => c
    .Mappings(m => m
        .Map<MyObject>(mm => mm
            .AutoMap()
            .Properties(p => p
                .Text(t => t
                    .Name(n => n.Keywords)
                    .Fields(f => f
                        .Keyword(k => k
                            .Name("keyword")
                        )
                    )
                )
            )
        )
    )
);

Then, in the search, use:

var searchResponse = client.Search<MyObject>(s => s
    .Query(q => q.Term(P => P.Category, "a"))
    .Source(f => f.Includes(si => si.Fields(ff => ff.Keywords)))
    .Aggregations(a => a
        .Terms("Keywords", t => t
            .Field(f => f.Keywords.Suffix("keyword"))
            .Size(10)
        )
    )
);
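
Because the original failure came from a mapping that was never applied, it may help to check each response before moving on. The following is only a sketch, assuming NEST 5.x, where responses implement `IResponse` with `IsValid` and `DebugInformation`; the `ResponseChecks` class and `EnsureValid` method are hypothetical names:

```csharp
using System;
using Nest;

public static class ResponseChecks
{
    // Hypothetical helper: throws if Elasticsearch rejected the request,
    // including NEST's debug information (status code, request audit trail
    // and any server error) in the exception message.
    public static void EnsureValid(IResponse response, string operation)
    {
        if (response.IsValid)
        {
            return;
        }

        throw new InvalidOperationException(
            $"{operation} failed: {response.DebugInformation}");
    }
}
```

For example, wrapping the index creation as `ResponseChecks.EnsureValid(client.CreateIndex(defaultIndex, c => c.Mappings(m => m.Map<MyObject>(mm => mm.AutoMap()))), "create index");` would surface a rejected mapping immediately, instead of as a 400 on a later search.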
