ElasticSearch NEST Insert/Update


Question

I have created an index in Elasticsearch using the following query:

PUT public_site
{
  "mappings": {
    "page": {
      "properties": {
        "url": {
          "type": "string"
        },
        "title":{
          "type": "string"
        },
        "body":{
          "type": "string"
        },
        "meta_description":{
          "type": "string"
        },
        "keywords":{
          "type": "string"
        },
        "category":{
          "type": "string"
        },
        "last_updated_date":{
          "type": "date"
        },
        "source_id":{
        "type":"string"
        }
      }
    }
  }
}

I would like to insert a document into this index using the .NET NEST library. My issue is that the signature of the .NET update method doesn't make any sense to me.

client.Update<TDocument>(IUpdateRequest<TDocument,TPartialDocument>)

The Java library makes so much more sense to me:

UpdateRequest updateRequest = new UpdateRequest();
updateRequest.index("index");
updateRequest.type("type");
updateRequest.id("1");
updateRequest.doc(jsonBuilder()
        .startObject()
            .field("gender", "male")
        .endObject());
client.update(updateRequest).get();

In NEST where do the TDocument and TPartialDocument classes come from? Are these C# classes that I make representing my index?

Solution

TDocument and TPartialDocument are generic type parameters for the POCO types that

  • represent a document in Elasticsearch (TDocument), and
  • represent part of the document in Elasticsearch (TPartialDocument) when performing a partial update.

In the case of a full update, TDocument and TPartialDocument may refer to the same concrete POCO type. Let's have a look at some examples to demonstrate.

Let's create an index with the mapping that you have defined above. Firstly, we can represent a document using a POCO type

public class Page
{
    public string Url { get; set; }

    public string Title { get; set; }

    public string Body { get; set; }

    [String(Name="meta_description")]
    public string MetaDescription { get; set; }

    public IList<string> Keywords { get; set; }

    public string Category { get; set; }

    [Date(Name="last_updated_date")]
    public DateTimeOffset LastUpdatedDate { get; set; }

    [String(Name="source_id")]
    public string SourceId { get; set; }
}

By default, when NEST serializes POCO properties it uses a camel casing naming convention. Because your index uses snake casing for some properties, e.g. "last_updated_date", we can override the names that NEST serializes these properties to by using attributes.

Next, let's create the client to work with

var pool = new SingleNodeConnectionPool(new Uri("http://localhost:9200"));
var pagesIndex = "pages";
var connectionSettings = new ConnectionSettings(pool)
        .DefaultIndex(pagesIndex)
        .PrettyJson()
        .DisableDirectStreaming()
        .OnRequestCompleted(response =>
            {
                // log out the request
                if (response.RequestBodyInBytes != null)
                {
                    Console.WriteLine(
                        $"{response.HttpMethod} {response.Uri} \n" +
                        $"{Encoding.UTF8.GetString(response.RequestBodyInBytes)}");
                }
                else
                {
                    Console.WriteLine($"{response.HttpMethod} {response.Uri}");
                }

                Console.WriteLine();

                // log out the response
                if (response.ResponseBodyInBytes != null)
                {
                    Console.WriteLine($"Status: {response.HttpStatusCode}\n" +
                             $"{Encoding.UTF8.GetString(response.ResponseBodyInBytes)}\n" +
                             $"{new string('-', 30)}\n");
                }
                else
                {
                    Console.WriteLine($"Status: {response.HttpStatusCode}\n" +
                             $"{new string('-', 30)}\n");
                }
            });

var client = new ElasticClient(connectionSettings);

The connection settings have been configured in a way that is helpful whilst developing:

  1. DefaultIndex() - The default index has been configured to be "pages". If no explicit index name is passed on a request and no index name can be inferred for a POCO, then the default index will be used.
  2. PrettyJson() - Prettify (i.e. indent) json requests and responses. This will be useful to see what is being sent to and received from Elasticsearch.
  3. DisableDirectStreaming() - NEST by default serializes POCOs to the request stream and deserializes response types from the response stream. Disabling this direct streaming will buffer the request and response bytes in memory streams, allowing us to log them out in OnRequestCompleted().
  4. OnRequestCompleted() - Called after a response is received. This allows us to log out requests and responses whilst we're developing.

2, 3 and 4 are useful during development but will come with some performance overhead so you may decide not to use them in production.
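
One way to keep options 2 to 4 out of production is to apply them conditionally. The following is only a sketch, assuming NEST 2.x: the `ClientFactory` class and the `isDevelopment` flag are hypothetical names introduced for illustration, while the NEST calls are the same ones used in the snippet above.

```csharp
using System;
using Elasticsearch.Net;
using Nest;

public static class ClientFactory
{
    // isDevelopment is a hypothetical flag; source it from your own configuration.
    public static ElasticClient Create(Uri node, bool isDevelopment)
    {
        var pool = new SingleNodeConnectionPool(node);
        var settings = new ConnectionSettings(pool)
            .DefaultIndex("pages");

        if (isDevelopment)
        {
            // Debugging aids only: these buffer and pretty-print
            // every request and response, which costs memory and time.
            settings = settings
                .PrettyJson()
                .DisableDirectStreaming()
                .OnRequestCompleted(details =>
                    Console.WriteLine(
                        $"{details.HttpMethod} {details.Uri} -> {details.HttpStatusCode}"));
        }

        return new ElasticClient(settings);
    }
}
```

Calling `ClientFactory.Create(new Uri("http://localhost:9200"), true)` would give a client configured much like the one above, with a shorter log line.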

Now, let's create the index with the Page mapping

// delete the index if it exists. Useful for demo purposes so that
// we can re-run this example.
if (client.IndexExists(pagesIndex).Exists)
    client.DeleteIndex(pagesIndex);

// create the index, adding the mapping for the Page type to the index
// at the same time. AutoMap() will infer the mapping from the POCO
var createIndexResponse = client.CreateIndex(pagesIndex, c => c
    .Mappings(m => m
        .Map<Page>(p => p
            .AutoMap()
        )
    )
);

Take a look at the automapping documentation for more details on how you can control the mapping for POCO types.

Indexing a new Page instance is as simple as

// create a sample Page
var page = new Page
{
    Title = "Sample Page",
    Body = "Sample Body",
    Category = "sample",
    Keywords = new List<string>
    {
        "sample",
        "example", 
        "demo"
    },
    LastUpdatedDate = DateTime.UtcNow,
    MetaDescription = "Sample meta description",
    SourceId = "1",
    Url = "/pages/sample-page"
};

// index the sample Page into Elasticsearch.
// NEST will infer the document type (_type) from the POCO type,
// by default it will camel case the POCO type name
var indexResponse = client.Index(page);

Indexing a document will create the document if it does not exist, or overwrite an existing document if it does exist. Elasticsearch has optimistic concurrency control that can be used to control how this behaves under different conditions.
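
As a rough sketch of one such control, the index request can be told to fail rather than overwrite when a document with the same id already exists. This assumes NEST 2.x and the `Page` type defined above; `CreateOnlyExample` and `IndexIfAbsent` are hypothetical names, and the id is passed explicitly because this version of `Page` has no id property for NEST to infer.

```csharp
using System;
using Elasticsearch.Net;
using Nest;

public static class CreateOnlyExample
{
    public static void IndexIfAbsent(IElasticClient client, Page page, string id)
    {
        // OpType.Create asks Elasticsearch to reject the request
        // if a document with this id is already in the index.
        var response = client.Index(page, i => i
            .Id(id)
            .OpType(OpType.Create)
        );

        if (response.IsValid)
            Console.WriteLine("created");
        else if (response.ApiCall.HttpStatusCode == 409)
            Console.WriteLine("already exists, left untouched");
        else
            Console.WriteLine(response.DebugInformation);
    }
}
```

A 409 (conflict) status is how Elasticsearch reports that the document was already there; any other failure is written out through the response's debug information.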

We can update a document using the Update methods, but first a little background.

We can get a document from Elasticsearch by specifying the index, type and id. NEST makes this slightly easier because we can infer all of these from the POCO. When we created our mapping, we didn't specify an Id property on the POCO. If NEST sees a property called Id, it uses this as the id for the document; because we don't have one, Elasticsearch will generate an id for the document and put it in the document metadata. Because the document metadata is separate from the source document, however, this can make modelling documents as POCO types a little trickier (but not impossible): for a given response, we have access to the id of the document through the metadata and access to the source through the _source field. We can combine the id with our source in the application.
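
Combining the metadata id with the source might look like the sketch below, assuming NEST 2.x and the `Page` type defined above. `HitExample` and `PagesWithIds` are hypothetical names, and a match-all search stands in for whatever query the application really needs.

```csharp
using System.Collections.Generic;
using System.Linq;
using Nest;

public static class HitExample
{
    // Pairs each document's metadata id with its deserialized _source.
    public static List<KeyValuePair<string, Page>> PagesWithIds(IElasticClient client)
    {
        var searchResponse = client.Search<Page>(s => s
            .Query(q => q.MatchAll())
        );

        // hit.Id comes from the document metadata, hit.Source from _source.
        return searchResponse.Hits
            .Select(hit => new KeyValuePair<string, Page>(hit.Id, hit.Source))
            .ToList();
    }
}
```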

An easier way to address this, though, is to have an id on the POCO. We can specify an Id property on the POCO and this will be used as the id of the document. We don't have to call the property Id, but if we don't, we need to tell NEST which property represents the id. This can be done with an attribute. Assuming that SourceId is a unique id for a Page instance, use the IdProperty property of ElasticsearchTypeAttribute to specify this. We probably shouldn't analyze this string either, but index it verbatim; we can control this through the Index property of the attribute on the property

[ElasticsearchType(IdProperty = nameof(SourceId))]
public class Page
{
    public string Url { get; set; }

    public string Title { get; set; }

    public string Body { get; set; }

    [String(Name="meta_description")]
    public string MetaDescription { get; set; }

    public IList<string> Keywords { get; set; }

    public string Category { get; set; }

    [Date(Name="last_updated_date")]
    public DateTimeOffset LastUpdatedDate { get; set; }

    [String(Name="source_id", Index=FieldIndexOption.NotAnalyzed)]
    public string SourceId { get; set; }
}

With these in place, we would need to recreate the index as before so that these changes are reflected in the mapping and NEST can use this configuration when indexing a Page instance.

Now, back to updates :) We can get a document from Elasticsearch, update it in the application and then re-index it

var getResponse = client.Get<Page>("1");

var page = getResponse.Source;

// update the last updated date 
page.LastUpdatedDate = DateTime.UtcNow;

var updateResponse = client.Update<Page>(page, u => u.Doc(page));

The first argument identifies the document we want to update; NEST can infer the id from the Page instance. Since we are passing the entire document back here, we could have just used .Index() instead of Update(), since we are updating all the fields

var indexResponse = client.Index(page);

However, since we only want to update the LastUpdatedDate, having to fetch the document from Elasticsearch, update it in the application, then send the document back to Elasticsearch is a lot of work. Instead, we can send only the updated LastUpdatedDate to Elasticsearch using a partial document. C# anonymous types are really useful here

// model our partial document with an anonymous type. 
// Note that we need to use the snake casing name
// (NEST will still camel case the property names but this
//  doesn't help us here)
var lastUpdatedDate = new
{
    last_updated_date = DateTime.UtcNow
};

// do the partial update. 
// Page is TDocument, object is TPartialDocument
var partialUpdateResponse = client.Update<Page, object>("1", u => u
    .Doc(lastUpdatedDate)
);
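
To make the role of TPartialDocument more concrete, the partial document can also be a small named POCO rather than an anonymous type. The sketch below assumes NEST 2.x and the `Page` type defined above; `PageLastUpdated`, `PartialUpdateExample` and `TouchPage` are hypothetical names introduced only for illustration.

```csharp
using System;
using Nest;

// Hypothetical partial-document type: it holds only the fields we intend to update.
public class PageLastUpdated
{
    // Same attribute as on Page, so the field is serialized as last_updated_date.
    [Date(Name = "last_updated_date")]
    public DateTimeOffset LastUpdatedDate { get; set; }
}

public static class PartialUpdateExample
{
    public static void TouchPage(IElasticClient client, string id)
    {
        // Page is TDocument, PageLastUpdated is TPartialDocument.
        var response = client.Update<Page, PageLastUpdated>(id, u => u
            .Doc(new PageLastUpdated { LastUpdatedDate = DateTimeOffset.UtcNow })
        );

        if (!response.IsValid)
            Console.WriteLine(response.DebugInformation);
    }
}
```

Because the attribute carries the field name, the property can keep its C# naming, which the anonymous type cannot do.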

If we need to, we can use optimistic concurrency control here with RetryOnConflict(int)

var partialUpdateResponse = client.Update<Page, object>("1", u => u
    .Doc(lastUpdatedDate)
    .RetryOnConflict(1)
);

With a partial update, Elasticsearch will get the document, apply the partial update and then index the updated document; if the document changes between getting and updating, Elasticsearch is going to retry this once more based on RetryOnConflict(1).

Hope that helps :)
