How to scale writes and index size dynamically with Elasticsearch?

Question

I am currently exploring solutions to archive an enormous amount of documentation and provide a web search engine for it. I started by looking for a search engine and concluded that Elasticsearch is one of the best options when you have to deal with a huge amount of data. I read that it scales easily and out of the box, and I was convinced.

Then I looked at NoSQL databases and, because of the number of players in that space, spent more time on my research. I read several resources (NoSQL Distilled, the Amazon Dynamo paper, the Google BigTable paper, etc.) that gave me a better understanding of distributed systems in general. I also saw that most scalable NoSQL databases can automatically split a shard in two when it becomes too big.

Then I realized that Elasticsearch does not provide this feature. Moreover, according to the documentation: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-update-settings.html

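We cannot increase the number of shards of an index after its creation. So this brings me to my question:

Suppose you create an index with a number of shards chosen for the expected traffic and data volume, and one day your expectations are exceeded: you no longer have enough shards to handle the write requests and the size of the index. How do you handle this situation?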

Recommended answer

I think I found a way. It would be nice if someone who knows Elasticsearch well could confirm that it would work.

I have just read this article, and its last section inspired this idea:

http://www.elasticsearch.org/blog/changing-mapping-with-zero-downtime/

The idea is to create two aliases (index_search and index_write) that initially point to the same index (let's call it index_1). Imagine that one day the number of shards in index_1 isn't enough. In that case, we can create a new index (let's call it index_2) with the same mappings and with the number of shards we would have added to index_1 if we could have.

Then we update the alias index_search so that it points to "index_1, index_2" (both index_1 and index_2), so that searches run against both indices. Next, we update index_write to point to index_2, so writes go only to the new shards, because the shards of index_1 are considered full.

In the future, we could add a new index (index_3) and map index_search to "index_1, index_2, index_3".
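
The same alias switch can be generated for any later index. The following is a minimal Python sketch, not part of the original answer: the helper name rollover_alias_actions and its parameters are hypothetical, and it only builds the JSON body that would be sent to POST _aliases, without contacting a cluster.

```python
import json


def rollover_alias_actions(old_write_index, new_index,
                           search_alias="index_search",
                           write_alias="index_write"):
    """Build the body for POST _aliases (hypothetical helper).

    The new index is added to the search alias, and the write alias is
    moved from the old write index to the new one. All actions go in a
    single request body, mirroring the Sense example in this answer.
    """
    if old_write_index == new_index:
        raise ValueError("new_index must differ from old_write_index")
    return {
        "actions": [
            {"add": {"index": new_index, "alias": search_alias}},
            {"add": {"index": new_index, "alias": write_alias}},
            {"remove": {"index": old_write_index, "alias": write_alias}},
        ]
    }


if __name__ == "__main__":
    # Going from index_2 to index_3: index_search then covers
    # index_1, index_2 and index_3, and index_write covers index_3 only.
    print(json.dumps(rollover_alias_actions("index_2", "index_3"), indent=2))
```

Keeping this as a pure function means the request body can be inspected or logged before it is sent with whatever HTTP client the application already uses.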

Of course, our application would always use the aliases and never the real name of an index. That way, the transition is invisible to the application and we do not have to change its code.

Example using Sense syntax:

PUT index_1
{
    "settings": {
        "number_of_shards": 1
    }
}

POST _aliases
{
    "actions": [
        {
            "add": {
                "index": "index_1",
                "alias": "index_search"
            }
        },
        {
            "add": {
                "index": "index_1",
                "alias": "index_write"
            }
        }
    ]
}

PUT index_write/article/1
{
    "title":"One first index",
    "article":"This is an article that is indexed on index_1"
}

PUT index_2
{
    "settings": {
        "number_of_shards": 2
    }
}

POST _aliases
{
    "actions": [
        {
            "add": {
                "index": "index_2",
                "alias": "index_search"
            }
        },
        {
            "add": {
                "index": "index_2",
                "alias": "index_write"
            }
        },
        {
            "remove": {
                "index": "index_1",
                "alias": "index_write"
            }
        }
    ]
}

PUT index_write/article/2
{
    "title":"One second index",
    "article":"This is an article that is indexed on index_2"
}

The problem with this solution is that if you update a document that lives in index_1 while index_write points to index_2, the update creates a copy of it in index_2. This means you have to search for the document before updating it in order to find its real index. Moreover, you cannot use the GET command with an id through index_write.
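
The "search before update" step could look roughly like the Python sketch below. It is an illustration under assumptions, not the answer's own code: the function names are hypothetical, the lookup is assumed to be sent to index_search/_search using an ids query, and the sample response is hand-written in the shape of a search response rather than captured from a real cluster.

```python
def find_index_query(doc_id):
    """Body for a search on the index_search alias that matches one document id."""
    return {"query": {"ids": {"values": [doc_id]}}}


def owning_index(search_response):
    """Return the concrete index holding the document, or None if it is absent.

    Raises ValueError if the same id shows up in more than one index,
    which is the duplicate situation described above.
    """
    hits = search_response["hits"]["hits"]
    if not hits:
        return None
    indices = {hit["_index"] for hit in hits}
    if len(indices) > 1:
        raise ValueError("id found in several indices: %s" % sorted(indices))
    return hits[0]["_index"]


if __name__ == "__main__":
    # Hand-written sample response (assumption), trimmed to the fields used here.
    sample = {"hits": {"hits": [{"_index": "index_1", "_id": "1"}]}}
    print(find_index_query("1"))
    print(owning_index(sample))  # index_1
```

The update would then be addressed to the index name returned by owning_index instead of to index_write, at the cost of one extra search per update.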
