Moving a shard from one bigcouch server to another (for balancing)

Problem description

I'm currently testing bigcouch for large amounts of data (15 million records daily).

When I need to generate views of the data, I run into balancing problems because one of my two machines is much weaker than the other (single-core vs. dual-core). The result is that the better machine finishes and sits idle while the weaker one still has a lot to do.

My idea now is to move some shards from the weaker machine to the other one, so that both finish at about the same time.

So my question is: how can I move shards from the weaker bigcouch server to the better one?

Thank you for your help + best regards!

Andy

Solution

Bigcouch shards are simply CouchDB databases so the procedure for moving them is pretty simple. A future release of Bigcouch will automate the process but, for now, I'll just describe it.

A little background will help ground the explanation. A Bigcouch node is listening on two ports, 5984 and 5986. The front port, 5984, looks like CouchDB (while being clustered and fault-tolerant). The back port, 5986, talks directly to the underlying CouchDB server on a particular node. You will notice that there are two extra databases shown in localhost:5986/_all_dbs besides the shards of your database. One is called 'nodes' and you have already interacted with it when you set up your cluster. The other is called 'dbs' and contains a document for each clustered database, specifying where each copy of each shard of your database actually lives.
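To make the two-port layout concrete, here is a minimal Python sketch that builds the front-port and back-port URLs for a node and fetches `_all_dbs` from the back port. The helper names (`node_urls`, `list_backend_dbs`) are illustrative assumptions, not part of Bigcouch, and the fetch only works against a reachable node.

```python
import json
from urllib.request import urlopen

FRONT_PORT = 5984  # clustered, CouchDB-like interface
BACK_PORT = 5986   # direct access to the node-local CouchDB server


def node_urls(host):
    """Return (front_url, back_url) for a Bigcouch node on the given host."""
    return (f"http://{host}:{FRONT_PORT}", f"http://{host}:{BACK_PORT}")


def list_backend_dbs(host, timeout=10):
    """Fetch _all_dbs from the back port of a node (needs a running node)."""
    _, back_url = node_urls(host)
    with urlopen(f"{back_url}/_all_dbs", timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `list_backend_dbs("localhost")` on a cluster node should return the shard databases together with 'nodes' and 'dbs'.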

So, to move a shard, you need to do a few things;

  1. Identify the shard file.
  2. Copy the shard file to your new server.
  3. Tell Bigcouch about its new location.
  4. Top off with replication if needed.

Step 1

In the data directory of your Bigcouch node, you will find files like this;

shards/a0000000-bfffffff/foo.1312544893.couch

All shards are organized under the shards/ directory, then by range, and finally the name followed by a random number.

Select one of the files for your database and remember its name.
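If you want to script this step, the path layout described above can be split into its parts. The following is only a sketch: `ShardFile` and `parse_shard_path` are hypothetical helpers, and the code assumes paths are given relative to the data directory and end in `.couch`.

```python
from typing import NamedTuple


class ShardFile(NamedTuple):
    shard_range: str  # e.g. "a0000000-bfffffff"
    db_name: str      # e.g. "foo"
    suffix: str       # number that follows the database name


def parse_shard_path(path):
    """Split 'shards/<range>/<name>.<number>.couch' into its components."""
    parts = path.strip("/").split("/")
    if len(parts) < 3 or parts[0] != "shards":
        raise ValueError(f"not a shard path: {path!r}")
    shard_range = parts[1]
    # Database names may themselves contain slashes, so rejoin the remainder.
    filename = "/".join(parts[2:])
    if not filename.endswith(".couch"):
        raise ValueError(f"not a .couch file: {path!r}")
    stem = filename[: -len(".couch")]
    db_name, sep, suffix = stem.rpartition(".")
    if not sep or not suffix.isdigit():
        raise ValueError(f"missing numeric suffix: {path!r}")
    return ShardFile(shard_range, db_name, suffix)
```

For the example file above, this yields the range `a0000000-bfffffff`, the database name `foo`, and the suffix `1312544893`.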

Step 2

Use any method to copy this file to the same path on your target server. rsync and scp are fine choices, as is CouchDB replication (be sure to replicate from port 5986 to port 5986).
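As one possible way to do the file copy, the sketch below builds an rsync command that preserves the `shards/<range>/` path on the target. The data directory locations and the host name are assumptions you must replace with your own; `build_rsync_command` is a hypothetical helper. It relies on rsync's `--relative` option, where everything after the `/./` marker in the source path is recreated under the destination.

```python
import subprocess


def build_rsync_command(data_dir, shard_rel_path, target_host, target_data_dir):
    """Build an rsync argv that copies one shard file to the same relative path.

    data_dir and target_data_dir are the Bigcouch data directories on the
    source and target machines (assumed values, check your configuration).
    """
    # With --relative, the part of the source path after '/./' is preserved.
    source = f"{data_dir.rstrip('/')}/./{shard_rel_path.lstrip('/')}"
    dest = f"{target_host}:{target_data_dir.rstrip('/')}/"
    return ["rsync", "-a", "--relative", source, dest]


def copy_shard(data_dir, shard_rel_path, target_host, target_data_dir):
    """Run the copy; raises CalledProcessError if rsync fails."""
    cmd = build_rsync_command(data_dir, shard_rel_path, target_host, target_data_dir)
    subprocess.run(cmd, check=True)
```

Building the command separately from running it makes it easy to print and inspect before anything is copied.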

Step 3

The document in 'dbs' that governs the layout of your clustered database needs to be modified. It looks a bit like this;

    {
        "_id": "baz",
        "_rev": "1-912fe2dd63e0a570a4ceb26fd742dffd",
        "shard_suffix": [46,49,51,49,50,53,52,53,50,49,55],
        "changelog": [
            ["add", "00000000-7fffffff", "dev1@127.0.0.1"],
            ["add", "80000000-ffffffff", "dev1@127.0.0.1"]
        ],
        "by_node": {
            "dev1@127.0.0.1": ["00000000-7fffffff", "80000000-ffffffff"]
        },
        "by_range": {
            "00000000-7fffffff": ["dev1@127.0.0.1"],
            "80000000-ffffffff": ["dev1@127.0.0.1"]
        }
    }

Update both the by_node and by_range values so that the shard you have moved resolves to the new host.
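The edit to the 'dbs' document can be sketched as a pure function over the JSON, which you would then save back to the 'dbs' database on the back port. This is an illustration under assumptions, not Bigcouch's own tooling: `move_shard` is a hypothetical helper, the node name `dev2@127.0.0.1` in the usage note is invented, the changelog is left untouched because the answer only mentions by_node and by_range, and dropping a node's key once it holds no ranges is my own choice.

```python
import copy


def move_shard(dbs_doc, shard_range, source_node, target_node):
    """Return a copy of a 'dbs' document with one shard range reassigned."""
    doc = copy.deepcopy(dbs_doc)  # never mutate the caller's document
    by_node = doc["by_node"]
    by_range = doc["by_range"]

    if source_node not in by_range.get(shard_range, []):
        raise ValueError(f"{source_node} does not hold {shard_range}")

    # by_range: swap the source node for the target node in this range.
    nodes = [n for n in by_range[shard_range] if n != source_node]
    if target_node not in nodes:
        nodes.append(target_node)
    by_range[shard_range] = nodes

    # by_node: remove the range from the source node ...
    remaining = [r for r in by_node.get(source_node, []) if r != shard_range]
    if remaining:
        by_node[source_node] = remaining
    else:
        # Assumption: a node holding no ranges is dropped from by_node.
        by_node.pop(source_node, None)

    # ... and add it to the target node.
    target_ranges = by_node.setdefault(target_node, [])
    if shard_range not in target_ranges:
        target_ranges.append(shard_range)
    return doc
```

For the sample document, `move_shard(doc, "80000000-ffffffff", "dev1@127.0.0.1", "dev2@127.0.0.1")` leaves dev1 with only the first range and assigns the second range to dev2 in both maps.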

At this point you have moved the shard. However, if there have been updates between the time you started copying the file and the time you updated the 'dbs' document, those writes happened at the original node and are not visible, so you should proceed to step 4. If there have been no updates, you can delete the shard on the original server, though I recommend you check your database on port 5984 to be sure all your docs show up correctly.

Step 4

Perform a replication from the source shard to the target shard, again taking care to do this on the 5986 port of each. This will ensure that all updates are available once again. You can now delete the copy of this shard on the original server.
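A shard-to-shard replication on the back port can be requested through CouchDB's standard `_replicate` endpoint. The sketch below only builds the request; the host names are placeholders, the helper names are hypothetical, and it assumes the shard database is named after its file path without the `.couch` extension (confirm the exact name in the `_all_dbs` output on port 5986). Slashes in the database name are percent-encoded.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

BACK_PORT = 5986  # replicate node-to-node, not through the clustered port


def shard_db_url(host, shard_db_name):
    """URL of a shard database on a node's back port ('/' becomes %2F)."""
    return f"http://{host}:{BACK_PORT}/{quote(shard_db_name, safe='')}"


def build_replication_request(source_host, target_host, shard_db_name):
    """Build a POST to _replicate on the target node's back port."""
    body = {
        "source": shard_db_url(source_host, shard_db_name),
        "target": shard_db_url(target_host, shard_db_name),
    }
    return Request(
        f"http://{target_host}:{BACK_PORT}/_replicate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def replicate_shard(source_host, target_host, shard_db_name, timeout=600):
    """Send the request and return the decoded response (needs live nodes)."""
    req = build_replication_request(source_host, target_host, shard_db_name)
    with urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Only delete the copy on the original server after the replication has reported success.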

HTH, Robert Newson - Cloudant.
