Online mnesia recovery from network partition

Question

Is it possible to recover from a network partition in an mnesia cluster without restarting any of the nodes involved? If so, how does one go about it?

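I'm interested in knowing:

  • how this may be done with standard OTP mnesia (v4.4.7)

  • whether any custom code needs to be written to accomplish this (e.g. subscribe to mnesia running_partitioned_network events, determine a new master, merge records from the non-master into the master, force-load tables from the new master, clear the running partitioned network event; example code would be greatly appreciated)

  • or whether mnesia categorically does not support online recovery and requires that the nodes in the non-master partition be restarted.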

While I appreciate the pointers to general distributed systems theory, in this question I am interested in erlang/OTP mnesia only.

Recommended answer

After some experimentation I've discovered the following:


  • Mnesia considers the network to be partitioned if, between two nodes, there is a disconnect followed by a reconnect without an mnesia restart.
  • This is true even if no Mnesia read/write operations occur while the nodes are disconnected.
  • Mnesia itself must be restarted in order to clear the partitioned network event; you cannot force_load_table after the network is partitioned.
  • Only Mnesia needs to be restarted in order to clear the partitioned network event. You don't need to restart the entire node.
  • Mnesia resolves the network partitioning by having the newly restarted Mnesia node overwrite its table data with data from another Mnesia node (the startup table load algorithm).
  • Generally, nodes will copy tables from the node that has been up the longest (this is the behaviour I saw; I haven't verified that it is explicitly coded for rather than a side effect of something else). If you disconnect a node from a cluster, make writes in both partitions (the disconnected node and its old peers), shut down all nodes and start them all back up again, starting the disconnected node first, then the disconnected node will be considered the master and its data will overwrite that of all the other nodes. There is no table comparison/checksumming/quorum behaviour.
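
To notice when Mnesia flags a partition in the first place, a process can subscribe to Mnesia system events. The following is a minimal sketch, not a complete solution: the module name partition_watch is hypothetical, and the process only logs the event and leaves the decision about which partition should lose to the operator or to other code.

```erlang
-module(partition_watch).
-export([start/0]).

%% Spawn a process that subscribes to Mnesia system events.
%% Mnesia must already be running on the local node.
start() ->
    spawn(fun() ->
        {ok, _Node} = mnesia:subscribe(system),
        loop()
    end).

loop() ->
    receive
        %% Context is running_partitioned_network or
        %% starting_partitioned_network; Node is the peer involved.
        {mnesia_system_event, {inconsistent_database, Context, Node}} ->
            error_logger:warning_msg(
                "mnesia partition detected (~p) with node ~p~n",
                [Context, Node]),
            loop();
        %% Ignore all other system events (mnesia_up, mnesia_down, ...).
        {mnesia_system_event, _Other} ->
            loop()
    end.
```

Calling partition_watch:start() after mnesia:start() is enough to see the event in the log. Since Mnesia has to be restarted to clear the partitioned state, the watcher most likely needs to be started again after each restart.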

So to answer my question, one can perform semi-online recovery by executing mnesia:stop() followed by mnesia:start() on the nodes in the partition whose data you decide to discard (which I'll call the losing partition). Executing the mnesia:start() call will cause the node to contact the nodes on the other side of the partition. If you have more than one node in the losing partition, you may want to set the master nodes for table loading to nodes in the winning partition - otherwise I think there is a chance it will load tables from another node in the losing partition and thus return to the partitioned network state.
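
The procedure above can be sketched roughly as follows, to be run on each node of the losing partition. The module name, function name and 60-second timeout are assumptions for illustration; the Mnesia calls themselves (set_master_nodes/1, stop/0, start/0, system_info/1, wait_for_tables/2) are standard, but I have not checked this exact sequence against v4.4.7.

```erlang
-module(partition_recover).
-export([rejoin/1]).

%% WinningNodes: nodes in the partition whose data is kept.
%% The local node's table data is discarded and reloaded.
rejoin(WinningNodes) ->
    %% Make the next table load fetch from the winning partition
    %% rather than from another node in the losing partition.
    ok = mnesia:set_master_nodes(WinningNodes),
    %% Restart only Mnesia, not the whole Erlang node.
    stopped = mnesia:stop(),
    ok = mnesia:start(),
    %% Block until the local tables have been loaded
    %% (the 60000 ms timeout is an arbitrary choice).
    Tabs = mnesia:system_info(local_tables),
    mnesia:wait_for_tables(Tabs, 60000).
```

A call such as partition_recover:rejoin(['a@host1', 'b@host2']) (node names hypothetical) returns ok once the tables are loaded, or {timeout, BadTabs} if they are not. The master node setting persists, so it may be worth resetting it with mnesia:set_master_nodes([]) once the cluster is healthy again.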

Unfortunately mnesia provides no support for merging/reconciling table contents during the startup table load phase, nor does it provide for going back into the table load phase once started.

A merge phase would be particularly suitable for ejabberd, as the node would still have user connections and would thus know which user records it owns/should be the most up to date for (assuming one user connection per cluster). If a merge phase existed, the node could filter userdata tables, save all records for connected users, load tables as per usual and then write the saved records back to the mnesia cluster.
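
Since no such merge phase exists, the closest approximation is probably to do the save and write-back by hand around the Mnesia restart. The sketch below assumes the application can list the keys it owns (for ejabberd, the connected users); the module, function and parameter names are hypothetical and this is not something Mnesia or ejabberd provides.

```erlang
-module(partition_merge).
-export([rejoin_keeping/3]).

%% Tab          - table whose locally owned records should survive
%% OwnedKeys    - keys this node considers itself authoritative for
%% WinningNodes - nodes in the partition whose data is otherwise kept
rejoin_keeping(Tab, OwnedKeys, WinningNodes) ->
    %% Save the local versions of the owned records before the
    %% restart overwrites the local table copy.
    Saved = lists:append([mnesia:dirty_read(Tab, Key) || Key <- OwnedKeys]),
    ok = mnesia:set_master_nodes(WinningNodes),
    stopped = mnesia:stop(),
    ok = mnesia:start(),
    %% Wait for the table to be reloaded from the winning partition
    %% (the 60000 ms timeout is an arbitrary choice).
    ok = mnesia:wait_for_tables([Tab], 60000),
    %% Write the saved records back in one transaction so that they
    %% replicate to the rest of the cluster.
    {atomic, ok} = mnesia:transaction(fun() ->
        lists:foreach(fun(Rec) -> mnesia:write(Tab, Rec, write) end, Saved)
    end),
    {ok, length(Saved)}.
```

This only covers records the node can prove it owns. Writes made in the winning partition to the same keys during the partition are overwritten, and there is a short window after the restart in which readers see the winning partition's versions.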
