Which of the following data duplication options across shards is recommended?

Problem description

The book High Performance MySQL suggests that, when sharding a blog application, you may want to store comment data on two shards: the shard of the person posting the comment, and the shard where the post is stored.

This raises the question of how to reliably duplicate this data. Which of the following options for duplicating data across shards is recommended?

Option 1: Make two separate inserts from the PHP script.
Pros: a) The logic lives in the application layer.
Cons: a) The user has to wait for both inserts. b) This logic must be duplicated in every client that inserts similar data.
Conclusion: Seems reasonable.
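A minimal sketch of Option 1, written in Python rather than PHP for brevity. It assumes two shards represented by in-memory SQLite databases, a hypothetical `shard_for()` look-up that routes a user ID to a shard by modulo, and hypothetical table names `comments_by_entry` and `comments_by_user`. The application issues one insert to the shard that owns the post and a second insert to the commenter's shard.

```python
import sqlite3

NUM_SHARDS = 2

# Hypothetical shards: one in-memory SQLite database per shard stands in for a MySQL server.
shards = [sqlite3.connect(":memory:") for _ in range(NUM_SHARDS)]
for db in shards:
    db.execute(
        "CREATE TABLE comments_by_entry (comment_id INTEGER, post_id INTEGER, "
        "author_id INTEGER, body TEXT)"
    )
    db.execute(
        "CREATE TABLE comments_by_user (comment_id INTEGER, post_id INTEGER, "
        "author_id INTEGER, body TEXT)"
    )


def shard_for(user_id):
    """Hypothetical look-up: route a user (post owner or commenter) to a shard."""
    return shards[user_id % NUM_SHARDS]


def add_comment(comment_id, post_id, post_owner_id, author_id, body):
    """Option 1: two separate inserts issued by the application."""
    row = (comment_id, post_id, author_id, body)
    # Insert 1: the shard where the post is stored.
    post_db = shard_for(post_owner_id)
    with post_db:  # commits on success, rolls back on exception
        post_db.execute("INSERT INTO comments_by_entry VALUES (?, ?, ?, ?)", row)
    # Insert 2: the shard of the person posting the comment.
    user_db = shard_for(author_id)
    with user_db:
        user_db.execute("INSERT INTO comments_by_user VALUES (?, ?, ?, ?)", row)
```

For example, `add_comment(1, 10, 3, 4, "Nice post")` writes to `comments_by_entry` on shard 1 (post owner 3) and to `comments_by_user` on shard 0 (author 4). The two inserts commit independently, so a crash between them leaves the copies out of sync; the sketch makes no attempt at cross-shard atomicity.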

Option 2: Create federated tables and use a trigger to insert the duplicate.
Pros: a) The application layer doesn't need to worry about multiple inserts.
Cons: a) Every shard needs a federated connection to every other shard; b) federation will work between machines on a LAN, but what about two different sites? c) What happens if the connection to the federated server fails?
Conclusion: Doesn't seem like a sound idea.

Option 3: Messaging, such as RabbitMQ.
Pros: a) Different clients can insert data in one place, and all subscribers can consume the insert.
Cons: a) Complex; b) hosting the messaging server and clients may add overhead; c) not sure how it would work with a look-up service that locates the appropriate shards.
Conclusion: Not sure.
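One way to approach concern c) is sketched below, under stated assumptions: the message carries the IDs a consumer needs, and each subscriber calls the look-up service only when it processes the message. This does not use a RabbitMQ client library; a hypothetical in-process `FanoutExchange` built on Python's `queue.Queue` stands in for a RabbitMQ fanout exchange, and `lookup_shard()` is a hypothetical modulo-based look-up service.

```python
import queue
from collections import defaultdict

NUM_SHARDS = 2

# Hypothetical shards: each shard maps a table name to a list of rows.
shards = [defaultdict(list) for _ in range(NUM_SHARDS)]


def lookup_shard(user_id):
    """Hypothetical look-up service: returns the shard that owns a user."""
    return shards[user_id % NUM_SHARDS]


class FanoutExchange:
    """In-process stand-in for a fanout exchange: every bound queue gets a copy."""

    def __init__(self):
        self.queues = []

    def bind(self):
        q = queue.Queue()
        self.queues.append(q)
        return q

    def publish(self, message):
        for q in self.queues:
            q.put(dict(message))  # each subscriber receives its own copy


exchange = FanoutExchange()
entry_queue = exchange.bind()  # subscriber that maintains comments_by_entry
user_queue = exchange.bind()   # subscriber that maintains comments_by_user


def drain(q, table, routing_key):
    """Consume all pending messages, resolving the target shard per message."""
    while True:
        try:
            msg = q.get_nowait()
        except queue.Empty:
            return
        # The look-up happens at consume time, using the ID carried in the message.
        lookup_shard(msg[routing_key])[table].append(msg)
        q.task_done()
```

A client publishes once, for example `exchange.publish({"comment_id": 1, "post_id": 10, "post_owner_id": 3, "author_id": 4, "body": "Nice post"})`; then `drain(entry_queue, "comments_by_entry", "post_owner_id")` and `drain(user_queue, "comments_by_user", "author_id")` each route the same event to a different shard.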

Option 4: Your suggestion?

Thanks a lot for your help.

Recommended answer

As you point out, having triggers between the various shards is silly: the whole point of sharding is that database operations are independent. So you can throw that option out right away.

Updating both tables at the same time is the approach with the fewest moving parts. Over the long term, it will be the most maintainable. And it will be the easiest to debug if something goes wrong.

But if response time is important, then you might consider some sort of messaging approach: update the comments-by-entry table, and queue a message to update the comments-by-user table. If that message takes an hour to be processed, or even gets lost in a system crash, it's no big deal: you can always recover, because the comments-by-entry table still holds every comment. By no means should you use a messaging approach to update both tables.
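The hybrid described above can be sketched as follows, assuming plain Python dicts as hypothetical shards, a `queue.Queue` standing in for a durable message queue, and a hypothetical modulo-based `shard_for()` look-up. The user waits only for the comments-by-entry write; a worker applies the queued update to comments-by-user later; and because comments-by-entry is treated as the source of truth, a lost message can be repaired by rebuilding comments-by-user from it.

```python
import queue

NUM_SHARDS = 2

# Hypothetical shards: each holds both comment tables as dicts keyed by comment_id.
shards = [{"comments_by_entry": {}, "comments_by_user": {}} for _ in range(NUM_SHARDS)]
pending = queue.Queue()  # stand-in for a durable message queue


def shard_for(user_id):
    """Hypothetical look-up service: maps a user ID to its shard."""
    return shards[user_id % NUM_SHARDS]


def add_comment(comment, post_owner_id):
    """Synchronous path: write comments_by_entry, then queue the second update."""
    shard_for(post_owner_id)["comments_by_entry"][comment["comment_id"]] = comment
    pending.put(comment)  # the user is not held for the second write


def process_pending():
    """Asynchronous worker: applies queued updates to comments_by_user."""
    while not pending.empty():
        comment = pending.get()
        shard_for(comment["author_id"])["comments_by_user"][comment["comment_id"]] = comment


def rebuild_comments_by_user():
    """Recovery: regenerate every comments_by_user table from comments_by_entry."""
    for shard in shards:
        shard["comments_by_user"].clear()
    for shard in shards:
        for comment in shard["comments_by_entry"].values():
            shard_for(comment["author_id"])["comments_by_user"][comment["comment_id"]] = comment
```

Keying rows by `comment_id` makes the worker idempotent: if a message is delivered twice, the second write overwrites the first instead of creating a duplicate row.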

Answered by @kdgregory. Source: https://softwareengineering.stackexchange.com/a/134607/41398
