Defining a shard key automatically for a dynamic collection, and suggestions on the design
Problem description
I want to implement sharding for my MongoDB deployment and would like some suggestions.
Insights:
- We have many cron jobs, each of which collects various information about a machine and writes it to its own collection.
- Collections are created dynamically.
- Each collection holds millions of documents.
- Structure 1 for a collection is Name, Category, Subcategory, NodeId, Process-Start-Time, Process-End-Time, Value.
- Structure 2 for a collection is Name, Category, Subcategory, Subtype, Date, Value.
- Structure 3 for a collection is Name, Category, Subcategory, NodeId, Process-Start-Time, Process-End-Time, Value, Flag1, Flag2, Flag3.
After some research we decided to use sharding across multiple servers, which should guarantee two things:
- No need to worry about running out of space.
- Balanced performance across the servers.
Question 1: My problem is finding the right shard key to partition the data. I don't see a unique key in the collections other than the default ObjectId. After further reading I found that it is possible to use a composite key. Does it make sense to use a composite key, or a custom ObjectId as the key, where the value might look like ObjectId: _? This matters a great deal for the performance of returning query results and of moving chunks.
Question 2: Since we have a large number of collections, it will be difficult to set the shard key by hand in the Mongo console every time a collection is created dynamically. Is there any way to automate this in Mongo, so that whenever a collection is created in a sharded database, the shard key for that collection is defined as well?
Question 3: Is it really necessary to include the shard key in the query expression? I don't think we use ObjectId in any of our query expressions, and I doubt I can come up with a unique ID, because the data is not structured like it would be in a traditional database. If it is necessary, how would it help for a query like this:
Example:
{ category: "Energy", subcategory: "Watt", "Process-Start-Time": { $gte: 132234234 } }
Thanks in advance for stepping in and helping me fix this problem.
Recommended answer
The easiest way to do this might be to shard the database, but leave the collections unsharded. Benefits:
- Collections would be distributed across the shards, with each collection living on exactly one shard. Correction: I was wrong about this; it isn't implemented yet. See the related Jira ticket to track it. For now, you can use tags to distribute collections, but not automatically.
- No need to call shardCollection on each new collection
The downside is that all traffic for a collection will go to its shard, which might be impractical for what you're trying to do.
Regarding your questions:
Question 1: The shard key does not have to be unique. What do you generally query for? You might be better off with something like {category:1} or {category:1,subcategory:1}.
Question 2: There is no built-in way to do this automatically; the best way to get that behavior is probably to set up a cron job.
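As a rough sketch of what such a cron job might do, the helper below works out which collections are not yet sharded and builds a shardCollection admin command for each. The function name buildShardCommands, the database name "metrics", and the collection names are all hypothetical, and the shard key is simply the one suggested under Question 1. The helper is pure and does not talk to a database.

```javascript
// Hypothetical helper (not part of MongoDB): given the collections in a
// database and the namespaces that are already sharded, build one
// shardCollection admin command per collection that still needs sharding.
function buildShardCommands(dbName, collectionNames, shardedNamespaces, shardKey) {
  const alreadySharded = new Set(shardedNamespaces);
  return collectionNames
    .filter((name) => !name.startsWith("system.")) // skip internal collections
    .map((name) => `${dbName}.${name}`)            // build the full namespace
    .filter((ns) => !alreadySharded.has(ns))       // keep only unsharded ones
    .map((ns) => ({ shardCollection: ns, key: { ...shardKey } }));
}

// Example input; every name here is made up for illustration.
const commands = buildShardCommands(
  "metrics",
  ["cpu_node1", "energy_node2", "system.indexes"],
  ["metrics.cpu_node1"],
  { category: 1, subcategory: 1 }
);

console.log(JSON.stringify(commands, null, 2));
```

In a real job, the collection names could come from db.getCollectionNames() and each generated command could be passed to db.adminCommand() while connected to a mongos. Where the list of already-sharded namespaces comes from (for example the config database) is left as an assumption here. Note that a non-empty collection needs an index on the shard key before it can be sharded.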
Question 3: No. Queries containing the shard key can be sent to specific shards, while queries without the shard key must be sent to all shards; see http://www.mongodb.org/display/DOCS/Sharding+Introduction#ShardingIntroduction-OperationTypes.
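To make the routing difference concrete, here is a deliberately simplified model of that decision. It is an illustration only, not how mongos is actually implemented: the function routeQuery is hypothetical, and it treats a query as targeted when it constrains the shard key or a leading prefix of a compound shard key.

```javascript
// Simplified model of query routing (an assumption for illustration, not
// the real mongos logic): a query is "targeted" when it constrains at least
// the first field of the shard key, and "broadcast" to all shards otherwise.
function routeQuery(shardKey, query) {
  let matchedPrefix = 0;
  for (const field of Object.keys(shardKey)) {
    if (!(field in query)) break; // prefix ends at the first missing field
    matchedPrefix += 1;
  }
  return matchedPrefix > 0 ? "targeted" : "broadcast";
}

// The example query from the question.
const query = {
  category: "Energy",
  subcategory: "Watt",
  "Process-Start-Time": { $gte: 132234234 },
};

console.log(routeQuery({ category: 1, subcategory: 1 }, query)); // shard key from Question 1
console.log(routeQuery({ _id: 1 }, query));                      // sharding on ObjectId instead
```

Under this model the example query is routed to specific shards when the collection is sharded on {category:1,subcategory:1}, but has to go to every shard when the collection is sharded on _id, since the query never mentions _id.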