Is Cassandra unable to store relationships that cross the partition size limit?

Question

I've noticed that relationships cannot be properly stored in C* due to its 100MB partition limit. Denormalization doesn't help in this case, and neither does the fact that C* can have 2B cells per partition, since 2B cells of just Longs take up 16GB. Doesn't that cross the 100MB partition size limit?

This is what I don't understand in general: C* claims it can have 2B cells per partition, yet a partition should not exceed 100MB?

What is the idiomatic way to do this? People say that this is an ideal use case for TitanDB or JanusDB, which scale well to billions of nodes and edges. How do these databases, which use C* under the hood, data-model this?

My use case is described here: https://groups.google.com/forum/#!topic/janusgraph-users/kF2amGxBDCM

Note that I'm fully aware that the usual answer to this question is "use an extra partition key to decrease partition size", but honestly, who of us has this possibility? Especially when modeling relationships... I'm not interested in relationships that happened in a particular hour...

Answer

The maximum number of cells (rows x columns) in a partition is 2 billion, and the maximum size of a single column value is 2 GB (1 MB is recommended).

Source : http://docs.datastax.com/en/cql/3.1/cql/cql_reference/refLimits.html

A partition size of 100MB is not the upper limit. The DataStax documentation says:

For efficient operation, partitions must be sized within certain limits in Apache Cassandra™. Two measures of partition size are the number of values in a partition and the partition size on disk. Sizing the disk space is more complex, and involves the number of rows and the number of columns, primary key columns and static columns in each table. Each application will have different efficiency parameters, but a good rule of thumb is to keep the maximum number of rows below 100,000 items and the disk size under 100 MB

As you can see, 100MB is only a rule of thumb for efficient operation and low heap pressure: keep the number of rows in a single partition under 100,000 and its size on disk under 100MB.
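To put the two numbers from the question side by side, here is a rough back-of-the-envelope calculation. It is only a sketch: it counts raw 8-byte long values and ignores all storage overhead (clustering keys, timestamps, compression), and it treats 100MB as 100 * 10^6 bytes. Both are assumptions made for illustration.

```python
CELL_LIMIT = 2_000_000_000           # hard limit on cells per partition (from the docs above)
LONG_BYTES = 8                       # one 64-bit long value
RULE_OF_THUMB_BYTES = 100 * 10**6    # recommended partition size, assumed decimal megabytes


def raw_value_bytes(cells: int, bytes_per_cell: int = LONG_BYTES) -> int:
    """Raw payload size only; real partitions carry extra per-cell overhead."""
    return cells * bytes_per_cell


# Size of a partition filled to the hard cell limit with longs.
max_partition_bytes = raw_value_bytes(CELL_LIMIT)

# How many long cells fit if you stay within the recommended size.
cells_within_guideline = RULE_OF_THUMB_BYTES // LONG_BYTES

print(max_partition_bytes / 10**9)   # 16.0 (GB)
print(cells_within_guideline)        # 12500000
```

So the 2B figure describes what a partition can technically hold, while the 100MB figure describes what it should hold to stay efficient; the two are not in conflict.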


TitanDB and JanusDB store graphs in adjacency list format, which means that a graph is stored as a collection of vertices, each with its adjacency list. The adjacency list of a vertex contains all of the vertex's incident edges (and properties).

They use the VertexID as the partition key, the PropertyKeyID or EdgeID as the clustering key, and the property value or edge properties as a regular column.
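The layout described above can be pictured with a small in-memory model. This is a toy sketch, not the actual on-disk encoding used by TitanDB or JanusDB: the class name `AdjacencyStore`, its methods, and the `("edge" | "property", id)` clustering key shape are all hypothetical and chosen only for illustration.

```python
class AdjacencyStore:
    """Toy model: one partition per vertex, entries ordered by clustering key."""

    def __init__(self):
        # partition key (vertex_id) -> {clustering_key: value}
        self.partitions = {}

    def put(self, vertex_id, kind, key_id, value):
        """Store a property value or edge under the vertex's partition.

        kind is "property" or "edge"; key_id stands in for the
        PropertyKeyID or EdgeID used as the clustering key.
        """
        self.partitions.setdefault(vertex_id, {})[(kind, key_id)] = value

    def adjacency_list(self, vertex_id):
        """Return all entries of one vertex, sorted by clustering key."""
        row = self.partitions.get(vertex_id, {})
        return [(ck, row[ck]) for ck in sorted(row)]


store = AdjacencyStore()
store.put(1, "property", 10, "alice")
store.put(1, "edge", 501, {"label": "knows", "to": 2})
store.put(1, "edge", 500, {"label": "knows", "to": 3})

for clustering_key, value in store.adjacency_list(1):
    print(clustering_key, value)
```

The point of the model is that everything attached to vertex 1 lands in a single partition, so a vertex with a very large number of edges produces a very large partition.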

If you use Cassandra as the storage backend for TitanDB or JanusDB, the same rule applies for efficient operation and low heap pressure: keep the number of edges and properties of a vertex under 100,000 and their total size under 100MB.
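The question mentions the common advice of adding an extra partition key to keep partitions small. A minimal sketch of that idea for a vertex with many edges follows. It is an illustration only, not how TitanDB or JanusDB handle such vertices: the bucket count, the function names, and the use of a CRC32 hash of the edge ID are all assumptions.

```python
import zlib

NUM_BUCKETS = 16  # hypothetical; pick it so each bucket stays under ~100,000 entries


def partition_key(vertex_id: int, edge_id: int, num_buckets: int = NUM_BUCKETS):
    """Composite partition key (vertex_id, bucket).

    The bucket comes from a stable hash of the edge ID, so the same edge
    always maps to the same partition.
    """
    bucket = zlib.crc32(str(edge_id).encode("ascii")) % num_buckets
    return (vertex_id, bucket)


def all_partition_keys(vertex_id: int, num_buckets: int = NUM_BUCKETS):
    """Reading the full adjacency list means querying every bucket of the vertex."""
    return [(vertex_id, b) for b in range(num_buckets)]


key = partition_key(42, 7)
print(key in all_partition_keys(42))  # True
```

The trade-off is that writes spread across several smaller partitions, while reading all edges of a vertex now takes one query per bucket instead of one query in total.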
