Too many columns in Cassandra
Problem description
I have 20 columns in a table in Cassandra. Will there be a performance impact when running the following query?
select * from table where partitionKey = 'test';
I could not work out the answer from this link:
https://wiki.apache.org/cassandra/CassandraLimitations
1) What will be the consequence of having too many columns (say 20) in the Cassandra tables?
Recommended answer
Unless you have a lot of rows in the partition, I don't see an impact from having 20 columns. As stated in the documentation that you linked:
The maximum number of cells (rows x columns) in a single partition is 2 billion.
So, unless you expect more than 100 million rows in a single partition, I don't see why 20 columns would be an issue. Keep in mind that Cassandra is a column family store, which means it can store a large number of columns per partition.
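The 100 million figure follows directly from the quoted cell limit. A minimal Python sketch of the arithmetic, assuming every row populates all 20 columns (the names below are illustrative and not part of Cassandra):

```python
# Cell limit per partition, as quoted from the CassandraLimitations page.
MAX_CELLS_PER_PARTITION = 2_000_000_000


def max_rows_per_partition(columns_per_row: int) -> int:
    """Upper bound on rows in one partition if every row fills every column."""
    return MAX_CELLS_PER_PARTITION // columns_per_row


print(max_rows_per_partition(20))  # 100000000
```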
That said, I would personally recommend staying under 100 MB per partition. Larger partitions can cause problems later with streaming during repairs.
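To relate the 100 MB guideline to a row count, a rough estimate is usually enough. The sketch below assumes an average of 50 bytes per cell, which is a made-up figure for illustration only; it also ignores compression and per-row overhead, so measure your own data before relying on it.

```python
# The 100 MB guideline, taking 1 MB as 10**6 bytes.
PARTITION_BUDGET_BYTES = 100 * 1000 * 1000


def rows_within_budget(columns_per_row: int, avg_cell_bytes: int) -> int:
    """Rough number of rows that fit in the budget (no overhead, no compression)."""
    row_bytes = columns_per_row * avg_cell_bytes
    return PARTITION_BUDGET_BYTES // row_bytes


# avg_cell_bytes=50 is an assumed value, not a measurement.
print(rows_within_budget(columns_per_row=20, avg_cell_bytes=50))  # 100000
```

Under that assumption the size guideline is reached after roughly 100,000 rows, far earlier than the 2 billion cell limit.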
===============================
To answer your comment: keep in mind that partitions and rows are two different things in Cassandra. A partition is equal to a single row only if there are no clustering columns. For instance, look at this table definition and the values we insert, and then at the sstabledump output:
CREATE TABLE tt2 (foo int, bar int, mar int, PRIMARY KEY (foo, bar));
INSERT INTO tt2 (foo, bar, mar) VALUES (1, 2, 3);
INSERT INTO tt2 (foo, bar, mar) VALUES (1, 3, 4);
sstabledump:
./cassandra/tools/bin/sstabledump ~/cassandra/data/data/tk/tt2-1386f69005bd11e89c0bbfb5c1157523/mc-1-big-Data.db
[
  {
    "partition" : {
      "key" : [ "1" ],
      "position" : 0
    },
    "rows" : [
      {
        "type" : "row",
        "position" : 32,
        "clustering" : [ "2" ],
        "liveness_info" : { "tstamp" : "2018-01-30T12:57:36.362483Z" },
        "cells" : [
          { "name" : "mar", "value" : 3 }
        ]
      },
      {
        "type" : "row",
        "position" : 32,
        "clustering" : [ "3" ],
        "liveness_info" : { "tstamp" : "2018-01-30T12:58:03.538482Z" },
        "cells" : [
          { "name" : "mar", "value" : 4 }
        ]
      }
    ]
  }
]
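Because sstabledump emits JSON, the partition/row relationship can also be checked programmatically. A small Python sketch, using a trimmed copy of the output above embedded as a string (in practice you would read the tool's output from a file or a pipe; the function name is illustrative):

```python
import json

# Trimmed copy of the sstabledump output shown above.
DUMP = """
[
  {
    "partition": {"key": ["1"], "position": 0},
    "rows": [
      {"type": "row", "clustering": ["2"], "cells": [{"name": "mar", "value": 3}]},
      {"type": "row", "clustering": ["3"], "cells": [{"name": "mar", "value": 4}]}
    ]
  }
]
"""


def rows_per_partition(dump_text: str) -> dict:
    """Map each partition key (as a tuple) to the number of rows it holds."""
    counts = {}
    for entry in json.loads(dump_text):
        key = tuple(entry["partition"]["key"])
        counts[key] = sum(1 for r in entry["rows"] if r.get("type") == "row")
    return counts


print(rows_per_partition(DUMP))  # {('1',): 2}
```

One partition key maps to two rows, one per distinct clustering value.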
Also, if you use the -d option, it might make it easier for you to see the internal representation. As you can see, for the same partition, we have 2 distinct rows:
./cassandra/tools/bin/sstabledump -d ~/cassandra/data/data/tk/tt2-1386f69005bd11e89c0bbfb5c1157523/mc-1-big-Data.db
[1]@0 Row[info=[ts=1517317056362483] ]: 2 | [mar=3 ts=1517317056362483]
[1]@32 Row[info=[ts=1517317083538482] ]: 3 | [mar=4 ts=1517317083538482]
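The ts values in the -d output are write timestamps in microseconds since the Unix epoch; they are the same instants that the JSON dump prints as tstamp. A short Python sketch of the conversion (the helper name is illustrative):

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def ts_to_iso(ts_micros: int) -> str:
    """Convert a microsecond epoch timestamp to an ISO-8601 UTC string."""
    moment = EPOCH + timedelta(microseconds=ts_micros)
    return moment.strftime("%Y-%m-%dT%H:%M:%S.%fZ")


print(ts_to_iso(1517317056362483))  # 2018-01-30T12:57:36.362483Z
print(ts_to_iso(1517317083538482))  # 2018-01-30T12:58:03.538482Z
```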