Does Cassandra read the whole row when limiting the number of requested results?


Question

I am using Cassandra 2.0.6 and have this table:

CREATE TABLE t (
    id text,
    idx bigint,
    data bigint,
    PRIMARY KEY (id, idx)
)

Say I have these rows:

id / idx / data
x    1     data1
x    2     data2
x    3     data3

... and so on, for say 1000 rows with id x.

If I query:

select * from t where id='x' order by idx limit 1

Will Cassandra fetch all 1000 rows, or only a small part of them?

Articles like http://www.ebaytechblog.com/2012/08/14/cassandra-data-modeling-best-practices-part-2/#.UzrvLKZx2PI suggest that it will fetch only a small part. But when I run stress tests, the more data I have in the table, the more disk read IO (MB/sec) I see.

For 8GB of data I was getting 3MB/sec IO (reads).
For 12GB of data I was getting 15MB/sec IO (reads).
For 20GB of data I am currently getting 35MB/sec IO (reads).

I don't see anything weird in cfhistograms:

SSTables per Read
1 sstables: 421010
2 sstables: 552
3 sstables: 9
4 sstables: 0
5 sstables: 254
6 sstables: 3221
7 sstables: 3063
8 sstables: 1029
10 sstables: 143

Read Latency (microseconds)
12 us: 6
14 us: 36
17 us: 471
20 us: 2795
24 us: 10799
29 us: 18594
35 us: 24693
42 us: 43078
50 us: 67438
60 us: 68872
72 us: 70718
86 us: 47300
103 us: 23471
124 us: 11752
149 us: 4509
179 us: 1437
215 us: 832
258 us: 3444
310 us: 7883
372 us: 2374
446 us: 736
535 us: 624
642 us: 581
770 us: 1875
924 us: 1715
1109 us: 2889
1331 us: 3705
1597 us: 2197
1916 us: 1320
2299 us: 826
2759 us: 639
3311 us: 431
3973 us: 312
4768 us: 213
5722 us: 106
6866 us: 72
8239 us: 44
9887 us: 36
11864 us: 25
14237 us: 16
17084 us: 23
20501 us: 20
24601 us: 15
29521 us: 28
35425 us: 21
42510 us: 20
51012 us: 49
61214 us: 49
73457 us: 29
88148 us: 23
105778 us: 35
126934 us: 23
152321 us: 17
182785 us: 13
219342 us: 10
263210 us: 8
315852 us: 3
379022 us: 8
454826 us: 10

Recommended answer

Once the clustering order is defined, the data is stored already sorted, so the ordering cost is not the issue here. If you are running into problems with large amounts of data, it is probably due to the compaction strategy in use. I suspect you are using the size-tiered compaction strategy on a read-heavy column family. Try the same scenario with the leveled compaction strategy.

With size-tiered compaction, your data is spread across multiple SSTables, and a read may have to pull data out of all of them each time. So a read-heavy column family is not well served by it.
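
As a rough sketch of the switch to leveled compaction suggested above, the CQL below changes the compaction strategy for the table t from the question. It assumes the table currently uses the default size-tiered strategy and that the statement is run from cqlsh inside the table's keyspace; whether it actually reduces read IO for this workload would have to be confirmed by re-running the stress test.

```cql
-- Assumption: table t from the question, in the current keyspace.
-- Switch from size-tiered to leveled compaction.
ALTER TABLE t
  WITH compaction = { 'class' : 'LeveledCompactionStrategy' };
```

After the change, existing SSTables are reorganized in the background, so the "SSTables per Read" section of cfhistograms is best compared only once compaction has settled.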
