Counting columns, very slow CountQuery vs SliceQuery operations

Views: 62 Published: 2021/4/21 19:37:58 cassandra hector pelops


Problem description

I've written a "census" program that iterates through all the rows in a Column Family and counts the columns in each row, recording the max column count and its row key. I've spent more time with the Hector client, but I have also written a Pelops client to test.

The basic flow is to use a RangeSlicesQuery to iterate through the rows and then, at each row, use a SliceQuery to iterate through the columns and collect the stats. It works similarly in Pelops, just with different APIs. The downside is having to do the buffering manually, picking buffer sizes for both rows and columns. My current data is 12 million rows, with the largest column count around 25K, so yes, it takes a while. In my current configuration I am getting more than 25K rows per second.
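The manual buffering described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not Hector or Pelops code: an in-memory TreeMap stands in for the Column Family, and the names CensusSketch, slice, countColumns, census and rowWith are all hypothetical. It assumes that a slice's start bound is inclusive, so every page after the first begins with the last name of the previous page and that duplicate has to be skipped. It also assumes rows come back in key order, which on a real cluster depends on the partitioner.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Illustrative only: a TreeMap stands in for the Column Family (assumption).
final class CensusSketch {

    /** Census outcome: rows visited, plus the widest row and its column count. */
    static final class Result {
        final long rowsSeen;
        final String maxRowKey;
        final int maxColumns;

        Result(long rowsSeen, String maxRowKey, int maxColumns) {
            this.rowsSeen = rowsSeen;
            this.maxRowKey = maxRowKey;
            this.maxColumns = maxColumns;
        }
    }

    /** Hypothetical stand-in for one slice request: up to `limit` names >= start, in sorted order. */
    static <V> List<String> slice(NavigableMap<String, V> source, String start, int limit) {
        List<String> page = new ArrayList<>();
        for (String name : source.tailMap(start, true).keySet()) {
            if (page.size() == limit) break;
            page.add(name);
        }
        return page;
    }

    /** Counts the columns of one row by paging through it with a fixed buffer size. */
    static int countColumns(NavigableMap<String, String> row, int pageSize) {
        if (pageSize < 2) throw new IllegalArgumentException("pageSize must be >= 2");
        int count = 0;
        String start = "";        // empty string sorts before every other name
        boolean first = true;
        while (true) {
            List<String> page = slice(row, start, pageSize);
            int skip = first ? 0 : 1;              // first entry repeats the previous page's last
            count += Math.max(0, page.size() - skip);
            if (page.size() < pageSize) break;     // short page: end of row
            start = page.get(page.size() - 1);
            first = false;
        }
        return count;
    }

    /** Pages through row keys, counts each row's columns, and tracks the maximum. */
    static Result census(NavigableMap<String, NavigableMap<String, String>> cf,
                         int rowPageSize, int colPageSize) {
        if (rowPageSize < 2) throw new IllegalArgumentException("rowPageSize must be >= 2");
        long rows = 0;
        String maxKey = null;
        int max = -1;
        String start = "";
        boolean first = true;
        while (true) {
            List<String> keys = slice(cf, start, rowPageSize);
            for (int i = first ? 0 : 1; i < keys.size(); i++) {
                String key = keys.get(i);
                int n = countColumns(cf.get(key), colPageSize);
                rows++;
                if (n > max) { max = n; maxKey = key; }
            }
            if (keys.size() < rowPageSize) break;  // short page: no more rows
            start = keys.get(keys.size() - 1);
            first = false;
        }
        return new Result(rows, maxKey, max);
    }

    /** Builds a test row with n columns named c0..c(n-1). */
    static NavigableMap<String, String> rowWith(int n) {
        NavigableMap<String, String> row = new TreeMap<>();
        for (int i = 0; i < n; i++) row.put("c" + i, "v");
        return row;
    }

    public static void main(String[] args) {
        NavigableMap<String, NavigableMap<String, String>> cf = new TreeMap<>();
        cf.put("row-a", rowWith(3));
        cf.put("row-b", rowWith(7));
        cf.put("row-c", rowWith(1));
        Result r = census(cf, 2, 3);
        System.out.println(r.rowsSeen + " rows, widest " + r.maxRowKey
                + " with " + r.maxColumns + " columns");
    }
}
```

The two page sizes play the role of the row and column buffer sizes mentioned above. Both must be at least 2 in this sketch, because one slot of every page after the first is spent on the repeated start entry.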

Looking for ways to improve this, I discovered Hector's CountQuery (which, I assume, uses the Thrift client's get_count()). Thinking it would be faster to iterate over just the keys (using RangeSlicesQuery.setReturnKeysOnly()) and then re-use a CountQuery on each row key, I revised the code.

Not only was it slower, it was 30x slower! (It processed only 900 rows per second.)
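For a sense of scale, the figures quoted in the question can be turned into a rough full-scan estimate. This is plain arithmetic on the stated numbers (12 million rows, 25K rows per second taken as a lower bound for the original approach, 900 rows per second for the CountQuery approach), not a measurement. The class and method names are hypothetical.

```java
// Back-of-the-envelope arithmetic on the figures quoted in the question.
final class ThroughputEstimate {

    /** Ratio between the faster and the slower throughput. */
    static double slowdown(double fastRowsPerSec, double slowRowsPerSec) {
        return fastRowsPerSec / slowRowsPerSec;
    }

    /** Seconds needed to visit `rows` rows at a steady `rowsPerSec`. */
    static double secondsFor(long rows, double rowsPerSec) {
        return rows / rowsPerSec;
    }

    public static void main(String[] args) {
        long rows = 12_000_000L;
        double slice = 25_000;  // rows/s, SliceQuery approach (lower bound in the question)
        double count = 900;     // rows/s, CountQuery approach
        System.out.println("slowdown factor: " + slowdown(slice, count));
        System.out.println("SliceQuery scan, minutes: " + secondsFor(rows, slice) / 60);
        System.out.println("CountQuery scan, hours: " + secondsFor(rows, count) / 3600);
    }
}
```

On those numbers the ratio is about 28, consistent with the rounded "30x", and a full pass goes from about 8 minutes to roughly 3.7 hours.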

Is there a better way to count columns?
