Cassandra performance for long rows


Problem description


I'm looking at implementing a column family (CF) in Cassandra that has very long rows (hundreds of thousands to millions of columns per row).

Using entirely dummy data, I've inserted 2 million columns into a single row (evenly spaced). If I do a slice operation to get 20 columns, I notice a massive performance degradation the further down the row the slice starts.

With most of the columns, I seem to be able to serve up slice results in 10-40ms, but as you get towards the end of the row, performance hits the wall, with response times gradually increasing from 43ms at the 1,800,000 mark to 214ms at 1,900,000 and 435ms at 1,999,900! (All slices are of equal width).
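
The measurement described above can be sketched as a small timing harness. This is only a minimal sketch, not the test code actually used: `SliceReader` is a hypothetical stand-in for whatever client call performs the 20-column slice, and the offsets are simply the ones quoted in this question.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SliceTiming {
    /** Hypothetical stand-in for the real client call that reads `count` columns from `startColumn`. */
    public interface SliceReader {
        int readSlice(long startColumn, int count);
    }

    /** Times one equal-width slice at each offset; returns elapsed nanoseconds keyed by offset. */
    public static Map<Long, Long> timeSlices(SliceReader reader, long[] offsets, int width) {
        Map<Long, Long> elapsed = new LinkedHashMap<>();
        for (long offset : offsets) {
            long begin = System.nanoTime();
            reader.readSlice(offset, width);
            elapsed.put(offset, System.nanoTime() - begin);
        }
        return elapsed;
    }

    public static void main(String[] args) {
        // Dummy reader so the sketch runs without a cluster; swap in a real slice query.
        SliceReader dummy = (start, count) -> count;
        long[] offsets = {0L, 1_000_000L, 1_800_000L, 1_900_000L, 1_999_900L};
        timeSlices(dummy, offsets, 20).forEach((offset, nanos) ->
            System.out.println(offset + " -> " + nanos / 1_000_000.0 + " ms"));
    }
}
```

Running the same harness once before and once after a flush would show whether the slow offsets are the ones still held in memory.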

I'm at a loss to explain why there is this massive degradation in performance as you get to the end of the row. Can someone please provide some guidance as to what Cassandra's doing internally to make such a delay? Row caching is turned off and pretty much everything is a default Cassandra 1.0 installation.

It's supposed to be able to support up to 2 billion columns per row, but if latency keeps growing at this rate, it can't be used for very long rows in practice.

Many thanks.

Caveat: I'm hitting this with 10 requests in parallel at a time, which is why they are a bit slower than I'd expect anyway. It's still a fair test across all requests, and even when I run them all in serial there is this strange degradation between the 1,800,000th and 1,900,000th record.

I've also noticed EXTREMELY bad performance when doing reverse slices for just a single item when having just 200,000 columns per row: query.setRange(end, start, false, 1);

Solution

psanford's comment led me to the answer. It turns out that Cassandra versions before 1.1.0 (which is currently in beta) are slow at slicing long rows held in memtables (data not yet flushed to disk), but perform better on SSTables flushed to disk with the same data.

See http://mail-archives.apache.org/mod_mbox/cassandra-user/201201.mbox/%3CCAA_K6YvZ=vd=Bjk6BaEg41_r1gfjFaa63uNSXQKxgeB-oq2e5A@mail.gmail.com%3E and https://issues.apache.org/jira/browse/CASSANDRA-3545.

In my example, the first 1.8 million columns had been flushed to disk, so slices over that range were fast, but the last ~200,000 columns hadn't been flushed and were still in memtables. Because memtable slicing is slow on long rows, I saw bad performance at the end of the row (my data was inserted in column order).

This can be fixed by manually calling a flush on the cassandra nodes. A patch has been applied to 1.1.0 to fix this and I can confirm that this fixes the issue for me.
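
One way to trigger such a flush is the `nodetool flush` command. Below is a minimal sketch in Java that builds and optionally runs it. It assumes `nodetool` is on the PATH, and the host, keyspace and column family names are placeholders rather than the ones from my setup.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

public class FlushMemtable {
    /** Builds the argument list: nodetool -h <host> flush <keyspace> <columnFamily>. */
    public static List<String> flushCommand(String host, String keyspace, String columnFamily) {
        return Arrays.asList("nodetool", "-h", host, "flush", keyspace, columnFamily);
    }

    /** Runs the command and returns its exit code; assumes nodetool is on the PATH. */
    public static int run(List<String> command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command).inheritIO().start();
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        // "localhost", "MyKeyspace" and "LongRows" are placeholder names.
        List<String> command = flushCommand("localhost", "MyKeyspace", "LongRows");
        System.out.println(String.join(" ", command));
        // Uncomment on a machine with Cassandra installed to actually flush:
        // System.exit(run(command));
    }
}
```

The flush only moves the current memtable contents to an SSTable, so columns written afterwards sit in a memtable again until the next flush.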

I hope this helps anyone else with the same problem.
