What exactly happens when tombstone limit is reached

Problem Description

According to Cassandra's log (see below), queries are being aborted due to too many tombstones being present. This is happening because once a week I clean up (delete) rows whose counter is too low. This 'deletes' hundreds of thousands of rows (i.e., marks them with a tombstone).
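
As context, a minimal sketch of what such a cleanup looks like in CQL; the keyspace, table, and key names are assumptions for illustration, not from the original post. The point is that DELETE never removes data in place; it writes a tombstone that shadows the row until compaction discards both:

-- Hypothetical weekly cleanup: one DELETE per low-counter row, each
-- writing a tombstone rather than freeing space immediately.
DELETE FROM mykeyspace.counts WHERE id = 12345;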

It is not a problem at all if, in this table, a deleted row re-appears because a node was down during the cleanup process, so I set the GC grace time for the single affected table to 10 hours (down from the default of 10 days) so the tombstoned rows can get permanently deleted relatively fast.
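
For reference, a grace-period change like that is a per-table setting in CQL (table name assumed for illustration); 10 hours is 36,000 seconds:

-- Shorten how long tombstones must be retained before compaction may
-- purge them (the default is 864000 seconds, i.e. 10 days).
ALTER TABLE mykeyspace.counts WITH gc_grace_seconds = 36000;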

Regardless, I had to set the tombstone_failure_threshold extremely high to avoid the exception below (one hundred million, up from the default of one hundred thousand). My question is: is this necessary? I have absolutely no idea which types of queries get aborted; inserts, selects, deletes?
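
For reference, both tombstone thresholds live in cassandra.yaml; the values below are the 2.0.x defaults, and the failure threshold is the one raised to 100,000,000 here:

# cassandra.yaml
tombstone_warn_threshold: 1000       # log a warning past this many tombstones in one slice
tombstone_failure_threshold: 100000  # abort the query past this many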

If it's merely some SELECTs being aborted, it's not that big a deal. But that assumes 'abort' means 'capped', in that the query stops prematurely and returns whatever live data it managed to gather before too many tombstones were found.

Well, to ask it more simply: what happens when the tombstone_failure_threshold is exceeded?

INFO [HintedHandoff:36] 2014-02-12 17:44:22,355 HintedHandOffManager.java (line 323) Started hinted handoff for host: fb04ad4c-xxxx-4516-8569-xxxxxxxxx with IP: /XX.XX.XXX.XX
ERROR [HintedHandoff:36] 2014-02-12 17:44:22,667 SliceQueryFilter.java (line 200) Scanned over 100000 tombstones; query aborted (see tombstone_fail_threshold)
ERROR [HintedHandoff:36] 2014-02-12 17:44:22,668 CassandraDaemon.java (line 187) Exception in thread Thread[HintedHandoff:36,1,main]
org.apache.cassandra.db.filter.TombstoneOverwhelmingException
    at org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(SliceQueryFilter.java:201)
    at org.apache.cassandra.db.filter.QueryFilter.collateColumns(QueryFilter.java:122)
    at org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:80)
    at org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:72)
    at org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:297)
    at org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:53)
    at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1516)
    at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1335)
    at org.apache.cassandra.db.HintedHandOffManager.doDeliverHintsToEndpoint(HintedHandOffManager.java:351)
    at org.apache.cassandra.db.HintedHandOffManager.deliverHintsToEndpoint(HintedHandOffManager.java:309)
    at org.apache.cassandra.db.HintedHandOffManager.access$300(HintedHandOffManager.java:92)
    at org.apache.cassandra.db.HintedHandOffManager$4.run(HintedHandOffManager.java:530)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)

Forgot to mention: running Cassandra version 2.0.4.

Recommended Answer

When a query that returns a range of rows (or columns) is issued to Cassandra, it has to scan the table to collect the result set (this is called a slice). Now, deleted data is stored in the same manner as regular data, except that it's marked as tombstoned until compacted away. But the table reader has to scan through it nevertheless. So if you have tons of tombstones lying around, you will have an arbitrarily large amount of work to do to satisfy your ostensibly limited slice.

A concrete example: let's say you have two rows with clustering keys 1 and 3, and a hundred thousand dead rows with clustering key 2 that sit between rows 1 and 3 in the table. Now when you issue a SELECT query where the key is >= 1 and < 3, you'll have to scan 100,002 rows instead of the expected two.
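
A minimal CQL sketch of that scenario (all names made up for illustration; in a real schema the hundred thousand dead rows would each need their own clustering key, so think of ck = 2 as standing in for all of them):

CREATE TABLE ks.demo (
    pk int,
    ck int,
    val text,
    PRIMARY KEY (pk, ck)
);

-- Two live rows at the edges of the range:
INSERT INTO ks.demo (pk, ck, val) VALUES (0, 1, 'low');
INSERT INTO ks.demo (pk, ck, val) VALUES (0, 3, 'high');

-- After bulk deletions, this slice must read past every tombstone
-- lying between ck = 1 and ck = 3 before it can return its tiny
-- result set:
SELECT * FROM ks.demo WHERE pk = 0 AND ck >= 1 AND ck < 3;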

To make it worse, Cassandra doesn't just scan through these rows; it also has to accumulate them in memory while it prepares the response. This can cause an out-of-memory error on the node if things go too far, and if multiple nodes are servicing the request, it may even cause multiple failures that bring down the whole cluster. To prevent this from happening, the service aborts the query when it detects a dangerous number of tombstones. You're free to crank this threshold up, but it's risky if your Cassandra heap is close to running out during these spikes.

This exception was introduced in a recent fix, first available in 2.0.2. Here is the bug entry describing the problem the change was trying to address. Previously, everything would have been fine until one of your nodes, or potentially several, suddenly crashed.

If it's merely some SELECTs being aborted, it's not that big a deal. But that assumes 'abort' means 'capped', in that the query stops prematurely and returns whatever live data it managed to gather before too many tombstones were found.

The query doesn't return a limited set; it actually drops the request completely. If you'd like to mitigate this, it may be worth doing your bulk row deletion at the same cadence as the grace period, so you don't have this huge influx of tombstones every week.
