High number of tombstones with TTL columns in Cassandra

Problem description

I have a Cassandra column family (CQL table) with the following schema:

CREATE TABLE user_actions (
    company_id varchar,
    employee_id varchar,
    inserted_at timeuuid,
    action_type varchar,
    PRIMARY KEY ((company_id, employee_id), inserted_at)
) WITH CLUSTERING ORDER BY (inserted_at DESC);

It has a composite partition key made up of a company ID and an employee ID, plus a clustering column representing the insertion time, which orders the columns in reverse chronological order (the newest actions are at the beginning of the row).

Here's what an insert looks like:

INSERT INTO user_actions (company_id, employee_id, inserted_at, action_type)
VALUES ('acme', 'xyz', now(), 'started_project')
USING TTL 1209600; // two weeks

Nothing special here, except for the TTL, which is set to expire after two weeks.
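As a quick sanity check of the TTL value, the arithmetic can be sketched in Python. The insertion time below is hypothetical and chosen only for illustration; the only value taken from the question is the TTL of 1209600 seconds.

```python
from datetime import datetime, timedelta, timezone

# TTL value taken from the INSERT statement in the question.
TTL_SECONDS = 1209600

ttl = timedelta(seconds=TTL_SECONDS)
print(ttl.days)  # number of whole days covered by the TTL

# Hypothetical insertion time, used only to illustrate the expiry arithmetic.
inserted_at = datetime(2014, 8, 25, 0, 0, 0, tzinfo=timezone.utc)
expires_at = inserted_at + ttl
print(expires_at.isoformat())
```

Because every insert uses the same TTL, cells expire in the same order in which they were written.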

The read path is also quite simple. We always want the latest 100 actions, so the query looks like this:

SELECT action_type FROM user_actions
WHERE company_id = 'acme' AND employee_id = 'xyz'
LIMIT 100;

The issue: since we order in reverse chronological order and the TTL is always the same number of seconds at insertion, I would expect such a query not to scan through any tombstones, because all "dead" columns are at the tail of the row, not the head. In practice, however, we see many warnings in the log in the following format:

WARN [ReadStage:60452] 2014-09-08 09:48:51,259 SliceQueryFilter.java (line 225) Read 40 live and 1164 tombstoned cells in profiles.user_actions (see tombstone_warn_threshold). 100 columns was requested, slices=[-], delInfo={deletedAt=1410169639669000, localDeletion=1410169639}

On rare occasions, the number of tombstones is large enough to abort the query completely. Since I see this type of schema design advocated quite often, I wonder whether I'm doing something wrong here.
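For reference, the two numbers in the warning's delInfo field can be turned into readable timestamps. This is a minimal sketch that assumes deletedAt is in microseconds and localDeletion is in seconds since the Unix epoch, which their magnitudes suggest; the variable names are illustrative.

```python
from datetime import datetime, timezone

# Values copied from the delInfo field of the warning in the question.
deleted_at_us = 1410169639669000  # assumed: microseconds since the epoch
local_deletion_s = 1410169639     # assumed: seconds since the epoch

# Truncate the microsecond value to whole seconds before converting.
deleted_at = datetime.fromtimestamp(deleted_at_us // 1_000_000, tz=timezone.utc)
local_deletion = datetime.fromtimestamp(local_deletion_s, tz=timezone.utc)

print(deleted_at.isoformat())
print(local_deletion.isoformat())
```

Under those assumptions, both values decode to the same second on 2014-09-08 (UTC), the same date as the log entry.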

Solution

Your SELECT statement does not specify an explicit sort order, so it defaults to ASC (even though your clustering order is DESC).

So if you change your query to:

SELECT action_type FROM user_actions
WHERE company_id = 'acme' AND employee_id = 'xyz'
ORDER BY inserted_at DESC
LIMIT 100;

you should be fine.
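To see why the scan direction matters when every cell has the same TTL, here is a toy model in Python. It is only a sketch and not Cassandra's actual read path: the `scan` function, the one-action-per-hour workload, and the 30-day window are all assumptions made for illustration.

```python
TTL_SECONDS = 1209600  # two weeks, as in the question
HOUR = 3600

def scan(insert_times, now, ttl, limit, newest_first):
    """Walk the cells in the given direction and count how many live and
    expired cells are visited before `limit` live cells are collected."""
    live = expired = 0
    for inserted_at in sorted(insert_times, reverse=newest_first):
        if inserted_at + ttl <= now:
            expired += 1  # TTL has elapsed: the cell counts as a tombstone
        else:
            live += 1
            if live == limit:
                break
    return live, expired

# Hypothetical workload: one action per hour for 30 days, read at day 30.
insert_times = [h * HOUR for h in range(30 * 24)]
now = 30 * 24 * HOUR

newest_first = scan(insert_times, now, TTL_SECONDS, 100, newest_first=True)
oldest_first = scan(insert_times, now, TTL_SECONDS, 100, newest_first=False)
print(newest_first, oldest_first)
```

In this model, reading newest-first collects 100 live cells without touching an expired one, while reading oldest-first has to step over every expired cell before reaching the first live one.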
