Cassandra query failure (Tombstones)

Question

This is driving me crazy. I tried querying one of my tables in Cassandra and the query failed. I dug into the reason and found that it was caused by tombstones. I changed GC_GRACE_SECONDS to zero and triggered a compaction using nodetool, and when I queried again it worked fine. However, on subsequent calls the query failed again for the same reason. I am using the cassandra-nodejs driver. This is my data model.

CREATE TABLE my_table (
    firstname text,
    lastname text,
    email text,
    mobile text,
    date timeuuid,
    value float,
    PRIMARY KEY (firstname, lastname, email, mobile)
) WITH CLUSTERING ORDER BY (lastname ASC, email ASC, mobile ASC);

This is the query I want to run against that data model.

SELECT firstname, email, toDate(date) as date, mobile, value FROM my_table  WHERE date >= minTimeuuid('2017-03-25 00:00:00+0000') AND date <= minTimeuuid('2017-03-28 23:59:59+0000') ALLOW FILTERING;

The result will have approximately 40k rows. As I understand it, when we delete something it is marked as a tombstone, and it is removed once the GC_GRACE_SECONDS set for the table has elapsed. If I understand that correctly, then:

  1. How can there be a tombstone problem when I never delete any row of the table?
  2. Is it true that a row is marked as a tombstone if and only if we delete that row?
  3. Clearing tombstones and then running the same query sometimes works and sometimes does not. Why is that?
  4. Is it a good idea to increase the tombstone_failure_threshold value? (single-node cluster application)

I am using Cassandra 3.5 with cqlsh version 5.0.1. The query works fine in the terminal, but gives an error when executed from an external client (an Express app using the Node.js driver for Cassandra). I have a single-node cluster.

This is the trace for a row inserted with a null value in one field (I inserted only the name and timestamp):

  activity                                                                                        | timestamp                  | source        | source_elapsed
-------------------------------------------------------------------------------------------------+----------------------------+---------------+----------------
                                                                              Execute CQL3 query | 2017-03-29 10:28:27.342000 | 172.31.34.179 |              0
                   Parsing select * FROM testtomb WHERE name = 'Dhaval45'; [SharedPool-Worker-2] | 2017-03-29 10:28:27.342000 | 172.31.34.179 |             64
                                                       Preparing statement [SharedPool-Worker-2] | 2017-03-29 10:28:27.342000 | 172.31.34.179 |            101
                              Executing single-partition query on testtomb [SharedPool-Worker-3] | 2017-03-29 10:28:27.342000 | 172.31.34.179 |            210
                                              Acquiring sstable references [SharedPool-Worker-3] | 2017-03-29 10:28:27.342000 | 172.31.34.179 |            223
 Skipped 0/0 non-slice-intersecting sstables, included 0 due to tombstones [SharedPool-Worker-3] | 2017-03-29 10:28:27.342000 | 172.31.34.179 |            243
                                 Merged data from memtables and 0 sstables [SharedPool-Worker-3] | 2017-03-29 10:28:27.342000 | 172.31.34.179 |            288
                                         Read 2 live and 0 tombstone cells [SharedPool-Worker-3] | 2017-03-29 10:28:27.342001 | 172.31.34.179 |            310
                                 Merged data from memtables and 0 sstables [SharedPool-Worker-3] | 2017-03-29 10:28:27.342001 | 172.31.34.179 |            323
                                                                                Request complete | 2017-03-29 10:28:27.342385 | 172.31.34.179 |            385

And this is the trace when I query a row on which I have executed a delete query. Initially user Dhaval15 had firstname 'aaaa', and then I deleted that cell. Running the select query on the same user again gave me this trace.

       activity                                                                                        | timestamp                  | source        | source_elapsed
-------------------------------------------------------------------------------------------------+----------------------------+---------------+----------------
                                                                              Execute CQL3 query | 2017-03-29 10:35:18.581000 | 172.31.34.179 |              0
                   Parsing select * FROM testtomb WHERE name = 'Dhaval15'; [SharedPool-Worker-1] | 2017-03-29 10:35:18.581000 | 172.31.34.179 |             65
                                                       Preparing statement [SharedPool-Worker-1] | 2017-03-29 10:35:18.581000 | 172.31.34.179 |            113
                              Executing single-partition query on testtomb [SharedPool-Worker-3] | 2017-03-29 10:35:18.581000 | 172.31.34.179 |            223
                                              Acquiring sstable references [SharedPool-Worker-3] | 2017-03-29 10:35:18.581000 | 172.31.34.179 |            235
 Skipped 0/0 non-slice-intersecting sstables, included 0 due to tombstones [SharedPool-Worker-3] | 2017-03-29 10:35:18.581000 | 172.31.34.179 |            256
                                 Merged data from memtables and 0 sstables [SharedPool-Worker-3] | 2017-03-29 10:35:18.581001 | 172.31.34.179 |            305
                                         Read 1 live and 1 tombstone cells [SharedPool-Worker-3] | 2017-03-29 10:35:18.581001 | 172.31.34.179 |            338
                                 Merged data from memtables and 0 sstables [SharedPool-Worker-3] | 2017-03-29 10:35:18.581001 | 172.31.34.179 |            351
                                                                                Request complete | 2017-03-29 10:35:18.581430 | 172.31.34.179 |            430

Recommended answer

In Cassandra, a tombstone is created when you insert a null value, even if you never execute a delete query.
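
One way to avoid writing those null tombstones is to leave unset columns out of the INSERT entirely. The following is a minimal Python sketch of that idea, not part of the original answer: `build_insert` is a hypothetical helper, the `ts` column name is an assumption (the question only says a name and a timestamp were inserted), and the `?` markers assume the statement will be run as a prepared statement.

```python
from typing import Any, Dict, List, Tuple


def build_insert(table: str, row: Dict[str, Any]) -> Tuple[str, List[Any]]:
    """Build a parameterized INSERT that leaves out columns whose value is None."""
    # Keep only columns that actually carry a value. A column that is never
    # named in the INSERT is simply not written, so no tombstone is created.
    present = {col: val for col, val in row.items() if val is not None}
    if not present:
        raise ValueError("row has no non-null columns to insert")
    columns = ", ".join(present)
    placeholders = ", ".join("?" for _ in present)
    query = f"INSERT INTO {table} ({columns}) VALUES ({placeholders})"
    return query, list(present.values())


# Hypothetical row: 'ts' is an assumed column name, firstname is left unset.
query, params = build_insert(
    "testtomb", {"name": "Dhaval45", "firstname": None, "ts": 1490783307342}
)
print(query)   # INSERT INTO testtomb (name, ts) VALUES (?, ?)
print(params)  # ['Dhaval45', 1490783307342]
```

The returned query string and parameter list can then be handed to whichever driver is in use.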

Tombstones consume space. When you execute a select query, Cassandra has to read the tombstones and filter out the data they shadow. If a large number of tombstones have accumulated, the performance of your select queries will degrade.
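
The traces quoted in the question report how many tombstones a read touched, in lines such as "Read 1 live and 1 tombstone cells". As a rough sketch (the function name `tombstone_counts` is hypothetical, and the pattern only assumes the exact wording seen in those traces), the two counts can be pulled out of a trace line like this:

```python
import re
from typing import Optional, Tuple

# Matches the wording seen in the question's traces, for example:
#   "Read 1 live and 1 tombstone cells [SharedPool-Worker-3]"
READ_LINE = re.compile(r"Read (\d+) live and (\d+) tombstone cells")


def tombstone_counts(trace_line: str) -> Optional[Tuple[int, int]]:
    """Return (live, tombstones) from a trace activity line, or None if absent."""
    match = READ_LINE.search(trace_line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


print(tombstone_counts("Read 1 live and 1 tombstone cells [SharedPool-Worker-3]"))
# (1, 1)
```

A tombstone count that keeps growing relative to the live count for the same query is a sign that reads are spending more and more of their work on deleted or null data.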

Your query is failing because of the large number of tombstones combined with ALLOW FILTERING. Don't use ALLOW FILTERING in production; it is very costly. When you execute a query without specifying the partition key, Cassandra needs to scan all the rows on all the nodes.

Change your data model to something like this:

CREATE TABLE my_table (
    year int,
    month int,
    date timeuuid,
    email text,
    firstname text,
    lastname text,
    mobile text,
    value float,
    PRIMARY KEY ((year, month), date)
);

Here, year and month are extracted from the date. Now you can query by specifying the partition key:

SELECT * FROM my_table WHERE year = 2017 AND month = 3 AND date >= minTimeuuid('2017-03-25 00:00:00+0000') AND date <= maxTimeuuid('2017-03-28 23:59:59+0000') ;
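
With a (year, month) partition key, a date range that crosses a month boundary touches more than one partition, so the client has to issue one query per partition. The sketch below, which is an illustration and not part of the original answer, lists the partitions a range covers; `month_buckets` is a hypothetical helper name.

```python
from datetime import datetime
from typing import List, Tuple


def month_buckets(start: datetime, end: datetime) -> List[Tuple[int, int]]:
    """Return every (year, month) pair touched by the inclusive range start..end."""
    if end < start:
        raise ValueError("end must not be earlier than start")
    buckets = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        buckets.append((year, month))
        # Advance one calendar month, rolling into the next year after December.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return buckets


print(month_buckets(datetime(2017, 3, 25), datetime(2017, 3, 28)))
# [(2017, 3)]
print(month_buckets(datetime(2016, 12, 30), datetime(2017, 2, 2)))
# [(2016, 12), (2017, 1), (2017, 2)]
```

Each returned pair supplies the `year` and `month` values for one query of the form shown above, and the client merges the results.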

Because this query reads from a single partition, it returns results efficiently and should no longer fail.

If you need to query by firstname or lastname, create an index on each of them:

CREATE INDEX index_firstname ON my_table (firstname) ;
CREATE INDEX index_lastname ON my_table (lastname) ;

Then you can query by firstname or lastname:

SELECT * FROM my_table WHERE firstname = 'ashraful' ;
SELECT * FROM my_table WHERE lastname  = 'islam' ;

I have not created indexes on email and mobile because of the high-cardinality problem. Instead, create a materialized view or another table to query by mobile or email:

CREATE MATERIALIZED VIEW mview_mobile AS
    SELECT *
    FROM my_table
    WHERE mobile IS NOT NULL AND year IS NOT NULL AND month IS NOT NULL AND date IS NOT NULL
    PRIMARY KEY (mobile, year, month, date);


CREATE MATERIALIZED VIEW mview_email AS
    SELECT *
    FROM my_table
    WHERE email IS NOT NULL AND year IS NOT NULL AND month IS NOT NULL AND date IS NOT NULL
    PRIMARY KEY (email, year, month, date);

Now you can query by mobile or email:

SELECT * FROM mview_mobile WHERE mobile = '018..';
SELECT * FROM mview_email WHERE email = 'ashraful@...';

More about Cassandra tombstones: http://thelastpickle.com/blog/2016/07/27/about-deletes-and-tombstones.html
