Best way to store last-touched time in Cassandra


Question

I'm storing a last-touched time in a User table in Postgres, but updates are frequent and contention is high enough that I see cases where three identical updates deadlock.

Cassandra seems a better fit for this, but should I devote a table to just this purpose? I don't need old timestamps, only the latest. Should I use something other than Cassandra? If I should use Cassandra, are there any tips on table properties?

The table I have in mind:

CREATE TABLE ksp1.user_last_job_activities (
    user_id bigint,
    touched_at timeuuid,
    PRIMARY KEY (user_id, touched_at)
) WITH CLUSTERING ORDER BY (touched_at DESC)
    AND bloom_filter_fp_chance = 0.01
    AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
    AND comment = ''
    AND compaction = {'min_threshold': '4', 'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32'}
    AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99.0PERCENTILE';

Update

Thanks! I did some experiments around writetime and since I had to write a value anyway, I just wrote the time.

Like so:

CREATE TABLE simple_user_last_activity (
    user_id bigint,
    touched_at timestamp,
    PRIMARY KEY (user_id)
);

Then:

INSERT INTO simple_user_last_activity (user_id, touched_at) VALUES (6, dateof(now()));
SELECT touched_at from simple_user_last_activity WHERE user_id = 6;

Since touched_at is no longer in the primary key, only one record per user is stored.
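
As a minimal sketch of the same idea (assuming the simple_user_last_activity table above), the touch can also be written as an UPDATE, since writes in Cassandra are upserts. Reading writetime(touched_at) next to the stored value is only a cross-check; it is not required for the design.

```cql
-- Sketch only: assumes the simple_user_last_activity table defined above.
-- UPDATE is an upsert in Cassandra, so this writes the touched_at cell
-- whether or not a row for the user already exists.
UPDATE simple_user_last_activity
   SET touched_at = dateof(now())
 WHERE user_id = 6;

-- Read back the stored value together with the write timestamp
-- (microseconds since the epoch) that Cassandra recorded for the cell.
SELECT touched_at, writetime(touched_at)
  FROM simple_user_last_activity
 WHERE user_id = 6;
```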

Update 2

There's another option that I am going to go with. I can store the job_id too, which gives more data for analytics:

CREATE TABLE final_user_last_job_activities (
    user_id bigint,
    touched_at timestamp,
    job_id bigint,
    PRIMARY KEY (user_id, touched_at)
) 
WITH CLUSTERING ORDER BY (touched_at DESC)
AND default_time_to_live = 604800;

Adding the 1-week TTL takes care of expiring records; if there are none, I return the current time.

INSERT INTO final_user_last_job_activities (user_id, touched_at, job_id) VALUES (5, dateof(now()), 5);
INSERT INTO final_user_last_job_activities (user_id, touched_at, job_id) VALUES (5, dateof(now()), 6);
INSERT INTO final_user_last_job_activities (user_id, touched_at, job_id) VALUES (5, dateof(now()), 7);
INSERT INTO final_user_last_job_activities (user_id, touched_at, job_id) VALUES (5, dateof(now()), 6);

SELECT * FROM final_user_last_job_activities WHERE user_id = 5 LIMIT 1;

Which gives me:

 user_id | touched_at               | job_id
---------+--------------------------+--------
       5 | 2015-06-17 12:43:30+1200 |      6
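
A short sketch of how the expiry could be inspected, assuming the final_user_last_job_activities table above. The query restricts on user_id so that, with the descending clustering order, LIMIT 1 returns that user's most recent row. The job_id value 8 and the 1-day TTL in the second statement are made up for illustration.

```cql
-- Sketch only: assumes the final_user_last_job_activities table above.
-- ttl() reports the seconds remaining before a regular (non-key) column expires.
SELECT touched_at, job_id, ttl(job_id)
  FROM final_user_last_job_activities
 WHERE user_id = 5
 LIMIT 1;

-- A single write can override the table default, here with a 1-day TTL.
INSERT INTO final_user_last_job_activities (user_id, touched_at, job_id)
VALUES (5, dateof(now()), 8)
 USING TTL 86400;
```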

Simple benchmarks show no significant performance difference in storing or reading from the bigger table.

Solution

Because Cassandra (C*) resolves conflicting writes with last-write-wins semantics, you can simply keep the latest version of each row.
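
To make the last-write-wins rule concrete, here is a small sketch. The table test.lww_demo is hypothetical, the test keyspace is assumed to exist, and the write timestamps are set by hand with USING TIMESTAMP purely so the ordering is visible.

```cql
-- Sketch only: test.lww_demo is a hypothetical table used for illustration.
CREATE TABLE test.lww_demo (user_id bigint PRIMARY KEY, touched_at timestamp);

-- The write timestamps (in microseconds) are set explicitly to make the order obvious.
INSERT INTO test.lww_demo (user_id, touched_at)
VALUES (7, '2015-06-17 12:00:00+0000') USING TIMESTAMP 2000;

-- This write arrives later but carries an older timestamp, so it loses.
INSERT INTO test.lww_demo (user_id, touched_at)
VALUES (7, '2015-06-17 11:00:00+0000') USING TIMESTAMP 1000;

-- Returns the 12:00 value: the cell with the highest write timestamp wins.
SELECT touched_at FROM test.lww_demo WHERE user_id = 7;
```

In normal use the timestamp is assigned automatically, so the most recent touch for a user is the one that survives.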

You could, as MSD suggests, use writetime to retrieve the time of the write. Be careful, though: writetime is tracked per column, and you can't use it on primary key columns. For example, in a table like the following:

cqlsh> create TABLE test.test ( a int, b int, c int, d int, primary key (a))
   ... ;
cqlsh> insert INTO  test.test (a, b, c, d) VALUES ( 1,2,3,4)
   ... ;

cqlsh> select * from test.test
   ... ;

 a | b    | c | d
---+------+---+------
 1 |    2 | 3 |    4

(2 rows)

cqlsh> insert into test.test (a,c) values (1, 6);
cqlsh> select * from test.test ;

 a | b    | c | d
---+------+---+------
 1 |    2 | 6 |    4

(2 rows)
cqlsh> select writetime(a), writetime(b), writetime(c), writetime(d) from test.test
   ... ;
InvalidRequest: code=2200 [Invalid query] message="Cannot use selection function writeTime on PRIMARY KEY part a"

cqlsh> select  writetime(b), writetime(c), writetime(d) from test.test  ;

 writetime(b)     | writetime(c)     | writetime(d)
------------------+------------------+------------------
 1434424690700887 | 1434424690700887 | 1434424702420929

Otherwise, you can add a CQL column that holds the timestamp:

create TABLE test.test ( a int, b int, c int, d int, touched_at timeuuid, primary key (a)) ;
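
As a rough sketch of how that column might be written and read (assuming the test.test table with the touched_at timeuuid column above): dateOf is used here to match the rest of this thread, and newer Cassandra versions offer toTimestamp for the same purpose.

```cql
-- Sketch only: assumes the test.test table with the touched_at timeuuid column above.
-- now() generates a timeuuid on the coordinator at write time.
INSERT INTO test.test (a, b, c, d, touched_at) VALUES (1, 2, 3, 4, now());

-- dateOf() extracts the timestamp embedded in the timeuuid.
SELECT a, dateOf(touched_at) FROM test.test WHERE a = 1;
```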

Some quick benchmarking would help you determine which is more performant.
