Time series data: selecting a range with maxTimeuuid/minTimeuuid in Cassandra


Question

I recently created a keyspace and a column family in Cassandra. The table is defined as follows:

CREATE TABLE reports (
  id timeuuid PRIMARY KEY,
  report varchar
)

I want to select reports by time range, so my query is the following:

select dateOf(id), id 
from keyspace.reports 
where token(id) > token(maxTimeuuid('2013-07-16 16:10:48+0300'));

It returns:

dateOf(id)                | id
--------------------------+--------------------------------------
 2013-07-16 16:10:37+0300 | 1b3f6d00-ee19-11e2-8734-8d331d938752
 2013-07-16 16:10:13+0300 | 0d4b20e0-ee19-11e2-bbb3-e3eef18ad51b
 2013-07-16 16:10:37+0300 | 1b275870-ee19-11e2-b3f3-af3e3057c60f
 2013-07-16 16:10:48+0300 | 21f9a390-ee19-11e2-89a2-97143e6cae9e

So the result is wrong: it includes rows from before 16:10:48.

When I try the following CQL instead:

select dateOf(id), id from keyspace.reports 
where token(id) > token(minTimeuuid('2013-07-16 16:12:48+0300'));

 dateOf(id)               | id
--------------------------+--------------------------------------
 2013-07-16 16:10:37+0300 | 1b3f6d00-ee19-11e2-8734-8d331d938752
 2013-07-16 16:10:13+0300 | 0d4b20e0-ee19-11e2-bbb3-e3eef18ad51b
 2013-07-16 16:10:37+0300 | 1b275870-ee19-11e2-b3f3-af3e3057c60f
 2013-07-16 16:10:48+0300 | 21f9a390-ee19-11e2-89a2-97143e6cae9e

select dateOf(id), id from keyspace.reports
where token(id) > token(minTimeuuid('2013-07-16 16:13:48+0300'));

 dateOf(id)               | id
--------------------------+--------------------------------------
 2013-07-16 16:10:37+0300 | 1b275870-ee19-11e2-b3f3-af3e3057c60f
 2013-07-16 16:10:48+0300 | 21f9a390-ee19-11e2-89a2-97143e6cae9e

Is it random? Why isn't it giving meaningful output?

What is the best solution for this in Cassandra?

Recommended answer

You are using the token function, which isn't useful in your context (querying between times with minTimeuuid and maxTimeuuid). It compares the tokens of the partition keys rather than the timeuuid values themselves, so the output looks random and is incorrect. From the CQL documentation:


The TOKEN function can be used with a condition operator on the partition key column to query. The query selects rows based on the token of their partition key rather than on their value. The token of a key depends on the partitioner in use. The RandomPartitioner and Murmur3Partitioner do not yield a meaningful order.
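To see why a token comparison cannot stand in for a time comparison, here is a minimal Python sketch. It sorts the four ids from the question twice: once by the timestamp embedded in each version-1 UUID, and once by a hash of the UUID bytes. The `stand_in_token` helper is hypothetical and uses MD5 purely as an illustration of a hashing partitioner; it is not Cassandra's actual token computation.

```python
import hashlib
import uuid

# Timeuuids taken from the question's result set.
IDS = [
    "1b3f6d00-ee19-11e2-8734-8d331d938752",
    "0d4b20e0-ee19-11e2-bbb3-e3eef18ad51b",
    "1b275870-ee19-11e2-b3f3-af3e3057c60f",
    "21f9a390-ee19-11e2-89a2-97143e6cae9e",
]


def embedded_time(s: str) -> int:
    """Return the 60-bit version-1 timestamp (100 ns ticks since 1582-10-15)."""
    return uuid.UUID(s).time


def stand_in_token(s: str) -> int:
    """Hypothetical token: MD5 of the 16 raw UUID bytes, read as an integer.

    Assumption for illustration only; a real partitioner computes tokens
    differently, but the effect on ordering is the same in spirit.
    """
    return int.from_bytes(hashlib.md5(uuid.UUID(s).bytes).digest(), "big")


by_time = sorted(IDS, key=embedded_time)
by_token = sorted(IDS, key=stand_in_token)

print("ordered by embedded time: ", by_time)
print("ordered by stand-in token:", by_token)
```

The hash order is generally unrelated to the time order, so a condition such as `token(id) > token(...)` picks rows by where their hashes happen to fall, not by when they were created.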

If you want to retrieve all records between two dates, it might make more sense to model your data as a wide row, with one record per column rather than one record per row, e.g., creating the table:

CREATE TABLE reports2 (
  reportname text,
  id timeuuid,
  report text,
  PRIMARY KEY (reportname, id)
)

populating it with data:

insert into reports2(reportname,id,report) VALUES ('report', 1b3f6d00-ee19-11e2-8734-8d331d938752, 'a');
insert into reports2(reportname,id,report) VALUES ('report', 0d4b20e0-ee19-11e2-bbb3-e3eef18ad51b, 'b');
insert into reports2(reportname,id,report) VALUES ('report', 1b275870-ee19-11e2-b3f3-af3e3057c60f, 'c');
insert into reports2(reportname,id,report) VALUES ('report', 21f9a390-ee19-11e2-89a2-97143e6cae9e, 'd');

and querying (no token calls!):

select dateOf(id),id from reports2 where reportname='report' and id>maxtimeuuid('2013-07-16 16:10:48+0300');

returns the expected result:

 dateOf(id)               | id
--------------------------+--------------------------------------
 2013-07-16 14:10:48+0100 | 21f9a390-ee19-11e2-89a2-97143e6cae9e
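As a client-side sanity check of that result, the following Python sketch decodes the timestamp embedded in each of the four ids and keeps those that fall after 2013-07-16 16:10:48+0300. The helper names are hypothetical, and truncating to milliseconds is an assumption meant to approximate the precision of the bound that maxTimeuuid is given, not a reimplementation of Cassandra's comparison.

```python
import uuid
from datetime import datetime, timedelta, timezone

# Version-1 UUID timestamps count 100 ns ticks from this date.
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)

IDS = [
    "1b3f6d00-ee19-11e2-8734-8d331d938752",
    "0d4b20e0-ee19-11e2-bbb3-e3eef18ad51b",
    "1b275870-ee19-11e2-b3f3-af3e3057c60f",
    "21f9a390-ee19-11e2-89a2-97143e6cae9e",
]


def timeuuid_to_datetime(s: str) -> datetime:
    """Decode the timestamp embedded in a version-1 UUID (hypothetical helper)."""
    ticks = uuid.UUID(s).time  # 100 ns units since 1582-10-15
    return GREGORIAN_EPOCH + timedelta(microseconds=ticks // 10)


def truncate_to_ms(dt: datetime) -> datetime:
    """Drop sub-millisecond precision (assumed precision of the bound)."""
    return dt.replace(microsecond=dt.microsecond // 1000 * 1000)


# The bound used in the query: 2013-07-16 16:10:48+0300.
bound = datetime(2013, 7, 16, 16, 10, 48, tzinfo=timezone(timedelta(hours=3)))

after = [s for s in IDS if truncate_to_ms(timeuuid_to_datetime(s)) > bound]

for s in IDS:
    print(timeuuid_to_datetime(s).isoformat(), s)
print("after bound:", after)
```

Only 21f9a390-ee19-11e2-89a2-97143e6cae9e passes the filter, because its embedded timestamp lies a fraction of a second after 16:10:48.000+0300. That is why it appears in the result even though dateOf displays the same second as the bound.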

The downside is that all reports sharing a reportname are stored in a single row. On the other hand, you can now store many different reports, keyed by reportname. To get all reports called mynewreport in August 2013, you could query using:

select dateOf(id),id from reports2 where reportname='mynewreport' and id>=mintimeuuid('2013-08-01+0300') and id<mintimeuuid('2013-09-01+0300');
