Time series data: selecting a range with maxTimeuuid/minTimeuuid in Cassandra
Question
I recently created a keyspace and a column family in Cassandra. I have the following table:
CREATE TABLE reports (
    id timeuuid PRIMARY KEY,
    report varchar
);
I want to select reports within a time range, so my query is the following:
select dateOf(id), id
from keyspace.reports
where token(id) > token(maxTimeuuid('2013-07-16 16:10:48+0300'));
which returns:
dateOf(id) | id
--------------------------+--------------------------------------
2013-07-16 16:10:37+0300 | 1b3f6d00-ee19-11e2-8734-8d331d938752
2013-07-16 16:10:13+0300 | 0d4b20e0-ee19-11e2-bbb3-e3eef18ad51b
2013-07-16 16:10:37+0300 | 1b275870-ee19-11e2-b3f3-af3e3057c60f
2013-07-16 16:10:48+0300 | 21f9a390-ee19-11e2-89a2-97143e6cae9e
So this is wrong: rows from before the requested time are returned.
When I try the following CQL:
select dateOf(id), id from keyspace.reports
where token(id) > token(minTimeuuid('2013-07-16 16:12:48+0300'));
dateOf(id) | id
--------------------------+--------------------------------------
2013-07-16 16:10:37+0300 | 1b3f6d00-ee19-11e2-8734-8d331d938752
2013-07-16 16:10:13+0300 | 0d4b20e0-ee19-11e2-bbb3-e3eef18ad51b
2013-07-16 16:10:37+0300 | 1b275870-ee19-11e2-b3f3-af3e3057c60f
2013-07-16 16:10:48+0300 | 21f9a390-ee19-11e2-89a2-97143e6cae9e
select dateOf(id), id from keyspace.reports
where token(id) > token(minTimeuuid('2013-07-16 16:13:48+0300'));
dateOf(id) | id
--------------------------+--------------------------------------
2013-07-16 16:10:37+0300 | 1b275870-ee19-11e2-b3f3-af3e3057c60f
2013-07-16 16:10:48+0300 | 21f9a390-ee19-11e2-89a2-97143e6cae9e
Is it random? Why isn't it giving meaningful output?
What is the best solution for this in Cassandra?
Recommended answer
You are using the token function, which isn't really useful in your context (querying between times using minTimeuuid and maxTimeuuid) and produces random-looking, incorrect output.
From the CQL documentation:
The TOKEN function can be used with a condition operator on the partition key column to query. The query selects rows based on the token of their partition key rather than on their value. The token of a key depends on the partitioner in use. The RandomPartitioner and Murmur3Partitioner do not yield a meaningful order.
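To see why comparing tokens says nothing about time, here is a minimal Python sketch using the four ids from the question. The `toy_token` function is a hypothetical stand-in that simply hashes the key bytes with MD5; it is not Cassandra's actual token computation, only an illustration that a hash-based ordering is unrelated to the timestamp embedded in a version-1 UUID.

```python
import hashlib
import uuid

# The four ids shown in the question.
IDS = [
    "1b3f6d00-ee19-11e2-8734-8d331d938752",
    "0d4b20e0-ee19-11e2-bbb3-e3eef18ad51b",
    "1b275870-ee19-11e2-b3f3-af3e3057c60f",
    "21f9a390-ee19-11e2-89a2-97143e6cae9e",
]


def toy_token(id_str: str) -> int:
    """Hypothetical partitioner stand-in: MD5 of the 16 key bytes as an integer."""
    digest = hashlib.md5(uuid.UUID(id_str).bytes).digest()
    return int.from_bytes(digest, "big")


def embedded_time(id_str: str) -> int:
    """Timestamp stored in a version-1 UUID (100 ns ticks since 1582-10-15)."""
    return uuid.UUID(id_str).time


# Ordering by the embedded timestamp is chronological ...
by_time = sorted(IDS, key=embedded_time)
# ... while ordering by a hash of the key has no relation to time.
by_token = sorted(IDS, key=toy_token)

print("by time: ", [i[:8] for i in by_time])
print("by token:", [i[:8] for i in by_token])
```

A condition such as token(id) > token(x) therefore cuts the hash order at an arbitrary point, which is why the result sets in the question look random.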
If you are looking to retrieve all records between two dates, it might make more sense to model your data as a wide row, with one record per column rather than one record per row, e.g., creating the table:
CREATE TABLE reports2 (
    reportname text,
    id timeuuid,
    report text,
    PRIMARY KEY (reportname, id)
);
populating it with data:
insert into reports2(reportname,id,report) VALUES ('report', 1b3f6d00-ee19-11e2-8734-8d331d938752, 'a');
insert into reports2(reportname,id,report) VALUES ('report', 0d4b20e0-ee19-11e2-bbb3-e3eef18ad51b, 'b');
insert into reports2(reportname,id,report) VALUES ('report', 1b275870-ee19-11e2-b3f3-af3e3057c60f, 'c');
insert into reports2(reportname,id,report) VALUES ('report', 21f9a390-ee19-11e2-89a2-97143e6cae9e, 'd');
and querying (no token calls!):
select dateOf(id),id from reports2 where reportname='report' and id>maxtimeuuid('2013-07-16 16:10:48+0300');
which returns the expected result:
dateOf(id) | id
--------------------------+--------------------------------------
2013-07-16 14:10:48+0100 | 21f9a390-ee19-11e2-89a2-97143e6cae9e
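The following Python sketch reproduces that comparison client-side, as a way to check why only one row qualifies. It assumes that maxTimeuuid(t) corresponds to the last 100 ns tick inside the millisecond t, and it ignores the tie-breaking on the non-timestamp bytes that a real timeuuid comparison would also apply. The helper names (`max_ticks`, `to_datetime`) are made up for this illustration.

```python
import uuid
from datetime import datetime, timedelta, timezone

# 100 ns ticks between the UUID epoch (1582-10-15) and the Unix epoch.
UUID_EPOCH_OFFSET = 0x01B21DD213814000
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

IDS = [
    "1b3f6d00-ee19-11e2-8734-8d331d938752",
    "0d4b20e0-ee19-11e2-bbb3-e3eef18ad51b",
    "1b275870-ee19-11e2-b3f3-af3e3057c60f",
    "21f9a390-ee19-11e2-89a2-97143e6cae9e",
]


def max_ticks(dt: datetime) -> int:
    """Assumed maxTimeuuid bound: last 100 ns tick within dt's millisecond."""
    unix_ms = (dt - UNIX_EPOCH) // timedelta(milliseconds=1)
    return (unix_ms + 1) * 10_000 - 1 + UUID_EPOCH_OFFSET


def to_datetime(id_str: str) -> datetime:
    """Decode the timestamp embedded in a version-1 UUID (like dateOf)."""
    ticks = uuid.UUID(id_str).time - UUID_EPOCH_OFFSET
    return UNIX_EPOCH + timedelta(microseconds=ticks // 10)


# The bound used in the query: 2013-07-16 16:10:48+0300.
bound = datetime(2013, 7, 16, 16, 10, 48, tzinfo=timezone(timedelta(hours=3)))

# Keep only ids whose embedded time is strictly after the bound.
after = [i for i in IDS if uuid.UUID(i).time > max_ticks(bound)]

for i in IDS:
    print(to_datetime(i).isoformat(), i)
print("after bound:", after)
```

The id starting with 21f9a390 carries a timestamp a fraction of a second past 16:10:48, so it is the only one strictly greater than the bound.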
The downside is that all of your reports with the same name end up in one row. On the other hand, you can now store lots of different reports (keyed by reportname here). To get all reports called mynewreport in August 2013 you could query using:
select dateOf(id),id from reports2 where reportname='mynewreport' and id>=mintimeuuid('2013-08-01+0300') and id<mintimeuuid('2013-09-01+0300');
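The query above uses a half-open interval: >= the lower bound and < the upper bound, both built with mintimeuuid. A small Python sketch of the same idea follows, under the assumption that minTimeuuid(t) corresponds to the first 100 ns tick of millisecond t. The helpers `min_ticks`, `fake_timeuuid` and `in_range` are hypothetical names used only for this illustration.

```python
import uuid
from datetime import datetime, timedelta, timezone

# 100 ns ticks between the UUID epoch (1582-10-15) and the Unix epoch.
UUID_EPOCH_OFFSET = 0x01B21DD213814000
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def min_ticks(dt: datetime) -> int:
    """Assumed minTimeuuid bound: first 100 ns tick of dt's millisecond."""
    unix_ms = (dt - UNIX_EPOCH) // timedelta(milliseconds=1)
    return unix_ms * 10_000 + UUID_EPOCH_OFFSET


def fake_timeuuid(dt: datetime) -> uuid.UUID:
    """Hypothetical helper: a version-1 UUID carrying dt's timestamp."""
    ticks = min_ticks(dt)
    return uuid.UUID(fields=(
        ticks & 0xFFFFFFFF,                  # time_low
        (ticks >> 32) & 0xFFFF,              # time_mid
        ((ticks >> 48) & 0x0FFF) | 0x1000,   # time_hi with version 1
        0x80, 0x00, 0,                       # variant bits, clock seq, node
    ))


def in_range(u: uuid.UUID, start: datetime, end: datetime) -> bool:
    """Half-open check: start <= time(u) < end."""
    return min_ticks(start) <= u.time < min_ticks(end)


tz = timezone(timedelta(hours=3))
start = datetime(2013, 8, 1, tzinfo=tz)
end = datetime(2013, 9, 1, tzinfo=tz)

first_instant = fake_timeuuid(start)
last_ms = fake_timeuuid(end - timedelta(milliseconds=1))
next_month = fake_timeuuid(end)
july_id = uuid.UUID("21f9a390-ee19-11e2-89a2-97143e6cae9e")

for label, u in [("first instant", first_instant), ("last ms", last_ms),
                 ("next month", next_month), ("july id", july_id)]:
    print(label, in_range(u, start, end))
```

Using < with the first instant of the following month avoids having to name the last instant of August explicitly.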