Time series data: selecting a range with maxTimeuuid/minTimeuuid in Cassandra

Question

I recently created a keyspace and a column family in Cassandra. I have the following table:

CREATE TABLE reports (
  id timeuuid PRIMARY KEY,
  report varchar
);

I want to select reports by time range, so my query is the following:

select dateOf(id), id 
from keyspace.reports 
where token(id) > token(maxTimeuuid('2013-07-16 16:10:48+0300'));

It returns:

 dateOf(id)               | id
--------------------------+--------------------------------------
 2013-07-16 16:10:37+0300 | 1b3f6d00-ee19-11e2-8734-8d331d938752
 2013-07-16 16:10:13+0300 | 0d4b20e0-ee19-11e2-bbb3-e3eef18ad51b
 2013-07-16 16:10:37+0300 | 1b275870-ee19-11e2-b3f3-af3e3057c60f
 2013-07-16 16:10:48+0300 | 21f9a390-ee19-11e2-89a2-97143e6cae9e

So this is wrong: it returns rows from before the requested time.

When I try the following CQL, I get:

select dateOf(id), id from keyspace.reports 
where token(id) > token(minTimeuuid('2013-07-16 16:12:48+0300'));

 dateOf(id)               | id
--------------------------+--------------------------------------
 2013-07-16 16:10:37+0300 | 1b3f6d00-ee19-11e2-8734-8d331d938752
 2013-07-16 16:10:13+0300 | 0d4b20e0-ee19-11e2-bbb3-e3eef18ad51b
 2013-07-16 16:10:37+0300 | 1b275870-ee19-11e2-b3f3-af3e3057c60f
 2013-07-16 16:10:48+0300 | 21f9a390-ee19-11e2-89a2-97143e6cae9e

select dateOf(id), id from keyspace.reports
where token(id) > token(minTimeuuid('2013-07-16 16:13:48+0300'));

 dateOf(id)               | id
--------------------------+--------------------------------------
 2013-07-16 16:10:37+0300 | 1b275870-ee19-11e2-b3f3-af3e3057c60f
 2013-07-16 16:10:48+0300 | 21f9a390-ee19-11e2-89a2-97143e6cae9e

Is it random? Why isn't it giving meaningful output?

What's the best solution for this in Cassandra?

Recommended answer

You are using the token function, which isn't useful in your context (querying between times using minTimeuuid and maxTimeuuid). It is what produces the random-looking, incorrect output.

From the CQL documentation:

The TOKEN function can be used with a condition operator on the partition key column to query. The query selects rows based on the token of their partition key rather than on their value. The token of a key depends on the partitioner in use. The RandomPartitioner and Murmur3Partitioner do not yield a meaningful order.
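
The effect can be illustrated outside Cassandra. The following is a minimal Python sketch, not CQL: it takes the four ids from the question and sorts them once by their embedded timestamp and once by a hash of the key. The `stand_in_token` helper is hypothetical and uses a plain MD5 digest only as a stand-in for a partitioner token; it is not claimed to reproduce the exact tokens Cassandra computes.

```python
import hashlib
import uuid

# The four timeuuids that appear in the question's output.
IDS = [
    "1b3f6d00-ee19-11e2-8734-8d331d938752",
    "0d4b20e0-ee19-11e2-bbb3-e3eef18ad51b",
    "1b275870-ee19-11e2-b3f3-af3e3057c60f",
    "21f9a390-ee19-11e2-89a2-97143e6cae9e",
]


def stand_in_token(key: uuid.UUID) -> int:
    """Hypothetical stand-in for a partitioner token: a hash of the key bytes."""
    return int.from_bytes(hashlib.md5(key.bytes).digest(), "big")


# Ordering by the version-1 timestamp embedded in each id (chronological).
by_time = sorted(IDS, key=lambda s: uuid.UUID(s).time)

# Ordering by the hash: it depends only on the hash function, not on time.
by_token = sorted(IDS, key=lambda s: stand_in_token(uuid.UUID(s)))

print("by time :", by_time)
print("by token:", by_token)
```

A comparison such as token(id) > token(x) selects on the second ordering, which is why the rows it returns have no relationship to the time in x.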

If you want to retrieve all records between two dates, it may make more sense to model your data as a wide row, with one record per column rather than one record per row, e.g. by creating the table:

CREATE TABLE reports2 (
  reportname text,
  id timeuuid,
  report text,
  PRIMARY KEY (reportname, id)
);

populating it with data:

insert into reports2(reportname,id,report) VALUES ('report', 1b3f6d00-ee19-11e2-8734-8d331d938752, 'a');
insert into reports2(reportname,id,report) VALUES ('report', 0d4b20e0-ee19-11e2-bbb3-e3eef18ad51b, 'b');
insert into reports2(reportname,id,report) VALUES ('report', 1b275870-ee19-11e2-b3f3-af3e3057c60f, 'c');
insert into reports2(reportname,id,report) VALUES ('report', 21f9a390-ee19-11e2-89a2-97143e6cae9e, 'd');

and querying (no token calls!):

select dateOf(id),id from reports2 where reportname='report' and id>maxtimeuuid('2013-07-16 16:10:48+0300');

returns the expected result:

 dateOf(id)               | id
--------------------------+--------------------------------------
 2013-07-16 14:10:48+0100 | 21f9a390-ee19-11e2-89a2-97143e6cae9e
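
To see why only this row qualifies, the comparison can be emulated outside Cassandra. The sketch below is Python, not CQL, and assumes the ordering is decided by the timestamp embedded in each timeuuid; it ignores the clock-sequence and node bytes that break ties between ids with identical timestamps. `max_ticks` is a hypothetical helper approximating what maxTimeuuid represents: the largest 100-nanosecond timestamp that still falls within the given millisecond.

```python
import uuid
from datetime import datetime, timedelta, timezone

# Number of 100-ns intervals between the UUID epoch (1582-10-15) and 1970-01-01.
UUID_EPOCH_OFFSET = 0x01B21DD213814000
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# The four ids inserted into reports2 above.
IDS = [
    "1b3f6d00-ee19-11e2-8734-8d331d938752",
    "0d4b20e0-ee19-11e2-bbb3-e3eef18ad51b",
    "1b275870-ee19-11e2-b3f3-af3e3057c60f",
    "21f9a390-ee19-11e2-89a2-97143e6cae9e",
]


def max_ticks(moment: datetime) -> int:
    """Hypothetical helper: largest UUID timestamp within moment's millisecond."""
    unix_ms = (moment - UNIX_EPOCH) // timedelta(milliseconds=1)
    return UUID_EPOCH_OFFSET + unix_ms * 10_000 + 9_999


# Same cutoff as the query: 2013-07-16 16:10:48+0300.
cutoff = datetime(2013, 7, 16, 16, 10, 48, tzinfo=timezone(timedelta(hours=3)))

# Emulates: id > maxtimeuuid('2013-07-16 16:10:48+0300')
after_cutoff = [s for s in IDS if uuid.UUID(s).time > max_ticks(cutoff)]
print(after_cutoff)
```

The id 21f9a390-... was generated a fraction of a second after 16:10:48.000, so it sorts after the upper bound for that millisecond, while the other three ids were generated earlier and are excluded.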

The downside is that all of your reports are in one row. On the other hand, you can now store many different reports (keyed by reportname here). To get all reports called mynewreport in August 2013 you could query using:

select dateOf(id),id from reports2 where reportname='mynewreport' and id>=mintimeuuid('2013-08-01+0300') and id<mintimeuuid('2013-09-01+0300');
