Cassandra: get all records in a time range

Question

I have to work with a column family that has (user_id, timestamp) as its key. In my query I would like to fetch all records in a given time range, independent of user_id. This is the exact table schema:

CREATE TABLE userlog (
  user_id text,
  ts timestamp,
  action text,
  app_type text,
  channel_name text,
  channel_session_id text,
  pid text,
  region_id text,
  PRIMARY KEY (user_id, ts)
);

I tried to run

SELECT * FROM userlog  WHERE ts >= '2013-01-01 00:00:00+0200' AND  ts <= '2013-08-13 23:59:00+0200' ALLOW FILTERING;

which works fine on my local Cassandra installation containing a small data set, but fails with

Request did not complete within rpc_timeout.

on the production system containing all the data.

Is there a query, preferably in CQL, that runs smoothly with the given column family, or do we have to change the design?

Recommended answer

The timeout occurs because Cassandra takes longer than the RPC timeout (10 seconds by default) to return the data. For your query, Cassandra attempts to fetch the entire result set before returning anything. For more than a few records, this can easily take longer than the timeout.

For queries that produce a lot of data you need to page, e.g.

SELECT * FROM userlog WHERE ts >= '2013-01-01 00:00:00+0200' AND  ts <= '2013-08-13 23:59:00+0200' AND token(user_id) > previous_token LIMIT 100 ALLOW FILTERING;

where previous_token is the token of the last user_id returned by the previous page (it can also be written as token('<last user_id>')). You will also need to page on ts to guarantee you get all the records for the last user_id returned.
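
The two levels of paging described above (by token across users, then by ts within the last user) can be sketched as a loop. This is only an illustration under assumptions: `session` is assumed to behave like a DataStax Python driver session, whose `execute(query, params)` fills `%s` placeholders and returns rows with attribute access; the function and constant names are hypothetical, and the token-restricted query is the one from the answer.

```python
# Sketch only: `session` is assumed to behave like a DataStax Python driver
# Session, i.e. execute(query, params) fills %s placeholders and returns rows
# that expose columns as attributes (row.user_id, row.ts).

# First page: no token restriction yet.
FIRST_PAGE = (
    "SELECT * FROM userlog WHERE ts >= %s AND ts <= %s "
    "LIMIT %s ALLOW FILTERING"
)
# Following pages: only partitions that come after the last user_id seen.
NEXT_PAGE = (
    "SELECT * FROM userlog WHERE token(user_id) > token(%s) "
    "AND ts >= %s AND ts <= %s LIMIT %s ALLOW FILTERING"
)
# Remainder of the partition that LIMIT may have cut off.
REST_OF_USER = (
    "SELECT * FROM userlog WHERE user_id = %s "
    "AND ts > %s AND ts <= %s LIMIT %s"
)


def fetch_time_range(session, start, end, page_size=100):
    """Yield every userlog row with start <= ts <= end, one page at a time."""
    last_user = None
    while True:
        if last_user is None:
            page = list(session.execute(FIRST_PAGE, (start, end, page_size)))
        else:
            page = list(
                session.execute(NEXT_PAGE, (last_user, start, end, page_size))
            )
        if not page:
            return
        yield from page
        last_user, last_ts = page[-1].user_id, page[-1].ts

        # Page on ts within the last partition until it is exhausted.
        while True:
            rest = list(
                session.execute(REST_OF_USER, (last_user, last_ts, end, page_size))
            )
            yield from rest
            if len(rest) < page_size:
                break
            last_ts = rest[-1].ts
```

With a real cluster, `session` would come from the driver's `Cluster(...).connect(...)`, and `start` and `end` would be `datetime` values. Every page is a short query, so each one has a chance to finish within the timeout, although the full scan behind ALLOW FILTERING still happens across the pages.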

Alternatively, in Cassandra 2.0.0 (just released), paging is done transparently, so your original query should work with no timeout or manual paging.
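
From client code, transparent paging could look roughly like the sketch below. It assumes a DataStax Python driver session connected to Cassandra 2.0 or later, and relies on the driver's `default_fetch_size` session attribute to set the page size; the function name and the value 100 are illustrative choices, not part of the original answer.

```python
# Sketch only: `session` is assumed to be a DataStax Python driver Session
# connected to Cassandra 2.0+ (automatic paging needs native protocol v2).

QUERY = "SELECT * FROM userlog WHERE ts >= %s AND ts <= %s ALLOW FILTERING"


def stream_time_range(session, start, end, fetch_size=100):
    """Yield rows in the time range, letting the driver fetch pages lazily."""
    # Rows per page; the driver requests the next page as iteration proceeds.
    session.default_fetch_size = fetch_size
    for row in session.execute(QUERY, (start, end)):
        yield row
```

The caller simply iterates over the result and never sees page boundaries, which is what makes the original query usable again without manual token bookkeeping.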

ALLOW FILTERING means Cassandra reads through all your data but returns only the rows within the specified range. This is efficient only if the range covers most of the data. If you wanted to find records within, e.g., a 5-minute time window, it would be very inefficient.
