Cassandra Error - Clustering column cannot be restricted (preceding column is restricted by a non-EQ relation)


Problem description

We are using Cassandra as the data historian for our fleet management solution. We have a table in Cassandra that stores the details of each journey made by a vehicle. The table structure is given below:

CREATE TABLE journeydetails (
  bucketid text,
  vehicleid text,
  starttime timestamp,
  stoptime timestamp,
  travelduration bigint,
  PRIMARY KEY (bucketid, vehicleid, starttime, travelduration)
);

Where:

  1. bucketid: partition key, a combination of month and year
  2. vehicleid: unique ID of the vehicle
  3. starttime: start time of the journey
  4. stoptime: end time of the journey
  5. travelduration: duration of travel in milliseconds

We would like to run the following query: get all journeys of vehicle 1234567 between 2015-12-1 and 2015-12-3 whose travel duration is greater than 30 minutes.

When I run this query:

select * from journeydetails where bucketid in('2015-12') and vehicleid in('1234567')
  and starttime > '2015-12-1 00:00:00' and starttime < '2015-12-3 23:59:59'
  and travelduration > 1800000;

I get this result:

InvalidRequest: code=2200 [Invalid query] message="Clustering column "travelduration" cannot be restricted (preceding column "starttime" is restricted by a non-EQ relation)"

Does anyone have a recommendation on how to fix this issue?

Solution

select * from journeydetails where bucketid in('2015-12') and vehicleid in('1234567')
  and starttime > '2015-12-1 00:00:00' and starttime < '2015-12-3 23:59:59' 
  and travelduration > 1800000;

That's not going to work. The reason goes back to how Cassandra stores data on disk. Cassandra is very efficient at returning a single row with a precise key, or at returning a contiguous range of rows from disk.

Your rows are partitioned by bucketid, and then sorted on disk by vehicleid, starttime, and travelduration. Because you are already executing a range query (non-EQ relation) on starttime, you cannot restrict the key that follows. This is because the travelduration restriction may disqualify some of the rows in your range condition, which would result in an inefficient, non-contiguous read. Cassandra is designed to protect you from writing queries such as this one, which may have unpredictable performance.
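
To make the ordering argument concrete, here is a small pure-Python sketch. It does not talk to Cassandra; it only imitates the clustering order with a sorted list of tuples. The sample rows and the helper is_contiguous are made up for illustration.

```python
# Hypothetical rows from the '2015-12' partition, as
# (vehicleid, starttime, travelduration) tuples. Sorting the tuples
# imitates the clustering order vehicleid, starttime, travelduration.
rows = sorted([
    ("1234567", "2015-12-01 08:00:00", 900000),
    ("1234567", "2015-12-01 17:30:00", 2400000),
    ("1234567", "2015-12-02 09:15:00", 600000),
    ("1234567", "2015-12-03 07:45:00", 3600000),
    ("1234567", "2015-12-05 10:00:00", 5400000),
    ("7654321", "2015-12-02 12:00:00", 2000000),
])


def is_contiguous(indexes):
    """True if the indexes form one unbroken run (or the list is empty)."""
    return not indexes or indexes == list(range(indexes[0], indexes[-1] + 1))


start, end = "2015-12-01 00:00:00", "2015-12-03 23:59:59"

# Equality on vehicleid plus a range on starttime selects one unbroken slice.
in_range = [i for i, (vehicle, started, _) in enumerate(rows)
            if vehicle == "1234567" and start < started < end]

# Adding travelduration > 1800000 leaves gaps inside that slice.
long_trips = [i for i in in_range if rows[i][2] > 1800000]

print(in_range, is_contiguous(in_range))      # [0, 1, 2, 3] True
print(long_trips, is_contiguous(long_trips))  # [1, 3] False
```

The first selection is a single slice of the sorted rows, which is the kind of read Cassandra serves well. The second skips rows inside that slice, which is the access pattern the error message is refusing.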

Here are two alternatives:

1- If you could restrict all of your key columns prior to travelduration (with an equals relation), then you could apply your greater-than condition:

select * from journeydetails where bucketid='2015-12' and vehicleid='1234567'
  and starttime='2015-12-1 00:00:00' and travelduration > 1800000;

Of course, restricting on an exact starttime may not be terribly useful.

2- Another approach would be to omit travelduration altogether, and then your original query would work.

select * from journeydetails where bucketid='2015-12' and vehicleid='1234567'
  and starttime > '2015-12-1 00:00:00' and starttime < '2015-12-3 23:59:59';
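
If you go with this second approach, the travelduration condition has to be applied by the application after the rows come back. A minimal sketch of that client-side filter follows, assuming each row is available as a dict keyed by column name. The sample rows and the function name long_journeys are hypothetical and stand in for whatever your driver returns.

```python
THIRTY_MINUTES_MS = 30 * 60 * 1000  # 1800000 ms, the threshold from the question


def long_journeys(rows, min_duration_ms=THIRTY_MINUTES_MS):
    """Keep only the rows whose travelduration exceeds the threshold."""
    return [row for row in rows if row["travelduration"] > min_duration_ms]


# Hypothetical rows standing in for the result of the starttime range query.
sample_rows = [
    {"starttime": "2015-12-01 08:00:00", "travelduration": 900000},
    {"starttime": "2015-12-01 17:30:00", "travelduration": 2400000},
    {"starttime": "2015-12-02 09:15:00", "travelduration": 1800000},
    {"starttime": "2015-12-03 07:45:00", "travelduration": 3600000},
]

for row in long_journeys(sample_rows):
    print(row["starttime"], row["travelduration"])
```

The trade-off is that every journey in the time range is sent to the client, including the short ones that are then discarded.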

Unfortunately, Cassandra does not offer a large degree of query flexibility. Many people have found success using a solution like Spark (alongside Cassandra) to achieve this level of reporting.

And just a side note, but don't use IN unless you have to. Querying with IN is similar to using a secondary index, in that Cassandra has to talk to several nodes to satisfy your query. Calling it with a single item probably isn't too big of a deal. But IN is one of those old RDBMS habits that you should really break before getting too deep into Cassandra.
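
One common way to avoid IN on the partition key is to issue one equality query per bucket and combine the results in the application. The sketch below only builds the statements and their parameters. It assumes that bucketid has the form 'YYYY-MM' (as the '2015-12' literal suggests) and that your driver accepts %s positional placeholders. Both are assumptions, and the helper names are hypothetical.

```python
from datetime import date

# Assumed statement shape; the %s placeholder style depends on the driver.
QUERY = ("select * from journeydetails where bucketid = %s and vehicleid = %s "
         "and starttime > %s and starttime < %s")


def month_buckets(first, last):
    """List the 'YYYY-MM' buckets covering the months from first to last."""
    buckets = []
    year, month = first.year, first.month
    while (year, month) <= (last.year, last.month):
        buckets.append(f"{year:04d}-{month:02d}")
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return buckets


def per_bucket_queries(vehicleid, first, last):
    """Build one (statement, parameters) pair per bucket instead of using IN."""
    return [(QUERY, (bucket, vehicleid, first, last))
            for bucket in month_buckets(first, last)]


for statement, params in per_bucket_queries(
        "1234567", date(2015, 11, 28), date(2016, 1, 2)):
    print(params[0], params[1])
```

Each statement then targets exactly one partition, so it can be routed to the replicas that own that partition.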
