Cassandra Error - Clustering column cannot be restricted (preceding column is restricted by a non-EQ relation)


Problem description

We are using Cassandra as the data historian for our fleet management solution. We have a table in Cassandra that stores the details of the journeys made by each vehicle. The table structure is as follows:

CREATE TABLE journeydetails (
  bucketid text,
  vehicleid text,
  starttime timestamp,
  stoptime timestamp,
  travelduration bigint,
  PRIMARY KEY (bucketid, vehicleid, starttime, travelduration)
);

Where:

  1. bucketid: partition key, a combination of month and year
  2. vehicleid: unique ID of the vehicle
  3. starttime: start time of the journey
  4. stoptime: end time of the journey
  5. travelduration: duration of the journey in milliseconds

We would like to run the following query: get all journeys of vehicle 1234567 between 2015-12-1 and 2015-12-3 whose travel duration is greater than 30 minutes (1,800,000 ms).

When I run this query:

select * from journeydetails where bucketid in('2015-12') and vehicleid in('1234567')
  and starttime > '2015-12-1 00:00:00' and starttime < '2015-12-3 23:59:59'
  and travelduration > 1800000;

I get this error:

InvalidRequest: code=2200 [Invalid query] message="Clustering column "travelduration"
cannot be restricted (preceding column "starttime" is restricted by a non-EQ relation)

Does anyone have a recommendation on how to fix this issue?

Solution

select * from journeydetails where bucketid in('2015-12') and vehicleid in('1234567')
  and starttime > '2015-12-1 00:00:00' and starttime < '2015-12-3 23:59:59' 
  and travelduration > 1800000;

That's not going to work. The reason goes back to how Cassandra stores data on-disk. The idea with Cassandra is that it is very efficient at returning a single row with a precise key, or at returning a continuous range of rows from the disk.

Your rows are partitioned by bucketid and then sorted on disk by vehicleid, starttime, and travelduration, in that order. Because you are already executing a range query (a non-EQ relation) on starttime, you cannot restrict the clustering column that follows it. Within the starttime range, rows are ordered by starttime first, so the rows that also satisfy the travelduration restriction are scattered throughout that range instead of forming one contiguous block. Serving them would require an inefficient, non-contiguous read. Cassandra is designed to protect you from writing queries like this one, which may have unpredictable performance.
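The toy Python model below makes the ordering argument concrete for a single partition (bucketid='2015-12'). It is only an illustration. The rows are invented, a sorted Python list stands in for Cassandra's on-disk clustering order, and the real storage engine is far more involved. The helper names `starttime_slice`, `matching_positions` and `is_contiguous` are hypothetical. The model shows that the starttime range maps to one contiguous slice, but the rows inside it that also have travelduration > 1800000 are not adjacent.

```python
from bisect import bisect_left, bisect_right
from datetime import datetime

# Invented rows for one partition (bucketid = '2015-12'), kept in clustering
# order: vehicleid, then starttime, then travelduration (milliseconds).
ROWS = sorted([
    ("1234567", datetime(2015, 12, 1, 8, 0), 600_000),
    ("1234567", datetime(2015, 12, 1, 14, 0), 2_400_000),
    ("1234567", datetime(2015, 12, 2, 9, 0), 900_000),
    ("1234567", datetime(2015, 12, 3, 7, 0), 3_600_000),
    ("7654321", datetime(2015, 12, 2, 10, 0), 5_000_000),
])


def starttime_slice(rows, vehicle, lo, hi):
    """Return (first, last) so rows[first:last] holds vehicleid == vehicle
    and lo < starttime < hi. Because rows are sorted by (vehicleid, starttime),
    this is one contiguous slice located with two binary searches."""
    keys = [(v, s) for v, s, _ in rows]
    first = bisect_right(keys, (vehicle, lo))
    last = bisect_left(keys, (vehicle, hi))
    return first, last


def matching_positions(rows, first, last, min_duration):
    """Positions inside rows[first:last] whose travelduration > min_duration."""
    return [i for i in range(first, last) if rows[i][2] > min_duration]


def is_contiguous(positions):
    """True if the positions form one unbroken run of adjacent rows."""
    return all(b == a + 1 for a, b in zip(positions, positions[1:]))


first, last = starttime_slice(
    ROWS, "1234567", datetime(2015, 12, 1), datetime(2015, 12, 3, 23, 59, 59)
)
hits = matching_positions(ROWS, first, last, 1_800_000)
print((first, last), hits, is_contiguous(hits))  # (0, 4) [1, 3] False
```

With these sample rows, the starttime range is the single slice of positions 0 to 3. The long journeys sit at positions 1 and 3, with a short journey between them. That gap is the kind of non-contiguous read the error is guarding against.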

Here are two alternatives:

1- If you could restrict all of your key columns prior to travelduration (with an equals relation), then you could apply your greater-than condition:

select * from journeydetails where bucketid='2015-12' and vehicleid='1234567'
  and starttime='2015-12-1 00:00:00' and travelduration > 1800000;

Of course, restricting on an exact starttime may not be terribly useful.
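The toy Python model below shows why alternative 1 is accepted. The rows are invented, a sorted list stands in for the on-disk clustering order, and the helper name `equality_prefix_slice` is hypothetical. Once bucketid, vehicleid and starttime are all fixed by equality, the remaining rows are ordered by travelduration alone. The rows with travelduration > 1800000 therefore form one contiguous run, which two binary searches can locate.

```python
from bisect import bisect_right
from datetime import datetime

# Invented rows for one partition, in clustering order
# (vehicleid, starttime, travelduration). The schema allows several rows with
# the same starttime as long as their travelduration differs.
ROWS = sorted([
    ("1234567", datetime(2015, 12, 1, 0, 0), 900_000),
    ("1234567", datetime(2015, 12, 1, 0, 0), 2_000_000),
    ("1234567", datetime(2015, 12, 1, 0, 0), 3_000_000),
    ("1234567", datetime(2015, 12, 2, 9, 0), 4_000_000),
])


def equality_prefix_slice(rows, vehicle, start, min_duration):
    """Return (first, last) so rows[first:last] holds vehicleid == vehicle,
    starttime == start and travelduration > min_duration.

    With every earlier clustering column fixed by equality, the matching rows
    are sorted by travelduration alone, so they form one contiguous run.
    """
    first = bisect_right(rows, (vehicle, start, min_duration))
    prefixes = [(v, s) for v, s, _ in rows]
    last = bisect_right(prefixes, (vehicle, start))
    return first, last


first, last = equality_prefix_slice(ROWS, "1234567", datetime(2015, 12, 1), 1_800_000)
print([d for _, _, d in ROWS[first:last]])  # [2000000, 3000000]
```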

2- Another approach would be to omit travelduration altogether, and then your original query would work.

select * from journeydetails where bucketid='2015-12' and vehicleid='1234567'
  and starttime > '2015-12-1 00:00:00' and starttime < '2015-12-3 23:59:59';
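Under alternative 2, the travelduration condition still has to be applied somewhere. One straightforward option is to apply it in the application after the starttime-range query returns. The minimal Python sketch below assumes the rows have already been fetched and mapped to a hypothetical `Journey` tuple. No driver calls are shown.

```python
from datetime import datetime
from typing import Iterable, NamedTuple

THIRTY_MINUTES_MS = 30 * 60 * 1000  # 1,800,000 ms, the threshold in the question


class Journey(NamedTuple):
    """Hypothetical client-side view of one journeydetails row."""
    vehicleid: str
    starttime: datetime
    stoptime: datetime
    travelduration: int  # milliseconds


def long_journeys(rows: Iterable[Journey], min_ms: int = THIRTY_MINUTES_MS) -> list[Journey]:
    """Keep only journeys whose travelduration exceeds min_ms.

    Meant for the rows returned by the starttime-range query, which Cassandra
    can serve as a single contiguous slice of the partition.
    """
    return [j for j in rows if j.travelduration > min_ms]


rows = [
    Journey("1234567", datetime(2015, 12, 1, 8, 0), datetime(2015, 12, 1, 8, 10), 600_000),
    Journey("1234567", datetime(2015, 12, 2, 9, 0), datetime(2015, 12, 2, 9, 45), 2_700_000),
]
print([j.starttime for j in long_journeys(rows)])
```

The trade-off is that Cassandra still reads and returns every journey in the date range, including the short ones. That is likely fine for one vehicle over a few days, but the cost grows with the size of the range.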

Unfortunately, Cassandra does not offer a large degree of query flexibility. Many people have found success using a solution like Spark (alongside Cassandra) to achieve this level of reporting.

As a side note, don't use IN unless you have to. Querying with IN is similar to using a secondary index, in that Cassandra has to talk to several nodes to satisfy your query. Calling it with a single item probably isn't a big deal, but IN is one of those old RDBMS habits that you should break before getting too deep into Cassandra.
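When a date range crosses a month boundary, one way to follow this advice is to issue one single-partition query per bucketid instead of a single IN. The Python sketch below only builds the statements. It assumes the 'YYYY-MM' bucketid format implied by the question's '2015-12' value, and the helper names `month_buckets` and `per_bucket_queries` are hypothetical. With the DataStax Python driver, these statements could be prepared and run concurrently, for example with `Session.execute_async`, but that part is not shown.

```python
from datetime import date, datetime


def month_buckets(start: date, end: date) -> list[str]:
    """Every 'YYYY-MM' bucketid from start's month through end's month, inclusive.

    Assumes the 'YYYY-MM' format seen in the question ('2015-12').
    datetime values work too, since datetime is a subclass of date.
    """
    buckets = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        buckets.append(f"{year:04d}-{month:02d}")
        month += 1
        if month == 13:
            year, month = year + 1, 1
    return buckets


# Equality on the partition key, so each statement reads a single partition.
# The ? markers are bind markers for a prepared statement.
QUERY = (
    "SELECT * FROM journeydetails "
    "WHERE bucketid = ? AND vehicleid = ? "
    "AND starttime > ? AND starttime < ?"
)


def per_bucket_queries(vehicle: str, start: datetime, end: datetime):
    """One (statement, parameters) pair per bucket in the range, instead of IN."""
    return [(QUERY, (b, vehicle, start, end)) for b in month_buckets(start, end)]


for stmt, params in per_bucket_queries(
    "1234567", datetime(2015, 11, 28), datetime(2015, 12, 3, 23, 59, 59)
):
    print(params[0], stmt)
```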
