Cassandra - 重叠数据范围 [英] Cassandra - Overlapping Data Ranges

查看：123 发布时间：2016/11/13 14:11:15 cassandra cql cassandra-2.0 cql3

本文介绍了Cassandra - 重叠数据范围的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我在Cassandra中有以下'任务'表。

I have the following 'Tasks' table in Cassandra.

Task_ID UUID - 分区键

Starts_On TIMESTAMP-群集列

Ends_On TIMESTAMP - 群集列

我想运行CQL查询以获取指定日期范围内的重叠任务。例如，如果我将两个时间戳（T1和T2）作为参数传递给查询，我想获得适用于该范围内的所有任务（即重叠记录）。

I want to run a CQL query to get the overlapping tasks for a given date range. For example, if I pass in two timestamps (T1 and T2) as parameters to the query, I want to get the all tasks that are applicable with in that range (that is, overlapping records).

在Cassandra中最好的方法是什么？我不能在这里使用Starts_On和Ends_On两个范围，因为要向Ends_On添加一个范围查询，我必须有一个相等的检查Starts_On。

What is the best way to do this in Cassandra? I cannot just use two ranges on Starts_On and Ends_On here because to add a range query to Ends_On, I have to have a equality check for Starts_On.

推荐答案

这里有另一个想法（有点不常见）。您可以创建一个用户定义的函数来实现第二个范围过滤器（在Cassandra 2.2和更新版本中）。

Here's another idea (somewhat unconventional). You could create a user defined function to implement the second range filter (in Cassandra 2.2 and newer).

假设你像这样定义你的表保持示例简单）：

Suppose you define your table like this (shown with ints instead of timestamps to keep the example simple):

CREATE TABLE tasks (
    p int, 
    task_id timeuuid, 
    start int, 
    end int, 
    end_range int static, 
    PRIMARY KEY(p, start));

现在我们创建一个用户定义的函数来检查基于结束时间返回的行，并返回task_id匹配行，像这样：

Now we create a user defined function to check returned rows based on the end time, and return the task_id of matching rows, like this:

CREATE FUNCTION my_end_range(task_id timeuuid, end int, end_range int) 
    CALLED ON NULL INPUT RETURNS timeuuid LANGUAGE java AS 
    'if (end <= end_range) return task_id; else return null;';

现在我在使用第三个参数。在一个明显的（主要的）监督中，似乎你不能传递一个常量到用户定义的函数。所以为了解决这个问题，我们传递一个静态列（end_range）作为我们的常量。

Now I'm using a trick there with the third parameter. In an apparent (major?) oversight, it appears you can't pass a constant to a user defined function. So to work around that, we pass a static column (end_range) as our constant.

所以首先我们要设置end_range：

So first we have to set the end_range we want:

UPDATE tasks SET end_range=15 where p=1;

假设我们有这些数据：

SELECT * FROM tasks;

 p | start | end_range | end | task_id
---+-------+-----------+-----+--------------------------------------
 1 |     1 |        15 |   5 | 2c6e9340-4a88-11e5-a180-433e07a8bafb
 1 |     2 |        15 |   7 | 3233a040-4a88-11e5-a180-433e07a8bafb
 1 |     4 |        15 |  22 | f98fd9b0-4a88-11e5-a180-433e07a8bafb
 1 |     8 |        15 |  15 | 37ec7840-4a88-11e5-a180-433e07a8bafb

现在让我们得到task_id的开始> = 2， end <= 15：

Now let's get the task_id's that have start >= 2 and end <= 15:

SELECT start, end, my_end_range(task_id, end, end_range) FROM tasks 
    WHERE p=1 AND start >= 2;

 start | end | test.my_end_range(task_id, end, end_range)
-------+-----+--------------------------------------------
     2 |   7 |       3233a040-4a88-11e5-a180-433e07a8bafb
     4 |  22 |                                       null
     8 |  15 |       37ec7840-4a88-11e5-a180-433e07a8bafb

这样就给你匹配的task_id，空行（我还没有想出一个方法来删除行使用UDF的）。你会注意到，开始> = 2的过滤器在传递给UDF之前已经删除了一行。

So that gives you the matching task_id's and you have to ignore the null rows (I haven't figured out a way to drop rows using UDF's). You'll note that the filter of start >= 2 dropped one row before passing it to the UDF.

无论如何不是一个完美的方法，可以工作。：）

Anyway not a perfect method obviously, but it might be something you can work with. :)

这篇关于Cassandra - 重叠数据范围的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持IT屋！

查看全文

Cassandra - 重叠数据范围 [英] Cassandra - Overlapping Data Ranges

问题描述

推荐答案

相关文章

其他开发最新文章

热门教程

热门工具

登录关闭

Cassandra - 重叠数据范围 [英] Cassandra - Overlapping Data Ranges

问题描述

推荐答案

相关文章

其他开发最新文章

热门教程

热门工具

登录 关闭

登录关闭