How to retrieve only the information that got changed from Cassandra?

Views: 226 | Published: 2016/11/13 14:17:21 | Tags: java cassandra cql datastax-java-driver


Question

I am designing a Cassandra column family schema for the use case below, and I am not sure what the best design is. I will be using CQL with the Datastax Java driver.

Below is my use case and the sample schema that I have designed so far:

SCHEMA_ID       RECORD_NAME               SCHEMA_VALUE              TIMESTAMP
1                  ABC                     some value                 t1
2                  ABC                     some_other_value           t2
3                  DEF                     some value again           t3
4                  DEF                     some other value           t4
5                  GHI                     some new value             t5
6                  IOP                     some values again          t6

What I need from the above table is the following:

1) While the application is running, I will request everything from the above table, i.e. give me all rows.

2) Then, every 5 or 10 minutes, a background thread will check this table and ask only for what has changed (for any row in which something changed, the whole row). That is why I have a timestamp column here.

But I am not sure how to design the query pattern so that both use cases are satisfied easily, or what the proper table design would be. I am thinking of using SCHEMA_ID as the primary key.

I will be using CQL and the Datastax Java driver for this.

Update:

If I use something like this, is there any problem with the approach?

CREATE TABLE TEST (SCHEMA_ID TEXT, RECORD_NAME TEXT, SCHEMA_VALUE TEXT, LAST_MODIFIED_DATE TIMESTAMP, PRIMARY KEY (SCHEMA_ID));

INSERT INTO TEST (SCHEMA_ID, RECORD_NAME, SCHEMA_VALUE, LAST_MODIFIED_DATE) VALUES ('1', 't26',  'SOME_VALUE', 1382655211694);

In my use case, I don't want anybody to insert the same SCHEMA_ID more than once. SCHEMA_ID should be unique whenever a new row is inserted into this table. So with your example (@omnibear), would it be possible for somebody to insert the same SCHEMA_ID twice? Am I correct?

Also, regarding the type you added as an extra column: that type column can be record_name in my example.

Recommended answer

Regarding 1) Cassandra is used for write-heavy workloads with lots of data spread over multiple nodes. Retrieving ALL data from this kind of setup is daring, since it may involve huge amounts of data that a single client has to handle. A better approach is to use pagination, which is natively supported in Cassandra 2.0.

Regarding 2) The point is that partition keys only support EQ or IN queries. For LT or GT (< / >) you use column keys (clustering columns). So if it makes sense to group your entries by some ID like "type", you can use that as your partition key and a timeuuid as a column key. This allows you to query for all entries newer than X, like so:

create table test 
  (type int, SCHEMA_ID int, RECORD_NAME text, 
  SCHEMA_VALUE text, TIMESTAMP timeuuid, 
  primary key (type, timestamp));

select * from test where type IN (0,1,2,3) and timestamp > 58e0a7d7-eebc-11d8-9669-0800200c9a66;
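To make the partition key / column key distinction more concrete, here is a minimal in-memory analogy in plain Java. It is not the Datastax driver and not how Cassandra stores data internally; the class name `PartitionedRows` and its methods are hypothetical. It only mirrors the access pattern: the partition key is looked up by equality (or a list of values, like IN), while the column key is kept sorted so a "newer than X" range scan is cheap. A plain millisecond timestamp stands in for the timeuuid.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical in-memory analogy of PRIMARY KEY (type, timestamp).
final class PartitionedRows {
    // partition key -> rows sorted by column key (a millisecond timestamp here)
    private final Map<Integer, NavigableMap<Long, String>> partitions = new HashMap<>();

    // Writing to an existing (type, modifiedAt) pair overwrites the old value,
    // which is similar in spirit to how an INSERT on an existing primary key behaves.
    void insert(int type, long modifiedAt, String value) {
        partitions.computeIfAbsent(type, k -> new TreeMap<>()).put(modifiedAt, value);
    }

    // Equality / IN on the partition key, strict "greater than" on the column key.
    List<String> newerThan(Collection<Integer> types, long since) {
        List<String> result = new ArrayList<>();
        for (Integer type : types) {
            NavigableMap<Long, String> rows = partitions.get(type);
            if (rows != null) {
                result.addAll(rows.tailMap(since, false).values());
            }
        }
        return result;
    }
}
```

Note that there is no way in this layout to ask for "everything newer than X" without naming the partitions, which is the same constraint the query above works around with IN.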

Update:

You asked:

somebody can insert same SCHEMA_ID twice? Am I correct?

Yes, you can always make an insert with an existing primary key. The values at that primary key will be updated. Therefore, to preserve uniqueness, a UUID is often used in the primary key, for instance a timeuuid. It is a unique value containing a timestamp and the MAC address of the client. There is excellent documentation on this topic.
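As an illustration of the timestamp embedded in a timeuuid, the sketch below uses only `java.util.UUID` from the JDK. The class `TimeUuidDemo` and its method names are hypothetical, and the sample UUID is just an example value. `toInstant` reads the timestamp back out of a version 1 UUID; `startOf` builds the smallest version 1 UUID for a given millisecond, which is one way to obtain a lower bound for a "newer than X" comparison (CQL also offers `minTimeuuid` for that purpose).

```java
import java.time.Instant;
import java.util.UUID;

// Hypothetical helper; a sketch, not the Datastax driver's own utility class.
final class TimeUuidDemo {
    // Number of 100-ns intervals between the UUID epoch (1582-10-15) and the Unix epoch.
    private static final long UUID_EPOCH_OFFSET = 0x01B21DD213814000L;

    // Extracts the embedded timestamp; UUID.timestamp() only works for version 1 UUIDs.
    static Instant toInstant(UUID timeUuid) {
        long hundredNanosSinceUnixEpoch = timeUuid.timestamp() - UUID_EPOCH_OFFSET;
        return Instant.ofEpochMilli(hundredNanosSinceUnixEpoch / 10_000);
    }

    // Builds a version 1 UUID for the given millisecond with the lowest
    // clock-sequence/node bits, usable as a lower bound (never as a row key).
    static UUID startOf(long epochMillis) {
        long ts = epochMillis * 10_000 + UUID_EPOCH_OFFSET;
        long msb = 0L;
        msb |= (ts & 0x00000000FFFFFFFFL) << 32;   // time_low
        msb |= (ts & 0x0000FFFF00000000L) >>> 16;  // time_mid
        msb |= (ts & 0x0FFF000000000000L) >>> 48;  // time_hi
        msb |= 0x0000000000001000L;                // version 1
        long lsb = 0x8080808080808080L;            // minimal clock sequence and node
        return new UUID(msb, lsb);
    }

    public static void main(String[] args) {
        UUID id = UUID.fromString("1688f180-4141-11e3-aa6e-0800200c9a66");
        System.out.println(toInstant(id));           // 2013-10-30T08:55:55.800Z
        System.out.println(startOf(1383123355800L)); // 1688f180-4141-11e3-8080-808080808080
    }
}
```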

General advice:

Write down your queries first, then design your model. (Use case!)
Your queries define your data model, which in turn is primarily defined by your primary keys.

So, in your case, I'd just adapt my schema above, like so:

CREATE TABLE TEST (SCHEMA_ID TEXT, RECORD_NAME TEXT, SCHEMA_VALUE TEXT,   
LAST_MODIFIED_DATE TIMEUUID, PRIMARY KEY (RECORD_NAME, LAST_MODIFIED_DATE));

This allows the following query:

select * from test where RECORD_NAME IN ('componentA','componentB')
  and LAST_MODIFIED_DATE > 1688f180-4141-11e3-aa6e-0800200c9a66;

The UUID corresponds to Wednesday, October 30, 2013 8:55:55 AM GMT, so this fetches everything modified after that.
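For the background thread that asks for changes every few minutes, one possible shape is a poller that remembers the newest modification time it has seen and uses it as the bound for the next query. The sketch below is plain Java under stated assumptions: `ChangePoller`, `Change`, and the `fetchNewerThan` function are hypothetical names, and `fetchNewerThan` stands in for whatever code runs the "newer than X" query through the driver. Millisecond timestamps are used instead of timeuuids to keep it short.

```java
import java.util.List;
import java.util.function.LongFunction;

// Hypothetical poller; the actual Cassandra query is hidden behind fetchNewerThan.
final class ChangePoller {
    // One changed row, reduced to the columns discussed in the question.
    static final class Change {
        final String recordName;
        final String value;
        final long modifiedAt;

        Change(String recordName, String value, long modifiedAt) {
            this.recordName = recordName;
            this.value = value;
            this.modifiedAt = modifiedAt;
        }
    }

    private final LongFunction<List<Change>> fetchNewerThan;
    private long watermark; // newest modification time seen so far

    ChangePoller(LongFunction<List<Change>> fetchNewerThan, long startFrom) {
        this.fetchNewerThan = fetchNewerThan;
        this.watermark = startFrom;
    }

    // Returns rows modified after the watermark, then advances the watermark.
    List<Change> poll() {
        List<Change> changes = fetchNewerThan.apply(watermark);
        for (Change change : changes) {
            watermark = Math.max(watermark, change.modifiedAt);
        }
        return changes;
    }
}
```

A `ScheduledExecutorService` could call `poll()` at a fixed rate. One caveat of this design: a write that arrives late with an older timestamp than the current watermark would be missed, so some overlap between polling windows may be worth considering.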
