How to retrieve only the information that has changed in Cassandra?

Question

I am designing a Cassandra column family schema for the use case below, and I am not sure what the best design is. I will be using CQL with the Datastax Java driver.

Below is my use case and the sample schema I have designed so far:

SCHEMA_ID       RECORD_NAME               SCHEMA_VALUE              TIMESTAMP
1                  ABC                     some value                 t1
2                  ABC                     some_other_value           t2
3                  DEF                     some value again           t3
4                  DEF                     some other value           t4
5                  GHI                     some new value             t5
6                  IOP                     some values again          t6

What I need from this table is the following:

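  1. The first time my application runs, I ask for everything in the table above.
  2. After that, every 5 or 10 minutes, a background thread checks the table and asks only for what has changed (the whole row, if anything in that row changed). That is why I have a timestamp as one of the columns.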

I am not sure how to design the query pattern so that both use cases are easily satisfied, or what the proper table design is. I am thinking of using SCHEMA_ID as the primary key.

Update:

Is there any problem with an approach like this?

CREATE TABLE TEST (SCHEMA_ID TEXT, RECORD_NAME TEXT, SCHEMA_VALUE TEXT, LAST_MODIFIED_DATE TIMESTAMP, PRIMARY KEY (SCHEMA_ID));

INSERT INTO TEST (SCHEMA_ID, RECORD_NAME, SCHEMA_VALUE, LAST_MODIFIED_DATE) VALUES ('1', 't26',  'SOME_VALUE', 1382655211694);

In my use case, I don't want anybody to insert the same SCHEMA_ID more than once; SCHEMA_ID should be unique for every new row inserted into this table. With your example (@omnibear), could somebody insert the same SCHEMA_ID twice? Am I correct?

Also, the type column you added as an extra column could be record_name in my example.

Answer

Regarding 1): Cassandra is built for heavy write loads and large amounts of data spread over multiple nodes. Retrieving ALL data from such a setup is risky, since a single client may have to handle a huge result set. A better approach is to use pagination, which is natively supported in Cassandra 2.0.

Regarding 2): Partition keys only support EQ or IN queries. For LT or GT (< / >) comparisons you use column keys. So if it makes sense to group your entries by some ID such as "type", you can use that as the partition key and a timeuuid as the column key. This lets you query for all entries newer than X, like so:

create table test 
  (type int, SCHEMA_ID int, RECORD_NAME text, 
  SCHEMA_VALUE text, TIMESTAMP timeuuid, 
  primary key (type, timestamp));

select * from test where type IN (0,1,2,3) and timestamp > 58e0a7d7-eebc-11d8-9669-0800200c9a66;
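
On the client side, a "newer than X" query is usually paired with a high-water mark that the background thread advances after each poll. The following is a minimal sketch of that idea, not a definitive implementation: ChangePoller, Row, ChangeSource and fetchModifiedAfter are hypothetical names introduced here for illustration, and ChangeSource stands in for whatever code actually runs the CQL query through the driver.

```java
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

public class ChangePoller {

    /** One table row; the fields mirror the columns used in the question. */
    public static final class Row {
        public final String recordName;
        public final String schemaValue;
        public final long modifiedMillis;

        public Row(String recordName, String schemaValue, long modifiedMillis) {
            this.recordName = recordName;
            this.schemaValue = schemaValue;
            this.modifiedMillis = modifiedMillis;
        }
    }

    /** Hypothetical stand-in for the CQL query returning rows modified after a given time. */
    public interface ChangeSource {
        List<Row> fetchModifiedAfter(long sinceMillis);
    }

    private final ChangeSource source;
    // Long.MIN_VALUE makes the very first poll return every row (use case 1).
    private long highWaterMark = Long.MIN_VALUE;

    public ChangePoller(ChangeSource source) {
        this.source = source;
    }

    /** Returns the rows changed since the previous poll and advances the high-water mark. */
    public synchronized List<Row> poll() {
        List<Row> changed = source.fetchModifiedAfter(highWaterMark);
        for (Row row : changed) {
            highWaterMark = Math.max(highWaterMark, row.modifiedMillis);
        }
        return changed;
    }

    /** Runs poll() immediately and then at a fixed period on a single background thread. */
    public ScheduledExecutorService start(long periodMinutes, Consumer<List<Row>> onChange) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> onChange.accept(poll()), 0, periodMinutes, TimeUnit.MINUTES);
        return scheduler;
    }
}
```

The first call to poll() returns the whole table and later calls return only newer rows, which covers both use cases from the question. Note that a purely time-based mark can miss a write that lands with a timestamp at or before the current mark, so a small overlap window may be worth considering.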

Update:

You asked:

somebody can insert same SCHEMA_ID twice? Am I correct?

Yes, you can always insert with an existing primary key; the values stored at that primary key are then overwritten. To preserve uniqueness, a UUID such as a timeuuid is therefore often used in the primary key. A timeuuid is a unique value that contains a timestamp and, typically, the MAC address of the client that generated it. There is excellent documentation on this topic.
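
The overwrite behaviour can be pictured with an in-memory analogy. The sketch below is only a model of the semantics and does not talk to Cassandra; UpsertDemo and its methods are hypothetical names. The second method mirrors the conditional form INSERT ... IF NOT EXISTS that CQL offers from Cassandra 2.0 on, which may suit the uniqueness requirement if its extra cost is acceptable.

```java
import java.util.HashMap;
import java.util.Map;

public class UpsertDemo {
    // Maps the primary key (SCHEMA_ID) to the stored value; a toy model, not a Cassandra client.
    private final Map<String, String> rowsByKey = new HashMap<>();

    /** Mirrors a plain CQL INSERT: an existing key is silently overwritten. */
    public void insert(String schemaId, String schemaValue) {
        rowsByKey.put(schemaId, schemaValue);
    }

    /** Mirrors INSERT ... IF NOT EXISTS: returns false and keeps the old row if the key exists. */
    public boolean insertIfNotExists(String schemaId, String schemaValue) {
        return rowsByKey.putIfAbsent(schemaId, schemaValue) == null;
    }

    public String get(String schemaId) {
        return rowsByKey.get(schemaId);
    }

    public static void main(String[] args) {
        UpsertDemo table = new UpsertDemo();
        table.insert("1", "some value");
        table.insert("1", "some_other_value");            // overwrites the first row
        System.out.println(table.get("1"));               // prints "some_other_value"
        System.out.println(table.insertIfNotExists("1", "x")); // prints "false"
    }
}
```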

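General advice:

  1. Write down your queries first, then design your model. (Use cases!)
  2. Your queries define your data model, which in turn is mostly defined by your primary key.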

So, in your case, I'd just adapt my schema above, like so:

CREATE TABLE TEST (SCHEMA_ID TEXT, RECORD_NAME TEXT, SCHEMA_VALUE TEXT,   
LAST_MODIFIED_DATE TIMEUUID, PRIMARY KEY (RECORD_NAME, LAST_MODIFIED_DATE));

This allows the following query:

select * from test where RECORD_NAME IN ('componentA','componentB')
  and LAST_MODIFIED_DATE > 1688f180-4141-11e3-aa6e-0800200c9a66;

The UUID corresponds to Wednesday, October 30, 2013 8:55:55 AM GMT, so this query fetches everything modified after that time.
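
To check which instant a timeuuid bound encodes, the timestamp can be read out of the UUID with the Java standard library alone. This is a small sketch; TimeuuidDecoder is a hypothetical helper name. It assumes a version 1 (time-based) UUID, whose timestamp counts 100-nanosecond intervals since 1582-10-15.

```java
import java.time.Instant;
import java.util.UUID;

public class TimeuuidDecoder {
    // 100-ns intervals between the UUID epoch (1582-10-15) and the Unix epoch (1970-01-01).
    private static final long UUID_EPOCH_OFFSET = 0x01B21DD213814000L;

    /** Converts a version 1 (time-based) UUID to milliseconds since the Unix epoch. */
    public static long toEpochMillis(UUID timeuuid) {
        if (timeuuid.version() != 1) {
            throw new IllegalArgumentException("not a time-based UUID: " + timeuuid);
        }
        // timestamp() is in 100-ns units; 10,000 of them make one millisecond.
        return (timeuuid.timestamp() - UUID_EPOCH_OFFSET) / 10_000L;
    }

    public static void main(String[] args) {
        UUID bound = UUID.fromString("1688f180-4141-11e3-aa6e-0800200c9a66");
        // prints 2013-10-30T08:55:55.800Z
        System.out.println(Instant.ofEpochMilli(toEpochMillis(bound)));
    }
}
```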
