How to retrieve only the information that got changed from Cassandra?
Question
I am designing a Cassandra column family schema for the use case below, and I am not sure what the best design is. I will be using CQL with the Datastax Java driver.
Below is my use case and the sample schema that I have designed so far:
SCHEMA_ID   RECORD_NAME   SCHEMA_VALUE        TIMESTAMP
1           ABC           some value          t1
2           ABC           some_other_value    t2
3           DEF           some value again    t3
4           DEF           some other value    t4
5           GHI           some new value      t5
6           IOP           some values again   t6
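What I need from the above table is the following:
- When the application starts, I will request everything from the above table, meaning "give me every row".
- Then, every 5 or 10 minutes, a background thread will check this table and request only what has changed (the full row for every row in which anything changed). This is why I am using a timestamp as one of the columns.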
But I am not sure how to design the query pattern so that both of my use cases are satisfied easily, or what the proper table design for this would be. I am thinking of using SCHEMA_ID as the primary key.
I will be using CQL and the Datastax Java driver for this.
Update:
If I use something like this, is there any problem with the approach?
CREATE TABLE TEST (SCHEMA_ID TEXT, RECORD_NAME TEXT, SCHEMA_VALUE TEXT, LAST_MODIFIED_DATE TIMESTAMP, PRIMARY KEY (SCHEMA_ID));
INSERT INTO TEST (SCHEMA_ID, RECORD_NAME, SCHEMA_VALUE, LAST_MODIFIED_DATE) VALUES ('1', 't26', 'SOME_VALUE', 1382655211694);
I ask because, in this use case, I don't want anybody to insert the same SCHEMA_ID more than once. SCHEMA_ID should be unique whenever we insert a new row into this table. So with your example (@omnibear), could somebody insert the same SCHEMA_ID twice? Am I correct?
Also, regarding the type column you added as an extra column: in my example, that role can be played by record_name.
Recommended answer
Regarding 1) Cassandra is used for write-heavy workloads with lots of data spread over multiple nodes. Retrieving ALL data from this kind of setup is daring, since it might involve huge amounts that have to be handled by a single client. A better approach is to use pagination, which is natively supported in 2.0.
Regarding 2) The point is that partition keys only support EQ or IN queries. For LT or GT (< / >) you use column keys (clustering columns). So if it makes sense to group your entries by some ID like "type", you can use that as your partition key and a timeuuid as a column key. This allows you to query for all entries newer than X, like so:
create table test
(type int, SCHEMA_ID int, RECORD_NAME text,
SCHEMA_VALUE text, TIMESTAMP timeuuid,
primary key (type, timestamp));
-- "newer than X" means the clustering column must be greater than X
select * from test where type IN (0,1,2,3) and timestamp > 58e0a7d7-eebc-11d8-9669-0800200c9a66;
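For the polling use case, the background thread needs a timeuuid value X that stands for "the time of my last check". The following is a minimal Java sketch, using only the standard library, of how a version 1 UUID can be assembled from a Unix timestamp. The class name TimeUuidBound and the helper atMillis are made up for illustration and are not part of any driver. The clock sequence and node fields are simply zeroed here, so whether the result sorts below every real timeuuid carrying the same timestamp depends on how the server compares those remaining bytes; CQL's minTimeuuid() function is probably the safer choice when an exact lower bound matters.

```java
import java.util.UUID;

final class TimeUuidBound {
    // Number of 100-ns ticks between the UUID epoch (1582-10-15T00:00:00Z)
    // and the Unix epoch (1970-01-01T00:00:00Z).
    static final long UUID_EPOCH_OFFSET_100NS = 0x01B21DD213814000L;

    /** Hypothetical helper: a version 1 UUID whose timestamp field equals unixMillis. */
    static UUID atMillis(long unixMillis) {
        long ts = unixMillis * 10_000L + UUID_EPOCH_OFFSET_100NS; // 60-bit tick count
        long msb = ((ts & 0xFFFFFFFFL) << 32)          // time_low
                 | (((ts >>> 32) & 0xFFFFL) << 16)     // time_mid
                 | 0x1000L                             // version 1
                 | ((ts >>> 48) & 0x0FFFL);            // time_hi
        long lsb = 0x8000000000000000L;                // variant bits "10", clock sequence and node zeroed
        return new UUID(msb, lsb);
    }

    public static void main(String[] args) {
        // Bound for 2013-10-30T08:55:55.800Z, the instant used later in this answer.
        System.out.println(atMillis(1383123355800L));
    }
}
```

The printed value can be pasted into the WHERE clause in place of the literal timeuuid, with the cutoff advanced after each successful poll.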
Update:
You asked:
somebody can insert same SCHEMA_ID twice? Am I correct?
Yes, you can always make an insert with an existing primary key; the values stored at that primary key will simply be updated. Therefore, to preserve uniqueness, a UUID is often used in the primary key, for instance a timeuuid. It is a unique value containing a timestamp and the MAC address of the client. There is excellent documentation on this topic.
General advice:
- Write down your queries first, then design your model. (Use case!)
- Your queries define your data model, which in turn is primarily defined by your primary keys.
So, in your case, I'd just adapt my schema above, like so:
CREATE TABLE TEST (SCHEMA_ID TEXT, RECORD_NAME TEXT, SCHEMA_VALUE TEXT,
LAST_MODIFIED_DATE TIMEUUID, PRIMARY KEY (RECORD_NAME, LAST_MODIFIED_DATE));
This allows the following query:
select * from test where RECORD_NAME IN ('componentA','componentB')
and LAST_MODIFIED_DATE > 1688f180-4141-11e3-aa6e-0800200c9a66;
-- The timeuuid corresponds to Wednesday, October 30, 2013 8:55:55 AM GMT,
-- so this fetches everything modified after that time.
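To double-check which instant a timeuuid literal stands for, the timestamp field can be decoded with java.util.UUID alone. This is a small sketch rather than anything from the driver; the class name TimeUuidDecode and the method toInstant are illustrative names.

```java
import java.time.Instant;
import java.util.UUID;

final class TimeUuidDecode {
    // Number of 100-ns ticks between the UUID epoch (1582-10-15T00:00:00Z)
    // and the Unix epoch (1970-01-01T00:00:00Z).
    static final long UUID_EPOCH_OFFSET_100NS = 0x01B21DD213814000L;

    /** Converts the timestamp field of a version 1 UUID to an Instant. */
    static Instant toInstant(UUID timeUuid) {
        if (timeUuid.version() != 1) {
            throw new IllegalArgumentException("not a time-based (version 1) UUID");
        }
        long ticksSinceUnixEpoch = timeUuid.timestamp() - UUID_EPOCH_OFFSET_100NS;
        return Instant.ofEpochSecond(
                ticksSinceUnixEpoch / 10_000_000L,            // whole seconds
                (ticksSinceUnixEpoch % 10_000_000L) * 100L);  // remaining ticks as nanoseconds
    }

    public static void main(String[] args) {
        UUID cutoff = UUID.fromString("1688f180-4141-11e3-aa6e-0800200c9a66");
        System.out.println(toInstant(cutoff)); // prints 2013-10-30T08:55:55.800Z
    }
}
```

The decoded value agrees with the date quoted for the query above.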