Need to query into a column (or collection) in Cassandra


Problem description

Hi all :D

I'm working with Cassandra (the DataStax distribution) and I've run into a problem.

I want to model a set of columns that is (always) going to change.

That's very hard, because I can't just create a column family with 1, 2, 3, 4 ... 10 columns: the set of columns may well change tomorrow.

I thought about collections, but I have to query into them. I mean, I need to query this information every second.

For example, given a map:

<'col1':'val1','col2':'val2'> 

I need to query it like this:

SELECT * FROM example WHERE 'col1' = 'val1' AND 'col2' = 'val2';

I don't know how to do this, and it is essential for what I want to do.

I also read that you can create a single text column and implement a kind of format:

colum1 = 'val1\x01val2\x01'

But this doesn't solve my problem, because I can't query on these fields (or I don't know how).

Please, can you help me model something like that?

I can't use a collection because (according to what I've read) it is slow.

PS: sorry if my English is bad :( but thank you

Recommended answer

You can create a table like this:

CREATE TABLE dynamic_columns (
   partitionKey bigint,
   column_name text,
   column_value_text text,
   column_value_boolean boolean,
   column_value_bigint bigint,
   column_value_uuid uuid,
   column_value_timestamp timestamp,
   ....
   PRIMARY KEY((partitionKey), column_name)
);

The partitionKey determines which machine(s) in the cluster will store your data.

The clustering column column_name stores the label of your dynamic column. After it comes a list of normal columns, one for each data type (bigint, uuid, timestamp, ...).

Let's take an example:

INSERT INTO dynamic_columns(partitionKey, column_name, column_value_text)
VALUES(1, 'firstname', 'John DOE');

INSERT INTO dynamic_columns(partitionKey, column_name, column_value_boolean)
VALUES(1, 'validity_state', true);

INSERT INTO dynamic_columns(partitionKey, column_name, column_value_timestamp)
VALUES(1, 'validity_date', '2016-03-13 12:00:00+0000');

So the idea is to define a list of column_value columns, one for each existing type in Cassandra, but to insert data only into the column of the appropriate type, as in the examples above.
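As an illustration of "insert only into the appropriate type column", here is a minimal Python sketch that picks the typed column from the value's Python type and builds a parameterized INSERT. The helper names (`value_column_for`, `build_insert`) and the type mapping are assumptions made for this example, not part of the original answer. The `%s` placeholders follow the style the DataStax Python driver uses for non-prepared statements; adapt them if your client differs.

```python
from datetime import datetime
from uuid import UUID

# Hypothetical mapping from Python types to the typed value columns of
# dynamic_columns. bool must be tested before int because bool is a
# subclass of int in Python.
TYPE_COLUMNS = [
    (bool, "column_value_boolean"),
    (int, "column_value_bigint"),
    (str, "column_value_text"),
    (UUID, "column_value_uuid"),
    (datetime, "column_value_timestamp"),
]


def value_column_for(value):
    """Return the name of the typed column that should hold `value`."""
    for py_type, column in TYPE_COLUMNS:
        if isinstance(value, py_type):
            return column
    raise TypeError(f"no value column for type {type(value).__name__}")


def build_insert(partition_key, column_name, value):
    """Build a parameterized INSERT that fills exactly one typed column."""
    column = value_column_for(value)
    cql = (
        "INSERT INTO dynamic_columns(partitionKey, column_name, "
        f"{column}) VALUES (%s, %s, %s)"
    )
    return cql, (partition_key, column_name, value)
```

The returned statement and parameter tuple could then be handed to whatever client you use to execute CQL; every other typed column of the row is simply left unset.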

For querying, you'll need to create an index on each typed column. Example:

CREATE INDEX ON dynamic_columns(column_value_boolean);
CREATE INDEX ON dynamic_columns(column_value_text);
CREATE INDEX ON dynamic_columns(column_value_bigint);
....

If you can switch to Cassandra 3.4, there is a better secondary index implementation called SASI. Here is the syntax for creating the indexes:

// All data types EXCEPT text
// (SPARSE mode is meant for values that each match only a few rows)
CREATE CUSTOM INDEX ON dynamic_columns(column_value_boolean)
USING 'org.apache.cassandra.index.sasi.SASIIndex'
WITH OPTIONS = {'mode': 'SPARSE'};

// Text data type
CREATE CUSTOM INDEX ON dynamic_columns(column_value_text)
USING 'org.apache.cassandra.index.sasi.SASIIndex'
WITH OPTIONS = {
    'mode': 'PREFIX',
    'analyzer_class' : 'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer',
    'case_sensitive': 'false'
};

Then you can query your columns easily:

//Give me col1 where value = 'val1'
SELECT * FROM dynamic_columns 
WHERE partitionKey=1 
AND column_name='col1'
AND column_value_text='val1';

//Give me 'validity_state' = true
SELECT * FROM dynamic_columns 
WHERE partitionKey=1 
AND column_name='validity_state'
AND column_value_boolean=true;
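
The question asked for `col1 = 'val1' AND col2 = 'val2'`. In this model each dynamic column is its own row, so a single SELECT of the form above can only test one column at a time. One possible approach (an assumption on my part, not something the answer states) is to run one query per condition and intersect the resulting partition keys on the client. The Python sketch below uses an in-memory list with invented sample data as a stand-in for the table, so it runs without a cluster; `partitions_matching` plays the role of one SELECT.

```python
# In-memory stand-in for dynamic_columns: one tuple per CQL row,
# (partitionKey, column_name, column_value_text). Sample data is invented.
ROWS = [
    (1, "col1", "val1"),
    (1, "col2", "val2"),
    (2, "col1", "val1"),
    (2, "col2", "other"),
    (3, "col2", "val2"),
]


def partitions_matching(rows, column_name, value):
    """Stand-in for one SELECT: partition keys where the column has the value."""
    return {pk for pk, name, val in rows if name == column_name and val == value}


def partitions_matching_all(rows, conditions):
    """Intersect the per-condition results to emulate AND across columns."""
    result = None
    for column_name, value in conditions.items():
        matches = partitions_matching(rows, column_name, value)
        result = matches if result is None else result & matches
    return result if result is not None else set()


print(partitions_matching_all(ROWS, {"col1": "val1", "col2": "val2"}))  # prints {1}
```

Against a real cluster, each per-condition lookup that does not restrict partitionKey would be an index query spanning partitions, so this pattern is likely to be costly at high query rates.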

Remark: you should always provide the partitionKey value in your SELECT; otherwise, in the worst case, Cassandra will perform a full cluster scan and kill your performance. With the SASI index (available since Cassandra 3.4) this problem is less critical, but it is still strongly recommended to provide the partitionKey when using a secondary index.

For more information on the importance of the partition key, read this: http://www.planetcassandra.org/blog/the-most-important-thing-to-know-in-cassandra-data-modeling-the-primary-key/
