Why are super columns in Cassandra no longer favoured?

Problem description

I have read in the latest release that super columns are not desirable due to "performance issues", but nowhere is this explained.

Then I read articles such as this one that give wonderful indexing patterns using super columns.

This leaves me with no idea of the current best way to do indexing in Cassandra.

  1. What are the performance issues of super columns?
  2. Where can I find current best practices for indexing?

Solution

Super columns suffer from a number of problems. Not least of these is that Cassandra must deserialize all of a super column's sub-columns when querying it, even if the result will return only a small subset. As a result, there is a practical limit to the number of sub-columns a single super column can hold before performance suffers.
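The following toy model in Python makes the cost concrete. Purely for illustration, it assumes that a super column's sub-columns are serialized together as one JSON blob. JSON is a stand-in here, not Cassandra's real on-disk format. Under that assumption, reading any single sub-column means decoding all of them. The helper names `write_super_column` and `read_sub_column` are hypothetical.

```python
import json
from typing import Dict, Optional, Tuple


# Toy model (hypothetical): all sub-columns of one super column are stored
# together as a single serialized blob. JSON is only a stand-in format.
def write_super_column(sub_columns: Dict[str, str]) -> bytes:
    return json.dumps(sub_columns).encode("utf-8")


def read_sub_column(blob: bytes, name: str) -> Tuple[Optional[str], int]:
    """Return (value, how many sub-columns had to be deserialized)."""
    # The whole blob is decoded even though only one entry is wanted.
    sub_columns = json.loads(blob.decode("utf-8"))
    return sub_columns.get(name), len(sub_columns)


blob = write_super_column({f"col{i:05d}": f"v{i}" for i in range(10_000)})
value, decoded = read_sub_column(blob, "col00042")
print(value, decoded)  # v42 10000
```

In this model, the work per read grows with the total number of sub-columns, not with the size of the answer. That is the shape of the practical ceiling described above.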

In theory, this could be fixed within Cassandra by properly indexing sub-columns. However, the consensus is that composite columns are a better solution, because they provide the same kind of grouping without the added complexity.
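A composite column, by contrast, flattens each sub-column into an ordinary column whose name is a tuple such as (outer name, inner name). Columns inside a row are kept sorted by name. The sketch below models this with a sorted Python list and binary search. The class and method names are hypothetical, and this is not Cassandra's implementation. It simply shows why one entry can be found without decoding its neighbours.

```python
from bisect import bisect_left
from typing import List, Optional, Tuple


class CompositeRow:
    """Toy row whose column names are composite (outer, inner) tuples."""

    def __init__(self) -> None:
        self._names: List[Tuple[str, str]] = []  # always kept sorted
        self._values: List[str] = []

    def insert(self, outer: str, inner: str, value: str) -> None:
        name = (outer, inner)
        i = bisect_left(self._names, name)
        if i < len(self._names) and self._names[i] == name:
            self._values[i] = value  # same name again: overwrite the value
        else:
            self._names.insert(i, name)
            self._values.insert(i, value)

    def get(self, outer: str, inner: str) -> Optional[str]:
        # Binary search compares O(log n) names; no other value is touched.
        name = (outer, inner)
        i = bisect_left(self._names, name)
        if i < len(self._names) and self._names[i] == name:
            return self._values[i]
        return None


row = CompositeRow()
for i in range(10_000):
    row.insert("inbox", f"col{i:05d}", f"v{i}")
print(row.get("inbox", "col00042"))  # v42
```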

The easiest way to make use of composite columns is to take advantage of the abstraction that CQL 3 provides. Consider the following schema:

CREATE TABLE messages(
    username text,
    sent_at timestamp,
    message text,
    sender text,
    PRIMARY KEY(username, sent_at)
);

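Here username is the row key, but the PRIMARY KEY definition groups the row key with the sent_at column: username becomes the partition key and sent_at a clustering column. This is important because it has the effect of indexing that attribute, since the columns within each row are kept sorted by sent_at.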
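The toy Python model below gives one way to picture what PRIMARY KEY(username, sent_at) does. Rows are grouped by the first key component (the partition key) and ordered within each group by the second (the clustering column). The sample data is the same four messages inserted by the statements that follow. Timestamps are kept as strings in a fixed "YYYY-MM-DD HH:MM:SS" format, which sorts chronologically. The function name `build_partitions` is hypothetical, and this sketches the idea rather than Cassandra's code.

```python
from collections import defaultdict

# The same four messages as the INSERT statements in this answer.
MESSAGES = [
    ("bob", "2012-08-01 11:42:15", "Hi", "alice"),
    ("alice", "2012-08-01 11:42:37", "Hi yourself", "bob"),
    ("bob", "2012-08-01 11:43:00", "What are you doing later?", "alice"),
    ("bob", "2012-08-01 11:47:14", "Bob?", "alice"),
]


def build_partitions(messages):
    """Group by partition key (username); sort each group by sent_at."""
    partitions = defaultdict(list)
    for username, sent_at, message, sender in messages:
        partitions[username].append((sent_at, message, sender))
    # Sorting on the clustering column is what makes sent_at "indexed".
    # (This toy keeps duplicates; Cassandra would overwrite a repeated
    # (username, sent_at) pair instead.)
    return {user: sorted(rows) for user, rows in partitions.items()}


table = build_partitions(MESSAGES)
print([sent_at for sent_at, _, _ in table["bob"]])
```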

INSERT INTO messages (username, sent_at, message, sender) VALUES ('bob', '2012-08-01 11:42:15', 'Hi', 'alice');
INSERT INTO messages (username, sent_at, message, sender) VALUES ('alice', '2012-08-01 11:42:37', 'Hi yourself', 'bob');
INSERT INTO messages (username, sent_at, message, sender) VALUES ('bob', '2012-08-01 11:43:00', 'What are you doing later?', 'alice');
INSERT INTO messages (username, sent_at, message, sender) VALUES ('bob', '2012-08-01 11:47:14', 'Bob?', 'alice');

Behind the scenes, Cassandra stores the inserted data roughly like this:

alice: (2012-08-01 11:42:37,message): Hi yourself, (2012-08-01 11:42:37,sender): bob
bob:   (2012-08-01 11:42:15,message): Hi,          (2012-08-01 11:42:15,sender): alice, (2012-08-01 11:43:00,message): What are you doing later?, (2012-08-01 11:43:00,sender): alice, (2012-08-01 11:47:14,message): Bob?, (2012-08-01 11:47:14,sender): alice
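A small Python sketch can reproduce this layout. Every non-key CQL column becomes one physical column named by the composite (sent_at, column name), and those columns are kept sorted inside each row. This is a simplified model that ignores internal bookkeeping such as per-cell write timestamps. The function name `storage_layout` is hypothetical.

```python
# The same four messages as the INSERT statements above.
MESSAGES = [
    ("bob", "2012-08-01 11:42:15", "Hi", "alice"),
    ("alice", "2012-08-01 11:42:37", "Hi yourself", "bob"),
    ("bob", "2012-08-01 11:43:00", "What are you doing later?", "alice"),
    ("bob", "2012-08-01 11:47:14", "Bob?", "alice"),
]


def storage_layout(messages):
    """Map each row key to its sorted list of ((sent_at, column), value) cells."""
    rows = {}
    for username, sent_at, message, sender in messages:
        cells = rows.setdefault(username, [])
        # Each non-key CQL column becomes one cell with a composite name.
        cells.append(((sent_at, "message"), message))
        cells.append(((sent_at, "sender"), sender))
    # Cells sort by composite name: first sent_at, then the CQL column name.
    return {key: sorted(cells) for key, cells in rows.items()}


layout = storage_layout(MESSAGES)
for key in ("alice", "bob"):
    cells = ", ".join(f"({ts},{col}): {val}" for (ts, col), val in layout[key])
    print(f"{key}: {cells}")
```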

But using CQL 3, we can query the "row" using a sent_at predicate, and get back a tabular result set.

SELECT * FROM messages WHERE username = 'bob' AND sent_at > '2012-08-01';
 username | sent_at                  | message                   | sender
----------+--------------------------+---------------------------+--------
      bob | 2012-08-01 11:42:15+0000 |                        Hi |  alice
      bob | 2012-08-01 11:43:00+0000 | What are you doing later? |  alice
      bob | 2012-08-01 11:47:14+0000 |                      Bob? |  alice
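Because bob's columns are already sorted by sent_at, a predicate like sent_at > X does not require scanning the whole row. A binary search finds the first match, and the rest of the result is a contiguous slice. The Python sketch below illustrates this with the standard `bisect` module. The names `BOB_ROW`, `BOB_KEYS`, and `select_after` are hypothetical, and this models the idea rather than how Cassandra executes the query.

```python
from bisect import bisect_right

# bob's partition, ordered by the clustering column sent_at.
BOB_ROW = [
    ("2012-08-01 11:42:15", "Hi", "alice"),
    ("2012-08-01 11:43:00", "What are you doing later?", "alice"),
    ("2012-08-01 11:47:14", "Bob?", "alice"),
]
# Sorted clustering keys; a real store maintains this order as data is written.
BOB_KEYS = [sent_at for sent_at, _, _ in BOB_ROW]


def select_after(row, keys, lower):
    """Rows with sent_at strictly greater than lower (sent_at > lower)."""
    start = bisect_right(keys, lower)  # first index whose key is > lower
    return row[start:]


print(select_after(BOB_ROW, BOB_KEYS, "2012-08-01 11:43:00"))
# [('2012-08-01 11:47:14', 'Bob?', 'alice')]
```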
