Why are super columns in Cassandra no longer favoured?


Question

I have read in the latest release that super columns are not desirable due to "performance issues", but nowhere is this explained.

Then I read articles such as this one, which describe wonderful indexing patterns using super columns.

This leaves me with no idea what the best way to do indexing in Cassandra currently is.

  1. What are the performance issues of super columns?
  2. Where can I find current best practices for indexing?

Recommended answer

Super columns suffer from a number of problems, not least of which is that Cassandra must deserialize all of the sub-columns of a super column when querying it (even if the result will only return a small subset). As a result, there is a practical limit to the number of sub-columns that can be stored per super column before performance suffers.

In theory, this could be fixed within Cassandra by properly indexing sub-columns, but the consensus is that composite columns are a better solution, and they work without the added complexity.
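The deserialization cost described above can be pictured with a toy Python model. This is only a sketch, not Cassandra's storage code: it assumes a super column is held as one serialized blob (JSON stands in for the real on-disk format), and `make_super_column` and `read_sub_column` are hypothetical names invented for the illustration.

```python
import json

def make_super_column(sub_columns):
    """Serialize every sub-column of one super column into a single blob.

    Assumption: JSON is a stand-in for the real serialization format.
    """
    return json.dumps(sub_columns).encode("utf-8")

def read_sub_column(blob, name):
    """Return one sub-column value plus the number of sub-columns decoded."""
    # The blob has no per-sub-column index, so all of it is decoded
    # even though only one value is wanted.
    decoded = json.loads(blob.decode("utf-8"))
    return decoded[name], len(decoded)

blob = make_super_column({f"col{i}": i for i in range(10_000)})
value, decoded_count = read_sub_column(blob, "col42")
print(value, decoded_count)
```

In this model the work done per read grows with the total number of sub-columns, not with the number requested, which is the behaviour the answer warns about.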

The easiest way to make use of composite columns is to take advantage of the abstraction that CQL 3 provides. Consider the following schema:

CREATE TABLE messages(
    username text,
    sent_at timestamp,
    message text,
    sender text,
    PRIMARY KEY(username, sent_at)
);

Here username is the row key, but the PRIMARY KEY definition groups the row key with the sent_at column. This is important because it has the effect of indexing that attribute: within each row, entries are stored sorted by sent_at.

INSERT INTO messages (username, sent_at, message, sender) VALUES ('bob', '2012-08-01 11:42:15', 'Hi', 'alice');
INSERT INTO messages (username, sent_at, message, sender) VALUES ('alice', '2012-08-01 11:42:37', 'Hi yourself', 'bob');
INSERT INTO messages (username, sent_at, message, sender) VALUES ('bob', '2012-08-01 11:43:00', 'What are you doing later?', 'alice');
INSERT INTO messages (username, sent_at, message, sender) VALUES ('bob', '2012-08-01 11:47:14', 'Bob?', 'alice');

Behind the scenes, Cassandra will store the inserted data something like this:

alice: (2012-08-01 11:42:37,message): Hi yourself, (2012-08-01 11:42:37,sender): bob
bob:   (2012-08-01 11:42:15,message): Hi,          (2012-08-01 11:42:15,sender): alice, (2012-08-01 11:43:00,message): What are you doing later?, (2012-08-01 11:43:00,sender): alice, (2012-08-01 11:47:14,message): Bob?, (2012-08-01 11:47:14,sender): alice
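The mapping from CQL rows to this wide-row layout can be sketched in Python. This is a simplified model under stated assumptions, not Cassandra's implementation: timestamps are kept as strings (which sort chronologically in this format), and `to_wide_rows` is a hypothetical helper.

```python
from collections import defaultdict

# The four CQL rows inserted above: (username, sent_at, message, sender).
rows = [
    ("bob", "2012-08-01 11:42:15", "Hi", "alice"),
    ("alice", "2012-08-01 11:42:37", "Hi yourself", "bob"),
    ("bob", "2012-08-01 11:43:00", "What are you doing later?", "alice"),
    ("bob", "2012-08-01 11:47:14", "Bob?", "alice"),
]

def to_wide_rows(rows):
    """Group CQL rows by row key, naming each cell (sent_at, field)."""
    wide = defaultdict(dict)
    for username, sent_at, message, sender in rows:
        wide[username][(sent_at, "message")] = message
        wide[username][(sent_at, "sender")] = sender
    # Columns inside a row are kept sorted by their composite name.
    return {key: sorted(cols.items()) for key, cols in wide.items()}

wide_rows = to_wide_rows(rows)
for row_key in sorted(wide_rows):
    print(row_key, wide_rows[row_key])
```

Each message contributes two cells whose names share the same sent_at prefix, so all cells for one message sit next to each other in the sorted row.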

But using CQL 3, we can query the "row" using a sent_at predicate and get back a tabular result set.

SELECT * FROM messages WHERE username = 'bob' AND sent_at > '2012-08-01';
 username | sent_at                  | message                   | sender
----------+--------------------------+---------------------------+--------
      bob | 2012-08-01 11:42:15+0000 |                        Hi |  alice
      bob | 2012-08-01 11:43:00+0000 | What are you doing later? |  alice
      bob | 2012-08-01 11:47:14+0000 |                      Bob? |  alice
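A range predicate on sent_at can be pictured as a slice over the sorted composite columns of one row. The Python sketch below is a toy model under stated assumptions, not how Cassandra executes the query: timestamps are plain strings, and `slice_after` is a hypothetical helper.

```python
from bisect import bisect_right

# One wide row for 'bob': (composite column name, value) pairs sorted by name.
bob_columns = [
    (("2012-08-01 11:42:15", "message"), "Hi"),
    (("2012-08-01 11:42:15", "sender"), "alice"),
    (("2012-08-01 11:43:00", "message"), "What are you doing later?"),
    (("2012-08-01 11:43:00", "sender"), "alice"),
    (("2012-08-01 11:47:14", "message"), "Bob?"),
    (("2012-08-01 11:47:14", "sender"), "alice"),
]

def slice_after(columns, sent_at):
    """Return (sent_at, message, sender) rows strictly after the bound."""
    names = [name for name, _ in columns]
    # "\uffff" sorts after both field names used here, so the search lands
    # just past every column whose timestamp equals the bound.
    start = bisect_right(names, (sent_at, "\uffff"))
    grouped = {}
    for (ts, field), value in columns[start:]:
        grouped.setdefault(ts, {})[field] = value
    # Reassemble the cells of each timestamp into one tabular row.
    return [(ts, cells["message"], cells["sender"]) for ts, cells in grouped.items()]

later = slice_after(bob_columns, "2012-08-01 11:42:15")
for row in later:
    print(row)
```

Because the columns are already sorted, finding the start of the range is a binary search and the rest is a sequential read, with no need to examine columns before the bound.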
