Cassandra 1.1 storage engine: how does it store composites?


Problem description

I'm trying to understand Cassandra's storage engine when it comes to composite columns. Unfortunately, the documentation I've read so far contains errors and leaves me with some gaps.

First, terminology.

"Composite columns comprise fully denormalized wide rows by using composite primary keys."

This seems misleading because, AFAIK, composite columns can be used for composite keys, and also simply as composite columns apart from keys.

1: How are composite keys and column names implemented? Every CQL example I can find only shows composite keys as columns, not plain composite columns.

Let's say we have columns 'a', 'b', 'c', 'd' as primary composite key + columns 'e', 'f'. I know 'a' will be the row and partition key.

Let's suppose the following data:

a    b    c    d    e    f
1a   1b   1c   1d   e1   f1
1a   1b   1c   2d   e1   f2
1a   1b   1c   2d   e2   f3
2a   2b   2c   2d   e2   f4

2: How is this stored under the hood? I suppose the real question here is how is 'b', 'c', 'd' mapped out since columns are not hierarchical by definition.

3: The documentation I read says compact storage should no longer be used. But what if non-primary key columns don't need to be added... what's the reason not to use it then?

Solution

1: How are composite keys and column names implemented?

Mostly answered under question 2. As an aside, in Cassandra 1.2, non-composite keys will also be implemented as composite keys under the hood. Also, the names of composite columns are not themselves repeated in storage. The in-memory representation interns the names, up to a threshold, for memory efficiency.

2: How is this stored under the hood?

The first key component (a in your example) becomes the physical row key. The remaining key columns (b, c, d) form a prefix for the names of the non-key columns (e, f), and the resulting cells are stored presorted (clustered) within a row. So the physical representation for your example will look like this:

    1b.1c.1d, e   1b.1c.1d, f
1a      e1            f1
------------------------------
    2b.2c.2d, e   2b.2c.2d, f
2a      e2            f4

Note that the second and third rows in your example are not valid as distinct rows: they share the same primary key (1a, 1b, 1c, 2d), and column names must be unique within a physical row.
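
To make the mapping concrete, here is a minimal Python sketch of it. This is a toy model, not Cassandra code: the function name to_physical and the dict-based layout are made up for illustration, and it assumes that a later write to the same composite name simply replaces the earlier one.

```python
# Logical CQL rows from the question: (a, b, c, d, e, f).
ROWS = [
    ("1a", "1b", "1c", "1d", "e1", "f1"),
    ("1a", "1b", "1c", "2d", "e1", "f2"),
    ("1a", "1b", "1c", "2d", "e2", "f3"),
    ("2a", "2b", "2c", "2d", "e2", "f4"),
]


def to_physical(rows, value_names=("e", "f")):
    """Group logical rows into physical rows keyed by the first key column.

    Each physical row maps a composite column name (b, c, d, value_name)
    to a value. Assumption of this sketch: a later write to the same
    composite name replaces the earlier one.
    """
    physical = {}
    for a, b, c, d, *values in rows:
        cells = physical.setdefault(a, {})
        for name, value in zip(value_names, values):
            cells[(b, c, d, name)] = value
    # Keep the cells sorted by composite name within each physical row.
    return {key: dict(sorted(cells.items())) for key, cells in physical.items()}


physical_rows = to_physical(ROWS)
for row_key, cells in physical_rows.items():
    print(row_key)
    for name, value in cells.items():
        print("   ", ".".join(name[:3]) + ", " + name[3], "=", value)
```

In this model the second and third logical rows land on the same composite names (1b.1c.2d, e and 1b.1c.2d, f), so only one set of values survives, which is why they cannot exist as two separate rows.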

The dot notation I used (1b.1c.1d) is figurative. Actual storage uses prefix bytes for metadata followed by data.
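
As a rough illustration of "prefix bytes followed by data", the sketch below packs each component as a 2-byte big-endian length, the value bytes, and one trailing end-of-component byte. That layout is my understanding of Cassandra's CompositeType and should be treated as an assumption here; the helper names encode_composite and decode_composite are hypothetical and not part of any Cassandra API.

```python
import struct

# Assumed marker byte that closes an ordinary (non-range) component.
END_OF_COMPONENT = 0


def encode_composite(*components: bytes) -> bytes:
    """Pack components as <2-byte length><value><end-of-component byte>."""
    out = bytearray()
    for value in components:
        out += struct.pack(">H", len(value))  # length prefix (metadata)
        out += value                          # the component's data
        out.append(END_OF_COMPONENT)
    return bytes(out)


def decode_composite(data: bytes) -> list:
    """Inverse of encode_composite: split the bytes back into components."""
    parts, offset = [], 0
    while offset < len(data):
        (length,) = struct.unpack_from(">H", data, offset)
        offset += 2
        parts.append(data[offset:offset + length])
        offset += length + 1  # skip the value and its end-of-component byte
    return parts


# The figurative name "1b.1c.1d, e" as one encoded column name.
name = encode_composite(b"1b", b"1c", b"1d", b"e")
print(name.hex())
print(decode_composite(name))
```

The length prefixes are what let the engine split a name back into its components without relying on a separator character such as the dot used above.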

3: The documentation I read says compact storage should no longer be used. But what if non-primary key columns don't need to be added... what's the reason not to use it then?

The very small gain in storage efficiency is not worth the downside of losing the ability to evolve your schema.
