How to understand the concept of wide row and related concepts in Cassandra?


Question

I am having difficulty understanding the following passage from Cassandra: The Definitive Guide:

Cassandra uses a special primary key called a composite key (or compound key) to represent wide rows, also called partitions. The composite key consists of a partition key, plus an optional set of clustering columns. The partition key is used to determine the nodes on which rows are stored and can itself consist of multiple columns. The clustering columns are used to control how data is sorted for storage within a partition. Cassandra also supports an additional construct called a static column, which is for storing data that is not part of the primary key but is shared by every row in a partition.

Figure 4-5 shows how each partition is uniquely identified by a partition key, and how the clustering keys are used to uniquely identify the rows within a partition.

Are a wide row and a partition synonyms?

In "the partition key is used to determine the nodes on which rows are stored and can itself consist of multiple columns" and "each partition is uniquely identified by a partition key",

  • since a partition key is for a wide row, why are there multiple "rows" (does "rows" here mean "wide rows")?
  • how does the partition key "determine the nodes on which rows are stored"?
  • how can a partition key be used for "each partition is uniquely identified by a partition key"?

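In "the clustering columns are used to control how data is sorted for storage within a partition",

  • what is a clustering column, for example, what are the clustering columns in the figure?
  • how do the clustering columns "control how data is sorted for storage within a partition"?

In "the clustering keys are used to uniquely identify the rows within a partition",

  • if a partition is a synonym of a wide row, what is meant by "the rows within a partition"?
  • how are the clustering keys used to "uniquely identify the rows within a partition"?

Thanks.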

Recommended answer

Are a wide row and a partition synonyms?

A partition and a row can be considered synonyms (here "row" means the storage-level row, not a CQL row). A wide row is the case where the chosen partition key results in a very large number of cells for that key. Consider a table that holds every person in a country, with city as the partition key: there will be one row per city, and every person in that city will be a cell in that row. For a large metropolitan city this leads to a wide row. Another example is storing sensor data received every few seconds with sensorId as the partition key, which leads to a huge number of cells per sensor a few years down the line.

Since a partition key is for a wide row, why are there multiple "rows" (does "rows" here mean "wide rows")?

Same as above.

How does the partition key "determine the nodes on which rows are stored"?

A hash is generated from the partition key (Murmur3 is the default), and each node in Cassandra is responsible for a range of hash values. For example, if the hash of a partition key value turns out to be 20 and Node1 is responsible for the range 1 to 100, then that partition will reside on Node1.
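The range lookup described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three-node ring, the range boundaries, and the helper names `toy_token`, `node_for_token` and `node_for_key` are hypothetical, and the toy hash is a stand-in for Murmur3, not the real partitioner.

```python
import bisect

# Hypothetical ring: (upper bound of the token range, node name).
# Node1 owns tokens 1..100, Node2 owns 101..200, Node3 owns 201..300.
RING = [(100, "Node1"), (200, "Node2"), (300, "Node3")]
RING_SIZE = 300


def toy_token(partition_key: str) -> int:
    """Toy stand-in for the partitioner hash (NOT Murmur3): maps a key to 1..RING_SIZE."""
    return sum(partition_key.encode("utf-8")) % RING_SIZE + 1


def node_for_token(token: int) -> str:
    """Return the node whose token range contains the given token."""
    upper_bounds = [upper for upper, _ in RING]
    index = bisect.bisect_left(upper_bounds, token)
    return RING[index][1]


def node_for_key(partition_key: str) -> str:
    """Hash the partition key, then look up the owning node."""
    return node_for_token(toy_token(partition_key))


print(node_for_token(20))    # token 20 falls in 1..100, so Node1
print(node_for_key("test"))  # the same key always maps to the same node
```

Because the node is derived only from the hash of the partition key, every row that shares a partition key ends up on the same node.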

How can a partition key be used for "each partition is uniquely identified by a partition key"?

As explained above, the partition key decides on which node the data resides. The data representation can be thought of as a huge map, and a map can only have unique keys.

What is a clustering column, for example, what are the clustering columns in the figure?

Consider a table created with CREATE TABLE test (a text, b int, c text, PRIMARY KEY (a, b)); here a is the partition key and b is the clustering column. In the attached figure, the clustering key is the clustering column and the whole enclosing box is a cell.

How do the clustering columns "control how data is sorted for storage within a partition"?

Cassandra will sort the data within each partition by column b of the example table above, in ascending order. This can be changed to descending as well (with WITH CLUSTERING ORDER BY (b DESC) when the table is created).

INSERT INTO test (a, b, c) VALUES ('test', 2, 'test2');
INSERT INTO test (a, b, c) VALUES ('test', 1, 'test1');
INSERT INTO test (a, b, c) VALUES ('test-new', 1, 'test1');

If you run the above statements in this order, Cassandra will store the data in the following order (the real data representation contains much more than this; just look at the order of column b):

test -> [b:1,c=test1] [b:2,c=test2]
test-new -> [b:1,c=test1]
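The same ordering can be reproduced with a small in-memory model. This is a rough sketch, not how Cassandra stores data on disk: the `partitions` map and the `insert` helper are hypothetical names, and the model only shows that rows inside one partition are kept sorted by the clustering column b regardless of insertion order.

```python
import bisect
from collections import defaultdict

# partition key (a) -> list of (b, c) tuples kept sorted by clustering column b
partitions = defaultdict(list)


def insert(a: str, b: int, c: str) -> None:
    """Place the row inside partition `a` at the position given by b."""
    bisect.insort(partitions[a], (b, c))


# Same three inserts, in the same order as the statements above.
insert("test", 2, "test2")
insert("test", 1, "test1")
insert("test-new", 1, "test1")

for key, rows in partitions.items():
    print(key, "->", rows)
# test -> [(1, 'test1'), (2, 'test2')]
# test-new -> [(1, 'test1')]
```

Note that the model lists partitions in insertion order for readability; in a real cluster the placement of partitions follows the hash of the partition key.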

If a partition is a synonym of a wide row, what is meant by "the rows within a partition"?

A clustering column is used to identify cells ("cells" is a better term than "rows") within a partition. For example, SELECT * FROM test WHERE a='test' AND b=1 will pick up the cell with b:1 for the partition key test.

How are the clustering keys used to "uniquely identify the rows within a partition"?

The answer above should explain this as well.
