Why do many refer to Cassandra as a column-oriented database?


Question


Reading several papers and documents on the internet, I found a lot of contradictory information about the Cassandra data model. Many identify it as a column-oriented database, others as row-oriented, and some define it as a hybrid of both.

According to what I know about how Cassandra stores files, it uses the *-Index.db file to find the right position in the *-Data.db file, where the bloom filter, the column index and then the columns of the required row are stored.
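To make the layout described above concrete, here is a toy sketch in Python. It is not Cassandra's actual file format: the names `write_rows` and `read_row`, the JSON encoding and the 4-byte length prefix are all assumptions made for illustration, and the bloom filter and column index are left out. It only shows the general idea of an index that maps a row key to an offset in a data file, where all columns of that row sit together.

```python
import io
import json

def write_rows(rows):
    """Write each row's columns contiguously; return (index, data).

    `index` plays the role of the index file: row key -> byte offset.
    `data` plays the role of the data file (an in-memory buffer here).
    """
    data = io.BytesIO()
    index = {}
    for key in sorted(rows):
        index[key] = data.tell()                      # where this row starts
        payload = json.dumps(rows[key]).encode("utf-8")
        data.write(len(payload).to_bytes(4, "big"))   # hypothetical length prefix
        data.write(payload)                           # all columns of the row together
    return index, data

def read_row(index, data, key):
    """Seek to the row's offset and read back all of its columns."""
    data.seek(index[key])
    size = int.from_bytes(data.read(4), "big")
    return json.loads(data.read(size).decode("utf-8"))

index, data = write_rows({
    "apple": {"colour": "red"},
    "orange": {"colour": "orange"},
})
print(read_row(index, data, "orange"))
```

In this sketch a read always fetches one row's columns from one place, which is the sense in which the on-disk layout looks row-oriented.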

In my opinion, this is strictly row-oriented. Is there something I'm missing?

Answer

Yes, the "column-oriented" terminology is a bit confusing.

The model in Cassandra is that rows contain columns. To access the smallest unit of data (a column) you have to specify first the row name (key), then the column name.

So in a column family called Fruit you could have a structure like the following example (with 2 rows), where the fruit types are the row keys, and the columns each have a name and value.

apple -> colour  weight  price variety
         "red"   100     40    "Cox"

orange -> colour    weight  price  origin
          "orange"  120     50     "Spain"

One difference from a table-based relational database is that one can omit columns (orange has no variety), or add arbitrary columns (orange has origin) at any time. You can still imagine the data above as a table, albeit a sparse one where many values might be empty.
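As a rough illustration, the Fruit example can be modelled in Python as a dictionary of dictionaries. This is only a sketch of the logical model, not of how Cassandra stores or serves data, and the helper names `get_column` and `as_sparse_table` are made up for this example.

```python
# Column family "Fruit": row key -> {column name -> value}
fruit = {
    "apple":  {"colour": "red",    "weight": 100, "price": 40, "variety": "Cox"},
    "orange": {"colour": "orange", "weight": 120, "price": 50, "origin": "Spain"},
}

def get_column(column_family, row_key, column_name):
    """Address one column: first by row key, then by column name."""
    return column_family[row_key].get(column_name)  # None if the row lacks it

def as_sparse_table(column_family):
    """View the rows as a table over the union of all column names."""
    names = sorted({name for row in column_family.values() for name in row})
    table = {key: [row.get(name) for name in names]
             for key, row in column_family.items()}
    return names, table

names, table = as_sparse_table(fruit)
print(names)
for key, cells in table.items():
    print(key, cells)
```

Each row carries only the columns it actually has, so the table view contains `None` where apple has no origin and orange has no variety.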

However, a "column-oriented" model can also be used for lists and time series, where every column name is unique (and here we have just one row, but we could have thousands or millions of columns):

temperature ->  2012-09-01  2012-09-02  2012-09-03 ...
                40          41          39         ...

which is quite different from a relational model, where one would have to model the entries of a time series as rows not columns.
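The time-series case can be sketched the same way. The `WideRow` class below is hypothetical and only mimics the idea of a single row whose columns are kept ordered by name, so that a range of dates can be read as one slice; the last lines show how the same readings would be laid out as rows in a relational table.

```python
from bisect import bisect_left, bisect_right

class WideRow:
    """One row whose column names are kept in sorted order."""

    def __init__(self):
        self._names = []    # sorted column names
        self._values = {}   # column name -> value

    def insert(self, name, value):
        if name not in self._values:
            # Keep the name list sorted as new columns arrive.
            self._names.insert(bisect_left(self._names, name), name)
        self._values[name] = value

    def slice(self, start, end):
        """Return (name, value) pairs with start <= name <= end."""
        lo = bisect_left(self._names, start)
        hi = bisect_right(self._names, end)
        return [(n, self._values[n]) for n in self._names[lo:hi]]

temperature = WideRow()
# Insertion order does not matter; columns end up ordered by date.
for day, reading in [("2012-09-03", 39), ("2012-09-01", 40), ("2012-09-02", 41)]:
    temperature.insert(day, reading)

print(temperature.slice("2012-09-02", "2012-09-03"))

# Relational layout of the same data: one row per reading.
relational_rows = [("temperature", day, reading)
                   for day, reading in temperature.slice("2012-09-01", "2012-09-03")]
print(relational_rows)
```

ISO-formatted dates sort correctly as plain strings, which is why they work as column names here without any extra parsing.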
