Comparing Cassandra structure with Relational Databases


Question


    A few days ago I read about wide-column stores, a type of NoSQL database, and about Apache Cassandra in particular. My understanding is that Cassandra consists of:

    A keyspace (like a database in relational databases) that holds many column families or tables (the same as a table in relational databases), each with an unlimited number of rows.

    From the Stack Overflow tag description:

    A wide column store is a type of key-value database. It uses tables, rows, and columns, but unlike a relational database, the names and format of the columns can vary from row to row in the same table.

    In Cassandra, every row in a table should have a row key, and each row key can have multiple columns. I have read about how relational databases and NoSQL (Cassandra) differ in implementation and in how they store data.

    But I don't understand the difference in structure.

    Imagine a scenario in which I have a table (or column family in Cassandra).

    When I execute a CQL query like this:

    SELECT * FROM users;
    

    It gives me this result:

    lastname | age  | city          | email               
    ----------+------+---------------+----------------------
          Doe |   36 | Beverly Hills |   janedoe@email.com       
        Jones |   35 |        Austin |     bob@example.com        
        Byrne |   24 |     San Diego |  robbyrne@email.com         
        Smith |   46 |    Sacramento |   null                      
      Jones2  | null |        Austin |     bob@example.com       
    

    I then reproduced the same scenario in a relational database (MS SQL) with the query below:

    select * from [users] 
    

    And the result is:

    lastname    age      city              email                    
        Doe     36       Beverly Hills     janedoe@email.com          
        Jones   35       Austin            bob@example.com             
        Byrne   24       San Diego         robbyrne@email.com         
        Smith   46       Sacramento        NULL                 
       Jones2   NULL     Austin            bob@example.com              
    

    I know that Cassandra supports dynamic columns, and I can add one with something like:

    ALTER TABLE users ADD website varchar;
    

    But this is also available in the relational model. In MS SQL, for example, the same thing can be done with something like:

    ALTER TABLE users 
    ADD website varchar(MAX) 
    

    What I see is that the results of the first and second SELECT are the same. Cassandra just treats the row key (lastname) as a standalone object, but that is the same as a unique field (like an ID or a text value) in MS SQL (and in all relational databases). I also see that the type of a column in Cassandra is fixed (varchar in my example), unlike what the Stack Overflow tag describes.

    So my questions are:

    1. Have I misunderstood something about Cassandra?

    2. What is the difference between the two structures? As I showed, the results are the same.

    3. Are there any special scenarios (JSON-like) that cannot be implemented in relational databases but that Cassandra supports? (For example, I know that nested columns are not supported in Cassandra.)

    Thank you for reading.

    Solution

    We have to look at a more complex example to see the differences :)

    For a start:

    • the term "column family" was used in the older Thrift API
    • in the newer CQL API, the term "table" is used

    A table is defined as a "two-dimensional view of a multi-dimensional column family".

    The term "wide rows" was mainly related to the Thrift API. In CQL it is defined a bit differently, but underneath it looks the same.

    Comparing SQL and CQL: in SQL, a table is a set of rows. In a simple example it looks as if the same is true in CQL, but it is not. A CQL table is a set of partitions, where each partition can be just a single row (e.g. when you don't have a clustering key) or multiple rows. A partition containing multiple rows is called a "wide row" in Thrift terminology. To see how it is stored underneath, please read e.g. the part about composite keys from here.
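To make the partition idea concrete, here is a minimal Python sketch (a toy model, not Cassandra code) that treats a table as a mapping from partition key to rows ordered by clustering key. The schema in the docstring, the class name `PartitionedTable`, and its methods are assumptions made for illustration; the data reuses rows from the question. Columns without a value are simply absent from a row rather than stored as NULL.

```python
from collections import defaultdict


class PartitionedTable:
    """Toy model of a CQL table: partition key -> rows ordered by clustering key.

    Roughly mirrors a hypothetical schema such as:
        CREATE TABLE users_by_city (
            city text, lastname text, age int, email text,
            PRIMARY KEY ((city), lastname));
    """

    def __init__(self):
        # Each partition is a list of (clustering_key, columns) pairs.
        self._partitions = defaultdict(list)

    def insert(self, partition_key, clustering_key, **columns):
        rows = self._partitions[partition_key]
        # Upsert: writing an existing primary key updates that row in place.
        for i, (ck, existing) in enumerate(rows):
            if ck == clustering_key:
                rows[i] = (ck, {**existing, **columns})
                return
        rows.append((clustering_key, columns))
        # Keep rows inside the partition ordered by clustering key.
        rows.sort(key=lambda row: row[0])

    def read_partition(self, partition_key):
        # Reading by partition key returns all of its rows in clustering order.
        stored = self._partitions.get(partition_key, [])
        return [{"lastname": ck, **cols} for ck, cols in stored]


table = PartitionedTable()
table.insert("Austin", "Jones2", email="bob@example.com")  # no age stored
table.insert("Austin", "Jones", age=35, email="bob@example.com")
table.insert("Sacramento", "Smith", age=46)                # no email stored

for row in table.read_partition("Austin"):
    print(row)
```

In this model the "Austin" partition holds two rows (a "wide row" in Thrift terms), while "Sacramento" holds exactly one.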

    There are more differences:

    • CQL can have static columns, which are stored at the partition level. It looks as if every row in the partition has a common value, but really it is a single value stored one level up. This can also be used to model 1:N relations
    • In CQL you can have collection-type columns: set, list, map
    • A column can contain a user-defined type (you can define e.g. address as a type and reuse it in many places), and a collection can be a collection of user-defined types
    • But CQL also does not support JOINs, which are available in SQL, and you have to structure your tables very carefully, since they have to be strictly query-oriented (in Cassandra you can't query data by an arbitrary column value, and secondary indexes also have many limitations). It is usually said that in the relational model you design tables based on the data, whereas in Cassandra you design them based on the queries.
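The points above can be sketched with another small Python model (again a toy, not Cassandra code). It stores static columns once per partition, allows collection values and a user-defined type inside a row, and only offers lookup by partition key. The names `Address`, `Partition`, `Table`, the `state` static column, and the street value are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Address:
    """Stand-in for a user-defined type (CREATE TYPE address ... in CQL)."""
    street: str
    city: str


@dataclass
class Partition:
    """Static columns live once per partition; rows are keyed by clustering key."""
    static: dict = field(default_factory=dict)
    rows: dict = field(default_factory=dict)


class Table:
    def __init__(self):
        self.partitions = {}

    def upsert(self, partition_key, clustering_key, static=None, **columns):
        part = self.partitions.setdefault(partition_key, Partition())
        if static:
            part.static.update(static)  # one shared value for the partition
        part.rows.setdefault(clustering_key, {}).update(columns)

    def select(self, partition_key):
        """The only access path in this model: lookup by partition key."""
        part = self.partitions.get(partition_key)
        if part is None:
            return []
        ordered = sorted(part.rows.items(), key=lambda kv: kv[0])
        # Static values are merged into every returned row at read time.
        return [{**part.static, "lastname": ck, **cols} for ck, cols in ordered]


users_by_city = Table()
users_by_city.upsert("Austin", "Jones", static={"state": "TX"},
                     age=35, emails={"bob@example.com"})       # set column
users_by_city.upsert("Austin", "Jones2",
                     addresses=[Address("1 Example St", "Austin")])  # list of UDTs

rows = users_by_city.select("Austin")
for row in rows:
    print(row)
```

Updating the static value through any row changes what every row in the partition shows, because only one copy exists. There is deliberately no method for filtering by `age` or `emails`: a query that needs a different access path would get its own table with a different key, which is what "modelling based on queries" means here.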

    I hope I was able to make it a bit clearer for you. I recommend watching some videos (or reading the slides) from the DataStax Core Concepts course as a solid introduction to Cassandra.
