Cassandra Database Data Model: Critique my schema design


Problem description


I need to implement a database for a testing system. It is designed to store test data for future statistical analysis. It has to be Cassandra based.

I've designed a schema, but since this is my first attempt at NoSQL design, I would like to get some feedback.

I will first describe the data I wish to save, then describe two basic queries and finally present my suggested design.

I intend to use Cassandra 1.1, so I tried to use composite columns in my design; however, feel free to suggest super columns or whatever seems right.

Data:

The basic unit we are testing is an alien. Each alien has a unique ID. Each alien has a number of bodyparts. Also, each alien is part of a family of aliens. The families have unique names.

When we run a test, we run it on a few bodyparts of an alien group. For example, we take a few families and run a test on all of their eyes and mouths.

There are a few kinds of tests. We log each test with its own unique test ID.

When we run a test, we sample all relevant alien bodyparts every couple of minutes and gather some statistics.

Basic Queries:

  1. For each family, alien, or unique bodypart: which tests it participated in.
  2. For each test ID: which families, aliens, or unique bodyparts participated in it.
  3. In the future, statistical analysis of all data...

My attempt at design:

GeneralAliensData : { // Column Family  - general data on aliens. 
    [FamilyID][AlienID][Bodypart] : { //Composite Columns as Row keys
        Race: 'Blurgons' // column
        Shoesize: 5 // column
        Favorite probe: 'fun, toy' // column
    }  
}

TestsData : { // Column Family - we sample each test every couple of minutes...
    [TestID][AlienID][Bodypart][MinutesFromTestStart]: { //Composite Columns as Row keys
        Temperature: 30 // column
        Size: 5 // column
    }  
}


BodypartTestParticipation : { // Column Family - all the tests a unique bodypart passed...
    [FamilyID][AlienID][Bodypart]: { //Composite Columns as Row keys
        TestID: 105 // column
        TestID: 564 // column
        ...
    }  
}

This is it. Since I'm a real beginner in databases and Cassandra in particular, I'd appreciate any input.

Thank you for your time.

Solution

How large will your dataset eventually be, in rows? We sometimes use PlayOrm to store relational data in NoSQL, which works well, and tables can grow into the millions of rows. If you are going into the billions or trillions of rows, we use PlayOrm to partition the same data so that it scales.

So, do you need the ability to scale? You may want to check out the wide-row pattern (PlayOrm makes heavy use of it). Wide rows can help you index data for very fast lookups.
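
To make the wide-row idea concrete, here is a minimal in-memory sketch in Python. It is not Cassandra or PlayOrm code: the `WideRow` class, the `record` helper, and the row key `"family-A"` are hypothetical names introduced only for illustration. The sketch assumes one row per family whose composite column names are `(alien_id, bodypart, test_id)` tuples kept in sorted order, so that a prefix slice answers "which tests did this alien or bodypart take part in".

```python
from bisect import bisect_left, insort


class WideRow:
    """One row whose composite column names (tuples) are kept sorted."""

    def __init__(self):
        self._names = []   # sorted composite column names
        self._values = {}  # column name -> value (None for pure index entries)

    def put(self, name, value=None):
        if name not in self._values:
            insort(self._names, name)
        self._values[name] = value

    def slice(self, prefix):
        """Return (name, value) pairs whose column name starts with prefix."""
        start = bisect_left(self._names, prefix)
        result = []
        for name in self._names[start:]:
            if name[:len(prefix)] != prefix:
                break  # names are sorted, so the matching range has ended
            result.append((name, self._values[name]))
        return result


# Hypothetical participation index: row key (family ID) -> WideRow
participation = {}


def record(family_id, alien_id, bodypart, test_id):
    row = participation.setdefault(family_id, WideRow())
    row.put((alien_id, bodypart, test_id))


record("family-A", 7, "eye", 105)
record("family-A", 7, "eye", 564)
record("family-A", 7, "mouth", 105)
record("family-A", 9, "eye", 564)

row = participation["family-A"]
tests_for_alien_7 = sorted({name[2] for name, _ in row.slice((7,))})
tests_for_eye_of_7 = [name[2] for name, _ in row.slice((7, "eye"))]
```

Because the column names are sorted, every lookup by family, by alien, or by a single bodypart is one contiguous range read within a single row, which is the property that makes wide rows useful as an index.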

I really don't get this part of your design:

TestsData : { // Column Family - we sample each test every couple of minutes...
    [TestID][AlienID][Bodypart][MinutesFromTestStart]: { //Composite Columns as Row keys
        Temperature: 30 // column
        Size: 5 // column
    }  
}

Shouldn't this be more of a wide row, where testid is the row key and you have many composite column names for the other data? Wide rows should not be larger than 10 million columns, so make sure no test data row would go over that. A wide row might be:

testid -> alienId:fk23=null, alienId:fk25=null, etc., temperature=30, size=5

later, Dean
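
As a rough way to check the 10-million-column guideline mentioned in the answer, the following Python sketch estimates how many columns a single test's wide row would hold. The function names and all of the input numbers (alien counts, test duration, sampling interval) are assumptions chosen for illustration, not figures from the question. It assumes one column per (alien, bodypart, sample time, metric) combination.

```python
# Guideline quoted in the answer: keep a wide row under 10 million columns.
MAX_COLUMNS_PER_ROW = 10_000_000


def columns_per_test(aliens, bodyparts_per_alien, test_minutes,
                     sample_every_minutes, metrics):
    """Estimate the number of columns in one test's wide row."""
    # Count the sample taken at minute 0 as well as each later one.
    samples = test_minutes // sample_every_minutes + 1
    return aliens * bodyparts_per_alien * samples * metrics


def fits_in_one_row(column_count):
    return column_count <= MAX_COLUMNS_PER_ROW


# Hypothetical workloads: an 8-hour test sampled every 2 minutes,
# 2 bodyparts per alien, 2 metrics (temperature and size).
small = columns_per_test(1_000, 2, 480, 2, 2)
large = columns_per_test(50_000, 2, 480, 2, 2)

print(small, fits_in_one_row(small))
print(large, fits_in_one_row(large))
```

If an estimate like the second one exceeds the guideline, the row would need to be split, for example by partitioning the data as the answer describes.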
