Cassandra Schema Design


Problem description

I'm continuing to explore Cassandra, and I would like to create a Student <=> Course relation, similar to a many-to-many relation in an RDBMS.

In terms of queries, I will use the following:

  1. Retrieve all courses in which a student is enrolled.
  2. Retrieve all students enrolled in a specific course.

Let's say that I create two Column Families, one for Course and another for Student.

CREATE COLUMN FAMILY student with comparator = UTF8Type AND key_validation_class=UTF8Type and column_metadata=[
{column_name:firstname,validation_class:UTF8Type},
{column_name:lastname,validation_class:UTF8Type},
{column_name:gender,validation_class:UTF8Type}];


CREATE COLUMN FAMILY course with comparator = UTF8Type AND key_validation_class=UTF8Type and column_metadata=[
{column_name:name,validation_class:UTF8Type},
{column_name:description,validation_class:UTF8Type},
{column_name:lecturer,validation_class:UTF8Type},
{column_name:assistant,validation_class:UTF8Type}];

How should I proceed from here?

Should I create a third Column Family with a courseId:studentId composite key? If yes, can I use Hector to query by only one (left or right) component of the composite key?

Please help.

Update:

Following the suggestion I created the following Schema:

For Student:

CREATE COLUMN FAMILY student with comparator = UTF8Type and key_validation_class=UTF8Type and default_validation_class=UTF8Type;

and then we will add some data:

set student['student.1']['firstName']='Danny';
set student['student.1']['lastName']='Lesnik';
set student['student.1']['course.1']='';
set student['student.1']['course.2']='';

Create column Family for Course:

CREATE COLUMN FAMILY course with comparator = UTF8Type and key_validation_class=UTF8Type and default_validation_class=UTF8Type;

add some data:

set course['course.1']['name']='History';
set course['course.1']['description']='History Course';
set course['course.2']['name']='Algebra';
set course['course.2']['description']='Algebra Course';

and Finally Student In Course:

CREATE COLUMN FAMILY StudentInCourse with comparator = UTF8Type and key_validation_class=UTF8Type and default_validation_class=UTF8Type;

add data:

set StudentInCourse['studentIncourse.1']['student.1'] =''; 
set StudentInCourse['studentIncourse.2']['student.1'] =''; 

Solution

I defined a data model below, but it is easier to describe the object model first and then dive into the row model. From PlayOrm's perspective you would have

import java.util.ArrayList;
import java.util.List;
// PlayOrm annotation and cursor imports omitted

public class Student {
  @NoSqlId
  private String id;
  private String firstName;
  private String lastName;
  @ManyToMany
  private List<Course> courses = new ArrayList<Course>(); // constructing avoids NullPointerExceptions
}

public class Course {
  @NoSqlId
  private String id;
  private String name;
  private String description;
  @ManyToOne
  private Lecturer lecturer;
  @ManyToMany
  private CursorToMany students = new CursorToManyImpl(); // cursor instead of List, so students load in batches
}

I could have used a List in Course, but I was concerned about getting an OutOfMemory error if too many students take a course over years and years. Now, let's jump to what PlayOrm does; you can do something similar by hand if you like.

A single student row would look like so

rowKey(the id in above entity) = firstName='dean', lastName='hiller',
courses.rowkey56=null, courses.78=null, courses.98=null, courses.101=null

This is the wide row where we have many columns with the name 'fieldname' and 'rowkey to actual course'
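If you want to do something similar by hand, the student row above can be read with a column slice over the 'courses.' prefix. The following is only an in-memory sketch: a TreeMap stands in for the sorted columns of one row, and the class and method names (WideRowSketch, courseKeys) are made up for illustration, not part of PlayOrm or Hector.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

class WideRowSketch {
    static final String FK_PREFIX = "courses.";

    /** Builds the example student row: scalar fields plus one column per course FK. */
    static SortedMap<String, String> exampleStudentRow() {
        SortedMap<String, String> row = new TreeMap<>();
        row.put("firstName", "dean");
        row.put("lastName", "hiller");
        for (String courseKey : new String[] {"rowkey56", "78", "98", "101"}) {
            // The column name carries the foreign key; the value stays empty.
            row.put(FK_PREFIX + courseKey, "");
        }
        return row;
    }

    /** Emulates a column slice: every column whose name starts with "courses.". */
    static List<String> courseKeys(SortedMap<String, String> row) {
        List<String> keys = new ArrayList<>();
        // '/' is the character right after '.', so ["courses.", "courses/")
        // covers exactly the names that start with the prefix.
        for (String column : row.subMap(FK_PREFIX, "courses/").keySet()) {
            keys.add(column.substring(FK_PREFIX.length()));
        }
        return keys;
    }

    public static void main(String[] args) {
        // Columns come back in comparator (here: lexicographic) order.
        System.out.println(courseKeys(exampleStudentRow()));
    }
}
```

Because the comparator is UTF8Type in the question, the columns sort as strings, which is why a prefix range is enough to separate the FK columns from firstName and lastName.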

The Course row is a bit more interesting. Because the user thinks loading all the Students for a single course could cause an out-of-memory error, he uses a cursor, which only loads 500 at a time as you loop over it.

In this case PlayOrm keeps two rows backing the Course. Sooo, let's take our user row above; he was in course rowkey56, so let's describe that course

rowkey56 = name='coursename', description='somedesc', lecturer='rowkey89ToLecturer'

Then, there is another row in one of the index tables for the students (it is a very wide row, so it supports up to millions of students)

indexrowForrowkey56InCourse = student34.56, student39.56, student23.56....
into the millions of students
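The cursor behaviour described above (loading 500 at a time while looping) can be sketched as repeated column slices over the index row, each one starting just after the last column already seen. This is a hedged, in-memory illustration only: a TreeSet stands in for the index row, and BatchCursorSketch, exampleIndexRow, and slicesRead are hypothetical names, not PlayOrm's CursorToMany implementation.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NavigableSet;
import java.util.NoSuchElementException;
import java.util.TreeSet;

class BatchCursorSketch implements Iterator<String> {
    private final NavigableSet<String> indexRow; // stands in for the wide index row
    private final int batchSize;
    private List<String> batch = new ArrayList<>();
    private int position = 0;
    private String lastSeen = null;   // last column name returned by the previous slice
    private boolean exhausted = false;
    int slicesRead = 0;               // how many slice "queries" were issued

    BatchCursorSketch(NavigableSet<String> indexRow, int batchSize) {
        this.indexRow = indexRow;
        this.batchSize = batchSize;
    }

    /** Hypothetical helper that fills an index row with n student keys. */
    static NavigableSet<String> exampleIndexRow(int n) {
        NavigableSet<String> row = new TreeSet<>();
        for (int i = 0; i < n; i++) {
            row.add(String.format("student%07d", i));
        }
        return row;
    }

    /** Emulates one column slice: up to batchSize names strictly after lastSeen. */
    private void loadNextBatch() {
        batch = new ArrayList<>();
        position = 0;
        Iterable<String> tail = (lastSeen == null) ? indexRow : indexRow.tailSet(lastSeen, false);
        for (String column : tail) {
            if (batch.size() == batchSize) {
                break;
            }
            batch.add(column);
        }
        if (!batch.isEmpty()) {
            lastSeen = batch.get(batch.size() - 1);
        }
        if (batch.size() < batchSize) {
            exhausted = true; // a short slice means the row has no more columns
        }
        slicesRead++;
    }

    @Override
    public boolean hasNext() {
        if (position == batch.size() && !exhausted) {
            loadNextBatch();
        }
        return position < batch.size();
    }

    @Override
    public String next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return batch.get(position++);
    }

    public static void main(String[] args) {
        BatchCursorSketch cursor = new BatchCursorSketch(exampleIndexRow(1200), 500);
        int count = 0;
        while (cursor.hasNext()) {
            cursor.next();
            count++;
        }
        System.out.println(count + " students read in " + cursor.slicesRead + " slices");
    }
}
```

Only one batch is held in memory at any time, which is the point of using a cursor rather than a List for the students of a course.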

If you want a course to have more than a few million students, though, you need to think about partitioning whether you use PlayOrm or not. PlayOrm does the partitioning for you if you need it.
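As a rough illustration of what partitioning an index row by hand could look like, the sketch below spreads one course's students over a fixed number of index rows by hashing the student key. This is a generic bucketing technique chosen for the example, not PlayOrm's partitioning scheme; the row-key format, the bucket count, and the names PartitionedIndexSketch, indexRowKey, and allIndexRowKeys are all assumptions.

```java
import java.util.ArrayList;
import java.util.List;

class PartitionedIndexSketch {
    /** Hypothetical row-key scheme: one index row per (course, bucket) pair. */
    static String indexRowKey(String courseKey, String studentKey, int buckets) {
        // floorMod keeps the bucket non-negative even when hashCode() is negative.
        int bucket = Math.floorMod(studentKey.hashCode(), buckets);
        return "indexrowFor" + courseKey + "InCourse." + bucket;
    }

    /** All index rows that must be read to list every student of one course. */
    static List<String> allIndexRowKeys(String courseKey, int buckets) {
        List<String> keys = new ArrayList<>();
        for (int bucket = 0; bucket < buckets; bucket++) {
            keys.add("indexrowFor" + courseKey + "InCourse." + bucket);
        }
        return keys;
    }

    public static void main(String[] args) {
        // A write goes to exactly one bucket row; a full read visits all of them.
        System.out.println(indexRowKey("rowkey56", "student34", 16));
        System.out.println(allIndexRowKeys("rowkey56", 16).size());
    }
}
```

The trade-off in this sketch is that listing all students of a course now takes one read per bucket instead of one read in total.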

NOTE: If you don't know Hibernate or JPA: when you load the above Student, it loads a proxy list, so if you start looping over the courses, it goes back to the NoSQL store and loads the Courses, so you don't have to ;).

In the case of Course, it loads a proxy Lecturer that is not filled in until you access a property field like lecturer.getName(). If you call lecturer.getId(), it doesn't need to load the lecturer since it already has that from the Course row.
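The proxy behaviour described above can be sketched in a few lines. This is an illustrative stand-in under stated assumptions, not PlayOrm's actual proxy code: a HashMap plays the role of the lecturer store, and LazyProxySketch, LecturerProxy, LECTURER_STORE, and storeReads are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

class LazyProxySketch {
    /** Stand-in for the lecturer column family: row key -> lecturer name. */
    static final Map<String, String> LECTURER_STORE = new HashMap<>();
    static int storeReads = 0; // counts how often the store is actually hit

    static class LecturerProxy {
        private final String id;       // already known from the Course row's FK column
        private String name;           // filled in on first property access
        private boolean loaded = false;

        LecturerProxy(String id) {
            this.id = id;
        }

        /** Needs no store access: the id came with the Course row. */
        String getId() {
            return id;
        }

        /** First call loads the lecturer row; later calls reuse the loaded value. */
        String getName() {
            if (!loaded) {
                name = LECTURER_STORE.get(id);
                storeReads++;
                loaded = true;
            }
            return name;
        }
    }

    public static void main(String[] args) {
        LECTURER_STORE.put("rowkey89ToLecturer", "some lecturer");
        LecturerProxy lecturer = new LecturerProxy("rowkey89ToLecturer");
        System.out.println(lecturer.getId() + " reads so far: " + storeReads);
        System.out.println(lecturer.getName() + " reads so far: " + storeReads);
    }
}
```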

EDIT (more detail): PlayOrm has 3 index tables: Decimal (stores double, float, etc. and BigDecimal), Integer (long, short, etc. and BigInteger and boolean), and String. When you use CursorToMany, it uses one of those tables depending on the type of the FK. It also uses those tables for its Scalable-SQL language. The reason it uses a separate row for CursorToMany is so that clients don't get OutOfMemory when reading a row in, since the toMany could have one million FKs in it in some cases. CursorToMany then reads in batches from that index row.

later, Dean
