Cassandra data model for simple messaging app


Problem description


I am trying to learn Cassandra, and I always find the best way to learn is to start by creating a very simple and small application. Hence I am creating a basic messaging application which will use Cassandra as the back-end. I would like to do the following:

  • A user will create an account with a username, email, and password. The email and the password can be changed at any time.
  • The user can add another user as their contact. The user would add a contact by searching for their username or email. Contacts don't need to be mutual: if I add a user, they are my contact, and I don't need to wait for them to accept/approve anything like in Facebook.
  • A message is sent from one user to another user. The sender needs to be able to see the messages they sent (ordered by time) and the messages which were sent to them (ordered by time). When a user opens the app I need to check the database for any new messages for that user. I also need to mark whether the message has been read.

As I come from the world of relational databases my relational database would look something like this:

UsersTable
    username (text)
    email (text)
    password (text)
    time_created (timestamp)
    last_loggedIn (timestamp)
------------------------------------------------ 
ContactsTable
    user_i_added (text)
    user_added_me (text)
------------------------------------------------     
MessagesTable
    from_user (text)
    to_user (text)
    msg_body (text)
    metadata (text)
    has_been_read (boolean)
    message_sent_time (timestamp)

Reading through a couple of Cassandra textbooks I have a thought of how to model the database. My main concern is to model the database in a very efficient manner. Hence I am trying to avoid things such as secondary indexes etc. This is my model so far:

CREATE TABLE users_by_username (
    username text PRIMARY KEY,
    email text,
    password text,
    timeCreated timestamp,
    last_loggedin timestamp
);

CREATE TABLE users_by_email (
    email text PRIMARY KEY,
    username text,
    password text,
    timeCreated timestamp,
    last_loggedin timestamp
);

To spread data evenly and to read a minimal number of partitions (hopefully just one), I can look up a user by their username or email quickly. The downside of this is obviously that I am doubling my data, but the cost of storage is quite cheap, so I find it to be a good trade-off instead of using secondary indexes. Last logged in will also need to be written twice, but Cassandra is efficient at writes, so I believe this is a good trade-off as well.
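The double write described above can be sketched in a few lines. This is a minimal in-memory model in Python, not Cassandra driver code: the two dictionaries stand in for users_by_username and users_by_email, and the names UserStore, record_login and change_email are assumptions made for illustration. It also makes one consequence visible: because email is the key of the second table, changing an email means removing the old row and writing a new one, since a primary key column cannot be updated in place.

```python
from dataclasses import dataclass, replace
from datetime import datetime
from typing import Dict, Optional


@dataclass(frozen=True)
class User:
    username: str
    email: str
    password: str
    last_logged_in: Optional[datetime] = None


class UserStore:
    """In-memory stand-in for users_by_username and users_by_email (hypothetical)."""

    def __init__(self) -> None:
        self.by_username: Dict[str, User] = {}
        self.by_email: Dict[str, User] = {}

    def _write_both(self, user: User) -> None:
        # Every change to a user is written once per lookup table.
        self.by_username[user.username] = user
        self.by_email[user.email] = user

    def create(self, username: str, email: str, password: str) -> User:
        user = User(username, email, password)
        self._write_both(user)
        return user

    def record_login(self, username: str, when: datetime) -> None:
        user = replace(self.by_username[username], last_logged_in=when)
        self._write_both(user)

    def change_email(self, username: str, new_email: str) -> None:
        old = self.by_username[username]
        # The email is the key of the second table, so the old row is
        # deleted and a new row is written under the new key.
        del self.by_email[old.email]
        self._write_both(replace(old, email=new_email))
```

For example, after `store.create("someuser", "a@abc.com", "pw")` and `store.change_email("someuser", "b@abc.com")`, a lookup by the old address finds nothing and a lookup by the new address returns the same user.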

For the contacts I can't think of any other way to model this, so I modelled it very similarly to how I would in a relational database. This is quite a denormalized design, I believe, which should be good for performance according to the books I have read?

CREATE TABLE "user_follows" (
  follower_username text,
  followed_username text,
  timeCreated timestamp, 
  PRIMARY KEY ("follower_username", "followed_username")
);

CREATE TABLE "user_followedBy" (

  followed_username text,
  follower_username text,
  timeCreated timestamp,
  PRIMARY KEY ("followed_username", "follower_username")
);
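As a rough illustration of how these two tables would be used together, here is a small in-memory Python sketch (not driver code). The class ContactStore, its method names, and the email-to-username dictionary passed in are all hypothetical; the point is only that adding a contact writes to both tables, and that adding by email is a lookup followed by the same write.

```python
from collections import defaultdict
from typing import Dict, List, Set


class ContactStore:
    """In-memory stand-in for user_follows and user_followedBy (hypothetical)."""

    def __init__(self, users_by_email: Dict[str, str]) -> None:
        # email -> username, standing in for a users_by_email lookup
        self.users_by_email = users_by_email
        self.follows: Dict[str, Set[str]] = defaultdict(set)
        self.followed_by: Dict[str, Set[str]] = defaultdict(set)

    def add_contact(self, me: str, username: str) -> None:
        # One write per table; no approval step is involved.
        self.follows[me].add(username)
        self.followed_by[username].add(me)

    def add_contact_by_email(self, me: str, email: str) -> bool:
        # First resolve the email to a username, then do the normal write.
        username = self.users_by_email.get(email)
        if username is None:
            return False
        self.add_contact(me, username)
        return True

    def contacts_of(self, me: str) -> List[str]:
        # Rows in one partition come back ordered by the clustering column.
        return sorted(self.follows.get(me, set()))
```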

I am stuck on how to create this next part. For messaging I was thinking of this table, as it creates wide rows, which enables ordering of the messages. I need messaging to answer two questions: it needs to show the user all the messages they have, and it needs to show the user the messages that are new and unread. This is a basic model, but I am unsure how to make it more efficient.

CREATE TABLE messages (
    message_id uuid,
    from_user text,
    to_user text,
    body text,
    hasRead boolean,
    timeCreated timeuuid,
    PRIMARY KEY ((to_user), timeCreated )
) WITH CLUSTERING ORDER BY (timeCreated ASC);
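To make the wide-row idea concrete, here is a minimal in-memory Python sketch of one partition per recipient with rows kept sorted by creation time. It is a model only, not driver code: Inbox, Message and the integer time_created (standing in for the timeuuid clustering column) are assumptions for illustration. Note that mark_read takes the recipient and the creation time, because in the table as defined those two columns form the primary key; message_id alone would not identify a row.

```python
import bisect
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Message:
    time_created: int  # stand-in for the timeuuid clustering column
    from_user: str
    to_user: str
    body: str
    has_read: bool = False


class Inbox:
    """One list per to_user partition, kept in ascending time order."""

    def __init__(self) -> None:
        self.partitions: Dict[str, List[Message]] = {}

    def send(self, msg: Message) -> None:
        rows = self.partitions.setdefault(msg.to_user, [])
        keys = [m.time_created for m in rows]
        rows.insert(bisect.bisect_right(keys, msg.time_created), msg)

    def latest(self, to_user: str, limit: int = 10) -> List[Message]:
        # Newest first, reading from the end of a single partition.
        if limit <= 0:
            return []
        rows = self.partitions.get(to_user, [])
        return list(reversed(rows[-limit:]))

    def unread(self, to_user: str) -> List[Message]:
        # Filtering on has_read happens after the partition is read.
        rows = self.partitions.get(to_user, [])
        return [m for m in reversed(rows) if not m.has_read]

    def mark_read(self, to_user: str, time_created: int) -> None:
        rows = self.partitions[to_user]
        keys = [m.time_created for m in rows]
        i = bisect.bisect_left(keys, time_created)
        if i == len(rows) or rows[i].time_created != time_created:
            raise KeyError((to_user, time_created))
        rows[i].has_read = True
```

In this sketch the sender's view of sent messages is not covered; that would need a second structure keyed by from_user, in the same way the contact tables are kept in two directions.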

I was also looking at using things such as STATIC columns to 'glue' together the user and messages, as well as SETS to store contact relationships, but from my narrow understanding so far, the way I presented is more efficient. I would like to know whether there are any ideas to improve this model's efficiency, whether there are better practices to do the things I am trying to do, or whether there are any hidden problems I could face with this design.

In conclusion, I am trying to model around the queries. If I were using a relational database, these would essentially be the queries I am looking to answer:

To Login:
SELECT * FROM USERS WHERE (USERNAME = [MY_USERNAME] OR EMAIL = [MY_EMAIL]) AND PASSWORD = [MY_PASSWORD];
------------------------------------------------------------------------------------------------------------------------
Update user info:
UPDATE USERS SET password = [NEW_PASSWORD] WHERE username = [MY_USERNAME];
UPDATE USERS SET email = [NEW_EMAIL] WHERE username = [MY_USERNAME];
------------------------------------------------------------------------------------------------------------------------ 
To Add contact (If by username):
INSERT INTO followings(following,follower)  VALUES([USERNAME_I_WANT_TO_FOLLOW],[MY_USERNAME]);
------------------------------------------------------------------------------------------------------------------------
To Add contact (If by email):
SELECT username FROM users where email = [CONTACTS_EMAIL];
    Then application layer sends over another query with the username:
INSERT INTO followings(following,follower)  VALUES([USERNAME_I_WANT_TO_FOLLOW],[MY_USERNAME]);
------------------------------------------------------------------------------------------------------------------------
To View contacts:
SELECT following FROM followings WHERE follower = [MY_USERNAME];
------------------------------------------------------------------------------------------------------------------------
To Send Message:
INSERT INTO MESSAGES (MSG_ID, FROM, TO, MSG, IS_MSG_NEW) VALUES (uuid, [FROM_USERNAME], [TO_USERNAME], 'MY MSG', true);
------------------------------------------------------------------------------------------------------------------------
To View All Messages (some pagination technique that shows me the 10 most recent messages and marks which ones are unread):
SELECT * FROM MESSAGES WHERE TO = [MY_USERNAME] LIMIT 10;
------------------------------------------------------------------------------------------------------------------------
Once Message is read:
UPDATE MESSAGES SET IS_MSG_NEW = false WHERE TO = [MY_USERNAME] AND MSG_ID = [MSG_ID];

Cheers

Solution

Yes it's always a struggle to adapt to the limitations of Cassandra when coming from a relational database background. Since we don't yet have the luxury of doing joins in Cassandra, you often want to cram as much as you can into a single table. In your case that would be the users_by_username table.

There are a few features of Cassandra that should allow you to do that.

Since you are new to Cassandra, you could probably use Cassandra 3.0, which is currently in beta release. In 3.0 there is a nice feature called materialized views. This would allow you to have users_by_username as a base table, and create the users_by_email as a materialized view. Then Cassandra will update the view automatically whenever you update the base table.

Another feature that will help you is user defined types (in C* 2.1 and later). Instead of creating separate tables for followers and messages, you can create the structure of those as UDTs, and then in the user table keep lists of those types.

So a simplified view of your schema could be like this (I'm not showing some of the fields like timestamps to keep this simple, but those are easy to add).

First create your UDTs:

CREATE TYPE user_follows (
    followed_username text,
    street text
);

CREATE TYPE msg (
    from_user text,
    body text
);

Next we create your base table:

CREATE TABLE users_by_username (
    username text PRIMARY KEY,
    email text,
    password text,
    follows list<frozen<user_follows>>,
    followed_by list<frozen<user_follows>>,
    new_messages list<frozen<msg>>,
    old_messages list<frozen<msg>>
);

Now we create a materialized view partitioned by email:

CREATE MATERIALIZED VIEW users_by_email AS
    SELECT username, password, follows, new_messages, old_messages FROM users_by_username
    -- each primary key column of the view must be restricted with IS NOT NULL
    WHERE email IS NOT NULL AND username IS NOT NULL
    PRIMARY KEY (email, username);

Now let's take it for a spin and see what it can do. Let's create a user:

INSERT INTO users_by_username (username , email , password )
    VALUES ( 'someuser', 'someemail@abc.com', 'somepassword');

Let the user follow another user:

UPDATE users_by_username SET follows = [{followed_username: 'followme2', street: 'mystreet2'}] + follows
    WHERE username = 'someuser';

Let's send the user a message:

UPDATE users_by_username SET new_messages = [{from_user: 'auser', body: 'hi someuser!'}] + new_messages
    WHERE username = 'someuser';

Now let's see what's in the table:

SELECT * FROM users_by_username ;

 username | email             | followed_by | follows                                                 | new_messages                                 | old_messages | password
----------+-------------------+-------------+---------------------------------------------------------+----------------------------------------------+--------------+--------------
 someuser | someemail@abc.com |        null | [{followed_username: 'followme2', street: 'mystreet2'}] | [{from_user: 'auser', body: 'hi someuser!'}] |         null | somepassword

Now let's check that our materialized view is working:

SELECT new_messages, old_messages FROM users_by_email WHERE email='someemail@abc.com'; 

 new_messages                                 | old_messages
----------------------------------------------+--------------
 [{from_user: 'auser', body: 'hi someuser!'}] |         null

Now let's read the message and move it to the old messages:

BEGIN BATCH
    DELETE new_messages[0] FROM users_by_username WHERE username = 'someuser';
    UPDATE users_by_username SET old_messages = [{from_user: 'auser', body: 'hi someuser!'}] + old_messages WHERE username = 'someuser';
APPLY BATCH;

 SELECT new_messages, old_messages FROM users_by_email WHERE email='someemail@abc.com';

 new_messages | old_messages
--------------+----------------------------------------------
         null | [{from_user: 'auser', body: 'hi someuser!'}]
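The list handling in the statements above (prepend on delivery, then remove the first element and prepend it to the other list on read) can be sketched as a small in-memory Python model. This is not driver code: UserRow, deliver and read_newest are hypothetical names, and a (from_user, body) tuple stands in for the msg UDT.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Msg = Tuple[str, str]  # (from_user, body), mirroring the msg UDT


@dataclass
class UserRow:
    """One row of users_by_username, reduced to its two message lists."""

    username: str
    new_messages: List[Msg] = field(default_factory=list)
    old_messages: List[Msg] = field(default_factory=list)

    def deliver(self, from_user: str, body: str) -> None:
        # Same effect as: SET new_messages = [ ... ] + new_messages
        self.new_messages.insert(0, (from_user, body))

    def read_newest(self) -> Optional[Msg]:
        # Same effect as the batch: drop element 0 of new_messages
        # and prepend that message to old_messages.
        if not self.new_messages:
            return None
        msg = self.new_messages.pop(0)
        self.old_messages.insert(0, msg)
        return msg
```

With one message delivered from 'auser', calling read_newest() leaves new_messages empty and old_messages holding that single message, which matches the query output shown above.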

So hopefully that gives you some ideas you can use. Have a look at the documentation on collections (i.e. lists, maps, and sets), since those can really help you to keep more information in one table and are sort of like tables within a table.
