Cassandra data model for simple messaging app

Question

I am trying to learn Cassandra and always find the best way is to start with creating a very simple and small application. Hence I am creating a basic messaging application which will use Cassandra as the back-end. I would like to do the following:

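  • A user creates an account with a username, email, and password. The email and password can be changed at any time.
  • A user can add another user as a contact by searching for their username or email. Contacts do not need to be mutual: if I add a user, they are my contact, and I do not need to wait for them to accept or approve anything, as on Facebook.
  • A message is sent from one user to another. The sender needs to be able to see the messages they sent (ordered by time) and the messages sent to them (ordered by time). When a user opens the app, I need to check the database for any new messages for that user. I also need to mark whether a message has been read.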

As I come from the world of relational databases my relational database would look something like this:

UsersTable
    username (text)
    email (text)
    password (text)
    time_created (timestamp)
    last_loggedIn (timestamp)
------------------------------------------------ 
ContactsTable
    user_i_added (text)
    user_added_me (text)
------------------------------------------------     
MessagesTable
    from_user (text)
    to_user (text)
    msg_body (text)
    metadata (text)
    has_been_read (boolean)
    message_sent_time (timestamp)

After reading through a couple of Cassandra textbooks, I have an idea of how to model the database. My main concern is to model it efficiently, so I am trying to avoid things such as secondary indexes. This is my model so far:

CREATE TABLE users_by_username (
    username text PRIMARY KEY,
    email text,
    password text,
    timeCreated timestamp,
    last_loggedin timestamp
);

CREATE TABLE users_by_email (
    email text PRIMARY KEY,
    username text,
    password text,
    timeCreated timestamp,
    last_loggedin timestamp
);

To spread data evenly and to read a minimal number of partitions (hopefully just one), I can look up a user by username or email quickly. The downside is obviously that I am doubling my data, but storage is quite cheap, so I find it a good trade-off compared with using secondary indexes. Last logged in will also need to be written twice, but Cassandra is efficient at writes, so I believe this is a good trade-off as well.
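
As a rough sketch of the double write described above, a logged batch can group the two updates so that both are eventually applied. The user values and the timestamp literal are placeholders, and whether a batch is worth its extra coordination cost here is an assumption to test rather than a given.

```sql
-- Sketch only: update last_loggedin in both lookup tables together.
-- 'someuser', the email address, and the timestamp are sample values.
BEGIN BATCH
    UPDATE users_by_username SET last_loggedin = '2015-09-01 12:00:00+0000'
        WHERE username = 'someuser';
    UPDATE users_by_email SET last_loggedin = '2015-09-01 12:00:00+0000'
        WHERE email = 'someemail@abc.com';
APPLY BATCH;
```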

For the contacts I can't think of any other way to model this, so I modelled it much as I would in a relational database. I believe this is quite a denormalized design, which should be good for performance according to the books I have read?

CREATE TABLE "user_follows" (
  follower_username text,
  followed_username text,
  timeCreated timestamp, 
  PRIMARY KEY ("follower_username", "followed_username")
);

CREATE TABLE "user_followedBy" (
  followed_username text,
  follower_username text,
  timeCreated timestamp,
  PRIMARY KEY ("followed_username", "follower_username")
);

I am stuck on how to create this next part. For messaging I was thinking of this table, as it creates wide rows, which enables ordering of the messages. I need messaging to answer two questions: it needs to show the user all the messages they have, and it needs to show the user the messages that are new and unread. This is a basic model, but I am unsure how to make it more efficient?

CREATE TABLE messages (
    message_id uuid,
    from_user text,
    to_user text,
    body text,
    hasRead boolean,
    timeCreated timeuuid,
    PRIMARY KEY ((to_user), timeCreated )
) WITH CLUSTERING ORDER BY (timeCreated ASC);
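
For illustration, these are the kinds of statements the table above would support. The user names and the timeuuid literal are made-up sample values. Because hasRead is not part of the primary key, this sketch reads the newest rows and leaves unread filtering to the application.

```sql
-- Send a message: now() generates the timeuuid clustering value, uuid() a random id.
INSERT INTO messages (to_user, timeCreated, message_id, from_user, body, hasRead)
    VALUES ('someuser', now(), uuid(), 'auser', 'hi someuser!', false);

-- Ten most recent messages for one recipient (single partition, reversed clustering order).
SELECT from_user, body, hasRead, timeCreated
    FROM messages
    WHERE to_user = 'someuser'
    ORDER BY timeCreated DESC
    LIMIT 10;

-- Mark one message as read; the full primary key is required.
-- The timeuuid below is a sample value, not real data.
UPDATE messages SET hasRead = true
    WHERE to_user = 'someuser'
    AND timeCreated = 50554d6e-29bb-11e5-b345-feff819cdc9f;
```

Listing the messages a user has sent would likely need a second table partitioned by from_user, since this one is keyed by recipient only.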

I was also looking at using things such as STATIC columns to 'glue' together the user and messages, as well as SETS to store contact relationships, but from my narrow understanding so far, the way I presented is more efficient. I am asking whether there are any ideas to improve this model's efficiency, whether there are better practices for the things I am trying to do, or whether there are any hidden problems I may face with this design.

In conclusion, I am trying to model around the queries. If I were using a relational database, these would essentially be the queries I am looking to answer:

To Login:
SELECT * FROM USERS WHERE (USERNAME = [MY_USERNAME] OR EMAIL = [MY_EMAIL]) AND PASSWORD = [MY_PASSWORD];
------------------------------------------------------------------------------------------------------------------------
Update user info:
UPDATE USERS SET password = [NEW_PASSWORD] WHERE username = [MY_USERNAME];
UPDATE USERS SET email = [NEW_EMAIL] WHERE username = [MY_USERNAME];
------------------------------------------------------------------------------------------------------------------------ 
To Add contact (If by username):
INSERT INTO followings(following,follower)  VALUES([USERNAME_I_WANT_TO_FOLLOW],[MY_USERNAME]);
------------------------------------------------------------------------------------------------------------------------
To Add contact (If by email):
SELECT username FROM users where email = [CONTACTS_EMAIL];
    Then application layer sends over another query with the username:
INSERT INTO followings(following,follower)  VALUES([USERNAME_I_WANT_TO_FOLLOW],[MY_USERNAME]);
------------------------------------------------------------------------------------------------------------------------
To View contacts:
SELECT following FROM followings WHERE follower = [MY_USERNAME];
------------------------------------------------------------------------------------------------------------------------
To Send Message:
INSERT INTO MESSAGES (MSG_ID, FROM, TO, MSG, IS_MSG_NEW) VALUES (uuid, [FROM_USERNAME], [TO_USERNAME], 'MY MSG', true);
------------------------------------------------------------------------------------------------------------------------
To View All Messages (Some pagination type of technique where shows me the 10 recent messages, yet shows which ones are unread):
SELECT * FROM MESSAGES WHERE TO = [MY_USERNAME] LIMIT 10;
------------------------------------------------------------------------------------------------------------------------
Once Message is read:
UPDATE MESSAGES SET IS_MSG_NEW = false WHERE TO = [MY_USERNAME] AND MSG_ID = [MSG_ID];

Cheers

Recommended answer

Yes it's always a struggle to adapt to the limitations of Cassandra when coming from a relational database background. Since we don't yet have the luxury of doing joins in Cassandra, you often want to cram as much as you can into a single table. In your case that would be the users_by_username table.

There are a few features of Cassandra that should allow you to do that.

Since you are new to Cassandra, you could probably use Cassandra 3.0, which is currently in beta release. In 3.0 there is a nice feature called materialized views. This would allow you to have users_by_username as a base table, and create the users_by_email as a materialized view. Then Cassandra will update the view automatically whenever you update the base table.

Another feature that will help you is user defined types (in C* 2.1 and later). Instead of creating separate tables for followers and messages, you can create the structure of those as UDT's, and then in the user table keep lists of those types.

So a simplified view of your schema could be like this (I'm not showing some of the fields like timestamps to keep this simple, but those are easy to add).

First create your UDT's:

CREATE TYPE user_follows (
    followed_username text,
    street text
);

CREATE TYPE msg (
    from_user text,
    body text
);

Next we create your base table:

CREATE TABLE users_by_username (
    username text PRIMARY KEY,
    email text,
    password text,
    follows list<frozen<user_follows>>,
    followed_by list<frozen<user_follows>>,
    new_messages list<frozen<msg>>,
    old_messages list<frozen<msg>>
);

Now we create a materialized view partitioned by email:

CREATE MATERIALIZED VIEW users_by_email AS
    SELECT username, password, follows, new_messages, old_messages FROM users_by_username
    WHERE email IS NOT NULL AND username IS NOT NULL
    PRIMARY KEY (email, username);

Now let's take it for a spin and see what it can do. Let's create a user:

INSERT INTO users_by_username (username , email , password )
    VALUES ( 'someuser', 'someemail@abc.com', 'somepassword');

Let the user follow another user:

UPDATE users_by_username SET follows = [{followed_username: 'followme2', street: 'mystreet2'}] + follows
    WHERE username = 'someuser';

Let's send the user a message:

UPDATE users_by_username SET new_messages = [{from_user: 'auser', body: 'hi someuser!'}] + new_messages
    WHERE username = 'someuser';

Now let's see what's in the table:

SELECT * FROM users_by_username ;

 username | email             | followed_by | follows                                                 | new_messages                                 | old_messages | password
----------+-------------------+-------------+---------------------------------------------------------+----------------------------------------------+--------------+--------------
 someuser | someemail@abc.com |        null | [{followed_username: 'followme2', street: 'mystreet2'}] | [{from_user: 'auser', body: 'hi someuser!'}] |         null | somepassword

Now let's check that our materialized view is working:

SELECT new_messages, old_messages FROM users_by_email WHERE email='someemail@abc.com'; 

 new_messages                                 | old_messages
----------------------------------------------+--------------
 [{from_user: 'auser', body: 'hi someuser!'}] |         null

Now let's read the message and move it to the old messages:

BEGIN BATCH
    DELETE new_messages[0] FROM users_by_username WHERE username = 'someuser';
    UPDATE users_by_username SET old_messages = [{from_user: 'auser', body: 'hi someuser!'}] + old_messages WHERE username = 'someuser';
APPLY BATCH;

SELECT new_messages, old_messages FROM users_by_email WHERE email='someemail@abc.com';

 new_messages | old_messages
--------------+----------------------------------------------
         null | [{from_user: 'auser', body: 'hi someuser!'}]

So hopefully that gives you some ideas you can use. Have a look at the documentation on collections (i.e. lists, maps, and sets), since those can really help you to keep more information in one table and are sort of like tables within a table.
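
As one small example of a collection, contacts could be kept as a set of usernames on the user row. The contacts column is a hypothetical name that is not part of the schema above, so treat this as a sketch rather than part of the suggested design.

```sql
-- Hypothetical column: a set of contact usernames stored on the user row.
ALTER TABLE users_by_username ADD contacts set<text>;

-- Add a contact; adding an element that is already present leaves the set unchanged.
UPDATE users_by_username SET contacts = contacts + {'followme2'}
    WHERE username = 'someuser';

-- Remove a contact.
UPDATE users_by_username SET contacts = contacts - {'followme2'}
    WHERE username = 'someuser';
```

Collections are read together with the rest of the row, so this approach fits best when the number of contacts per user stays small.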
