Cassandra denormalization data model

Question

I read that in NoSQL databases (Cassandra, for instance) data is often stored denormalized. For example, see this SO answer or this website.

An example: if you have a column family of employees and departments and you want to execute the query select * from Emps where Birthdate = '25/04/1975', then you have to make a column family birthday_Emps and store the ID of each employee as a column. You can then query the birthday_Emps family for the key '25/04/1975' and instantly get the IDs of all employees born on that date. You can even denormalize the employee details into birthday_Emps as well, so that you also get the employee names instantly.
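
To make the birthday_Emps idea concrete, here is a minimal in-memory sketch in Python. It does not talk to Cassandra; plain dicts stand in for two CQL tables (what the question calls column families), and the names emps, birthday_emps, add_employee and emps_born_on are hypothetical. The point is only that the query table is keyed by the value you search on and holds a full copy of each row, so answering the query is a single key lookup.

```python
from collections import defaultdict

# Hypothetical in-memory stand-ins for two CQL tables (not a driver API):
#   emps          PRIMARY KEY (emp_id)
#   birthday_emps PRIMARY KEY ((birthdate), emp_id), partitioned by birthdate
emps = {}
birthday_emps = defaultdict(dict)  # birthdate -> {emp_id: copied row}


def add_employee(emp_id, name, birthdate, dept):
    row = {"emp_id": emp_id, "name": name, "birthdate": birthdate, "dept": dept}
    emps[emp_id] = row
    # Denormalize: store a full copy in the query table, keyed by birthdate,
    # so the birthday query never has to go back to emps.
    birthday_emps[birthdate][emp_id] = dict(row)


def emps_born_on(birthdate):
    # One partition lookup answers the query; no scan over every employee.
    return list(birthday_emps.get(birthdate, {}).values())


add_employee(1, "Alice", "25/04/1975", "Sales")
add_employee(2, "Bob", "01/01/1980", "IT")
add_employee(3, "Carol", "25/04/1975", "IT")
print([e["name"] for e in emps_born_on("25/04/1975")])  # prints: ['Alice', 'Carol']
```

In a real schema the birthdate would be the partition key of the query table, which is what makes the lookup cheap; the sketch only mimics that with a dict key.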

Is this really the way to go?

  1. Whenever an employee is inserted or deleted, you have to add or remove that employee in birthday_Emps too. In another example, someone even said that sometimes a single delete in one table requires hundreds of deletes in other tables. Is this really common?
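
The bookkeeping the question worries about can be sketched as follows. This is an in-memory illustration, not Cassandra code: the base table and two query tables are hypothetical dicts, and insert_employee / delete_employee are illustrative helpers. It shows that every write fans out to each table holding a copy, and that a delete needs the old row's keys to find those copies. In real Cassandra, a logged BATCH is the usual way to ensure such multi-table writes all eventually apply.

```python
from collections import defaultdict

# Hypothetical in-memory base table plus two query tables, each holding
# its own copy of the employee row (as separate Cassandra tables would).
emps = {}                              # emp_id -> row
emps_by_birthdate = defaultdict(dict)  # birthdate -> {emp_id: row copy}
emps_by_dept = defaultdict(dict)       # dept -> {emp_id: row copy}


def insert_employee(emp_id, name, birthdate, dept):
    row = {"emp_id": emp_id, "name": name, "birthdate": birthdate, "dept": dept}
    # In Cassandra these three writes would usually be sent as one logged BATCH.
    emps[emp_id] = row
    emps_by_birthdate[birthdate][emp_id] = dict(row)
    emps_by_dept[dept][emp_id] = dict(row)


def delete_employee(emp_id):
    row = emps.pop(emp_id, None)
    if row is None:
        return
    # Each copy lives under a different partition key, so the base row is
    # read first to learn which birthdate and dept partitions to clean up.
    emps_by_birthdate[row["birthdate"]].pop(emp_id, None)
    emps_by_dept[row["dept"]].pop(emp_id, None)


insert_employee(1, "Alice", "25/04/1975", "Sales")
insert_employee(2, "Bob", "25/04/1975", "IT")
delete_employee(1)
print(sorted(emps_by_birthdate["25/04/1975"]))  # prints: [2]
```

The fan-out grows with the number of query tables that copy the row, which is where the "one delete becomes many deletes" situation comes from.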

Is it common to do joins in application code? Is there software that lets you create pre-written applications to join together data from different queries?

Are there best practices, patterns, etc. for handling these data-modeling questions?

Recommended Answer

Yes, for the most part: taking a query-based approach to data modeling really is the best way to do it.

  1. It is still a good idea, because the faster query times make it worth it. Yes, there is a little more housekeeping to do. I haven't had to execute hundreds of deletes from other column families, but occasionally there is some complicated clean-up to do. Still, you shouldn't be doing a lot of deleting in Cassandra anyway: heavy deletion is an anti-pattern, since every delete writes a tombstone.
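
One common source of the "complicated clean-up" is changing a value that is part of a query table's primary key. CQL does not allow an UPDATE of a primary-key column, so the copy has to be deleted under the old key and re-inserted under the new one. The sketch below illustrates this with hypothetical in-memory dicts (emps, emps_by_dept) and an illustrative change_department helper; it is not driver code.

```python
from collections import defaultdict

# Hypothetical in-memory stand-ins for two CQL tables:
#   emps          PRIMARY KEY (emp_id)
#   emps_by_dept  PRIMARY KEY ((dept), emp_id)
emps = {}
emps_by_dept = defaultdict(dict)  # dept -> {emp_id: row copy}


def insert_employee(emp_id, name, dept):
    row = {"emp_id": emp_id, "name": name, "dept": dept}
    emps[emp_id] = row
    emps_by_dept[dept][emp_id] = dict(row)


def change_department(emp_id, new_dept):
    old = emps[emp_id]
    if old["dept"] == new_dept:
        return
    # dept is part of emps_by_dept's primary key, and CQL cannot UPDATE a
    # primary-key column: delete the old copy (a tombstone in Cassandra),
    # then insert a new copy under the new partition.
    del emps_by_dept[old["dept"]][emp_id]
    updated = {**old, "dept": new_dept}
    emps[emp_id] = updated
    emps_by_dept[new_dept][emp_id] = dict(updated)


insert_employee(1, "Alice", "Sales")
change_department(1, "IT")
print(list(emps_by_dept["Sales"]), list(emps_by_dept["IT"]))  # prints: [] [1]
```

Because each such move leaves a tombstone behind, a model whose query-table keys change often is a hint that the table design may need revisiting.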

No. Client-side JOINs are just as bad as distributed JOINs. The whole idea is to create a table that returns the data for each specific query (denormalized and/or replicated), which removes the need to do a JOIN at all. The exception is running OLAP queries for analysis: there you can use a tool like Apache Spark to execute an ad-hoc, distributed JOIN. But that is definitely not something you'd want to do on a production system.
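
A rough way to see why client-side joins hurt is to count round trips. In this in-memory sketch, each dict lookup stands in for one query sent to the cluster; the tables, data and function names are hypothetical. With N employees born on the date, the join in application code issues 1 + 2N queries, while the purpose-built query table answers with one partition read. Real latency, paging and failure handling are not modeled.

```python
# Hypothetical in-memory "tables"; each dict lookup stands in for one
# query sent to the cluster. Names and data are illustrative only.
emps = {1: {"name": "Alice", "dept_id": 10}, 2: {"name": "Bob", "dept_id": 20}}
depts = {10: {"dept_name": "Sales"}, 20: {"dept_name": "IT"}}
emp_ids_by_birthdate = {"25/04/1975": [1, 2]}

# Query table built for this one query: department name copied into each row.
emps_with_dept_by_birthdate = {
    "25/04/1975": [
        {"name": "Alice", "dept_name": "Sales"},
        {"name": "Bob", "dept_name": "IT"},
    ]
}


def client_side_join(birthdate):
    """Join in application code; returns (rows, number of queries issued)."""
    queries = 1  # fetch the employee IDs for this birthdate
    rows = []
    for emp_id in emp_ids_by_birthdate.get(birthdate, []):
        emp = emps[emp_id]               # one query per employee
        dept = depts[emp["dept_id"]]     # one more query per employee
        queries += 2
        rows.append({"name": emp["name"], "dept_name": dept["dept_name"]})
    return rows, queries


def denormalized_read(birthdate):
    """Single partition read from the query table; returns (rows, queries)."""
    return list(emps_with_dept_by_birthdate.get(birthdate, [])), 1


joined, n_join = client_side_join("25/04/1975")
direct, n_direct = denormalized_read("25/04/1975")
print(joined == direct, n_join, n_direct)  # prints: True 5 1
```

Both paths return the same rows; the difference is that the join cost scales with the result size, while the denormalized read stays a single request.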

A few articles I can recommend:

  • Getting Started with Cassandra Time Series Data Modeling - Written by DataStax's Chief Evangelist Patrick McFadin, it covers one of the more common Cassandra use cases in a few different ways.
  • Escaping From Disco-Era Data Modeling - This one talks about some of the obstacles that beginners with Cassandra can face, as well as the general approach to take in overcoming them. Disclaimer: I am the author.
  • Cassandra Data Modeling Best Practices, Part 1 - You can't go wrong with Jay Patel's (eBay) classic article on Cassandra modeling practices. It's a little dated in that the examples are grounded in the pre-CQL world, but the techniques still resonate.
