Denormalization: How much is too much?

Question

I've designed the database for the web app I'm building "by the book". That is, I've:


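  • Created an ER diagram containing the app's entities, attributes, and relationships

  • Translated the ER diagram into a schema

  • Transformed the schema into a schema-less form to model it in the database (the database is Cassandra, a NoSQL database)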

Everything is going well so far. I've denormalized before with great results, and I'm currently implementing a part of the app that will use data that hasn't been denormalized yet. Denormalizing for this particular part will, I predict, increase performance substantially (reading from 1 Column_Family ("table" in the relational world) instead of 7).

However, I fear that I may be denormalizing too much. If I were to do so for the part in question, it would reduce the Column_Family/table count in my app by about 20%, and having that much of my database denormalized makes me nervous for some reason.

Should the app end up being enough of a success that I'm able to get a database designer or administrator on board, I'd like for him to be able to determine that the denormalization I'm performing is necessary for the performance I'm seeking (best case) or at the very least not harmful (worst case).

Are there specific things I should look out for when making denormalization decisions that may indicate whether doing so would be bad, or does it always come down to speed vs. maintainability?

Recommended answer

Designing a schema for Cassandra is very different from designing a schema for a SQL database. With a SQL database, your data fits on one machine, the database maintains indexes for you, you can perform joins, and you can run complex queries with SQL. All of this makes normalizing data practical.

In Cassandra, your data does not fit on one machine, so you can't perform joins; the only query you can do efficiently is to get a range of columns on a key; and Cassandra will maintain only limited indexes for you. This makes normalizing your data impractical.
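To make "a range of columns on a key" concrete, here is a toy in-memory sketch. It is not Cassandra's API: `ToyColumnFamily`, `insert`, and `get_slice` are hypothetical names, and the only assumption modeled is that each row keeps its columns sorted by name, so a slice within one row is cheap while anything resembling a join is not available.

```python
from bisect import bisect_left, bisect_right


class ToyColumnFamily:
    """Hypothetical stand-in for a column family: row key -> sorted columns."""

    def __init__(self):
        # row key -> (sorted list of column names, dict of name -> value)
        self._rows = {}

    def insert(self, row_key, column, value):
        names, values = self._rows.setdefault(row_key, ([], {}))
        if column not in values:
            # Keep column names sorted so range lookups stay cheap.
            names.insert(bisect_left(names, column), column)
        values[column] = value

    def get_slice(self, row_key, start, finish):
        """Return (column, value) pairs with start <= column <= finish."""
        names, values = self._rows.get(row_key, ([], {}))
        lo = bisect_left(names, start)
        hi = bisect_right(names, finish)
        return [(name, values[name]) for name in names[lo:hi]]


# Denormalized timeline: the post text is copied into the reader's row,
# so one slice on one key answers the query without a join.
timeline = ToyColumnFamily()
timeline.insert("user:42", "2011-02-01T11:41", "first post")
timeline.insert("user:42", "2011-02-03T09:00", "third post")
timeline.insert("user:42", "2011-02-02T08:15", "second post")

february_1_to_2 = timeline.get_slice("user:42", "2011-02-01", "2011-02-02T23:59")
```

The row key and column names here are made up for illustration. The point is only that the data a query needs has to already live under the key that query reads.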

In Cassandra, you typically design your schema to serve the queries that you are going to make, and you denormalize to do that. My favorite example of this is what Twitter does for their stats in Rainbird, as explained in this post:

For example, say someone clicks on a t.co link to blog.example.com/foo at 11:41am on 1st Feb. 
Rainbird would increment counters for:

 t.co click: com (all time)
 t.co click: com.example (all time)
 t.co click: com.example.blog (all time)
 t.co click: com.example.blog /foo (all time)
 t.co click: com (1st Feb 2011)
 t.co click: com.example (1st Feb 2011)
 t.co click: com.example.blog (1st Feb 2011)
 t.co click: com.example.blog /foo (1st Feb 2011)
 t.co click: com (11am-12 on 1st Feb)
 t.co click: com.example (11am-12 on 1st Feb)
 t.co click: com.example.blog (11am-12 on 1st Feb)
 t.co click: com.example.blog /foo (11am-12 on 1st Feb)
 t.co click: com (11:41-42 on 1st Feb)
 t.co click: com.example (11:41-42 on 1st Feb)
 t.co click: com.example.blog (11:41-42 on 1st Feb)
 t.co click: com.example.blog /foo (11:41-42 on 1st Feb)

This one click is copied 16 times to satisfy the 16 queries that can be done.
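The fan-out in the quoted example can be sketched roughly as follows. This is an illustration, not Rainbird's actual implementation: the function names (`hierarchy_prefixes`, `time_buckets`, `record_click`) and the bucket label formats are assumptions, and a plain `Counter` stands in for Cassandra counters.

```python
from collections import Counter
from datetime import datetime
from urllib.parse import urlsplit


def hierarchy_prefixes(url):
    """Reversed-domain prefixes, from broadest (com) down to the full path."""
    parts = urlsplit(url if "//" in url else "//" + url)
    labels = parts.hostname.split(".")[::-1]  # e.g. ["com", "example", "blog"]
    prefixes = [".".join(labels[: i + 1]) for i in range(len(labels))]
    if parts.path and parts.path != "/":
        prefixes.append(prefixes[-1] + " " + parts.path)
    return prefixes


def time_buckets(ts):
    """One label per granularity: all time, day, hour, minute."""
    return [
        "all time",
        ts.strftime("%Y-%m-%d"),
        ts.strftime("%Y-%m-%d %H"),
        ts.strftime("%Y-%m-%d %H:%M"),
    ]


def counter_keys(category, url, ts):
    """Every (category, prefix, bucket) counter that one event touches."""
    return [
        (category, prefix, bucket)
        for bucket in time_buckets(ts)
        for prefix in hierarchy_prefixes(url)
    ]


def record_click(counters, url, ts):
    """Write path: increment every counter a later query might read."""
    for key in counter_keys("t.co click", url, ts):
        counters[key] += 1


counters = Counter()
record_click(counters, "blog.example.com/foo", datetime(2011, 2, 1, 11, 41))
keys_touched = len(counters)  # 4 prefixes x 4 time buckets
```

Each query then becomes a single key lookup, such as `counters[("t.co click", "com.example", "2011-02-01")]`. The cost is paid on the write path, which is the trade-off denormalization makes.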

This is a good demonstration of how to build indexes in Cassandra.
