How costly are JOINs in SQL? And/or, what's the trade-off between performance and normalization?


Question

I've found a similar thread but it doesn't really capture the essence of what I'm trying to ask - so I've created a new thread.

I know there is a trade-off between normalization and performance, and I'm wondering what's the best practice for drawing that line? In my particular situation, I have a messaging system that has three distinct tables: messages_threads (overarching message holder), messages_recipients (who is involved), and messages_messages (the actual messages + timestamps).

In order to return the "inbox" view, I have to left join the messages_threads, users, and pictures tables to the messages_recipients table to get the information that populates the view (profile picture, sender name, thread id)... and I still have to add a join to messages_messages to retrieve the text of the last message, so that I can show the user a "preview" of it.

My question is: how costly are JOINs in SQL in terms of performance? I could, for instance, store the sender's name (which I currently have to left join from users to retrieve) in a field called "sendername" in the messages_threads table - but in terms of normalization I've always been taught to avoid data redundancy.

Where do you draw the line? Or am I overestimating how performance-hampering SQL joins are?

Solution

The best practice is to always start with 3NF, and then only consider denormalisation if you find a specific performance problem.

Performance is just one of the issues you have to deal with in databases. By duplicating data, you run the risk of allowing inconsistent data into your database, thus nullifying one of the core principles of relational databases: consistency (the C in ACID; see note a).

Yes, joins have a cost; there's no getting around that. However, the cost is usually a lot less than you'd think, and it can often be swamped by other factors such as network transmission times. By making sure the columns used in the join and filter conditions are indexed properly, you can avoid a lot of that cost.
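As a rough illustration of the indexing point, here is a minimal sketch using Python's built-in sqlite3 module. The table names come from the question, but SQLite itself, every column name, and the index name idx_recipients_user are assumptions made for the example. It asks the engine for its query plan so you can see whether the inbox join walks an index or scans a whole table.

```python
import sqlite3

# In-memory SQLite database; the engine and all column names are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE messages_threads (
        id INTEGER PRIMARY KEY,
        sender_id INTEGER REFERENCES users(id)
    );
    CREATE TABLE messages_recipients (
        thread_id INTEGER REFERENCES messages_threads(id),
        user_id INTEGER REFERENCES users(id)
    );
    -- Hypothetical index on the column the inbox query filters on.
    CREATE INDEX idx_recipients_user
        ON messages_recipients (user_id, thread_id);
""")

INBOX_SQL = """
    SELECT t.id, u.name
    FROM messages_recipients AS r
    JOIN messages_threads AS t ON t.id = r.thread_id
    LEFT JOIN users AS u ON u.id = t.sender_id
    WHERE r.user_id = ?
"""

# The fourth column of each EXPLAIN QUERY PLAN row is a human-readable step.
plan = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + INBOX_SQL, (1,))]
for step in plan:
    print(step)
```

Other engines expose the same idea through their own EXPLAIN variants, and the plan text differs between engines and versions, so treat the printed steps as something to read rather than something to parse.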

And remember the optimisation mantra: measure, don't guess! Measure in a production-like environment, and keep measuring (and tuning) periodically - optimisation is only a set-and-forget operation if your schema and data never change (very unlikely).
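A measurement along these lines might look like the sketch below. It uses an in-memory SQLite database with made-up row counts and column names (all assumptions, and nothing like a production environment), and times the normalised join against a read of the denormalised "sendername" column floated in the question. The numbers it prints only mean something on your own data and hardware.

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE messages_threads (
        id INTEGER PRIMARY KEY,
        sender_id INTEGER REFERENCES users(id),
        sendername TEXT  -- denormalised copy, as floated in the question
    );
""")
# Synthetic data; the sizes are arbitrary assumptions.
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(i, f"user{i}") for i in range(1000)])
conn.executemany("INSERT INTO messages_threads VALUES (?, ?, ?)",
                 [(i, i % 1000, f"user{i % 1000}") for i in range(20000)])

def timed(sql, repeats=5):
    """Run sql `repeats` times; return (best wall-clock seconds, rows)."""
    best, rows = float("inf"), None
    for _ in range(repeats):
        start = time.perf_counter()
        rows = conn.execute(sql).fetchall()
        best = min(best, time.perf_counter() - start)
    return best, rows

join_time, join_rows = timed("""
    SELECT t.id, u.name
    FROM messages_threads AS t
    JOIN users AS u ON u.id = t.sender_id
    ORDER BY t.id
""")
flat_time, flat_rows = timed(
    "SELECT id, sendername FROM messages_threads ORDER BY id")

print(f"join: {join_time:.4f}s  denormalised: {flat_time:.4f}s")
```

Both queries return the same rows, so any difference in the timings is the price of the join on this particular data set and nothing more general than that.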


a) Denormalising for performance can usually be made safe by using triggers to maintain consistency. This will, of course, slow down your updates, but may still let your selects run faster.
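A minimal sketch of that trigger approach, again in SQLite via Python's sqlite3 module: the "sendername" column is the one proposed in the question, while the trigger names and the other columns are hypothetical. One trigger fills the copy when a thread is created, and the other propagates later name changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE messages_threads (
        id INTEGER PRIMARY KEY,
        sender_id INTEGER NOT NULL REFERENCES users(id),
        sendername TEXT  -- redundant copy of users.name
    );

    -- Fill the copy whenever a thread is created.
    CREATE TRIGGER threads_fill_sendername
    AFTER INSERT ON messages_threads
    BEGIN
        UPDATE messages_threads
        SET sendername = (SELECT name FROM users WHERE id = NEW.sender_id)
        WHERE id = NEW.id;
    END;

    -- Keep the copy in step when a user is renamed.
    CREATE TRIGGER users_sync_sendername
    AFTER UPDATE OF name ON users
    BEGIN
        UPDATE messages_threads
        SET sendername = NEW.name
        WHERE sender_id = NEW.id;
    END;
""")

conn.execute("INSERT INTO users (id, name) VALUES (1, 'Alice')")
conn.execute("INSERT INTO messages_threads (id, sender_id) VALUES (10, 1)")
before = conn.execute(
    "SELECT sendername FROM messages_threads WHERE id = 10").fetchone()[0]

conn.execute("UPDATE users SET name = 'Alicia' WHERE id = 1")
after = conn.execute(
    "SELECT sendername FROM messages_threads WHERE id = 10").fetchone()[0]

print(before, after)
```

The extra UPDATE inside each trigger is the write-side cost the note mentions: every rename now touches all of that user's threads.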
