Which provides better performance: one big join or multiple queries?


Question

I have a table called orders. One column on orders is customer_id.
I also have a table called customers, with 10 fields.

If I want to build an array of order objects, with a customer object embedded in each order object, I have two choices.

Option 1:

a. First, query the orders table.
b. Loop through the records and query the customers table to get the record for each order's customer.

This would be something like:

 SELECT * FROM Applications;

 SELECT * FROM Customers WHERE id = 1;
 SELECT * FROM Customers WHERE id = 2;
 SELECT * FROM Customers WHERE id = 3;
 SELECT * FROM Customers WHERE id = etc . . .

Option 2:

a. Do a join that returns all the fields.

Option 2 is the obvious choice, because you run one query instead of 1 + [numberOfOrders] queries (which could be hundreds or more).

This would be something like:

 SELECT * FROM Applications a
 INNER JOIN Customers c ON c.id = a.customerID;

My main question is: what if I had 10 other tables hanging off the orders table (similar to customers), each referenced by an id in the orders table? Should I do a single query that joins all 10 tables, or does that become inefficient at some point?

Any suggestions would help. Is there any optimization to ensure fast performance?

Solution

I agree with everyone who's said a single join will probably be more efficient, even with a lot of tables. It's also less development effort than doing the work in your application code. This assumes the tables are appropriately indexed, with an index on each foreign key column, and (of course) an index on each primary key column.
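As a rough illustration of the single-join approach with indexed foreign keys, here is a minimal sketch using Python's built-in sqlite3 module. The schema is an assumption made for the example: the products table, the column names, and the index names are hypothetical and do not come from the question, and only two lookup tables are shown instead of ten.

```python
import sqlite3

# Hypothetical schema: table, column, and index names are illustrative only.
SCHEMA = """
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE products  (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE orders (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(id),
    product_id  INTEGER REFERENCES products(id)
);
-- SQLite looks up INTEGER PRIMARY KEY values directly;
-- foreign key columns need explicit indexes.
CREATE INDEX idx_orders_customer ON orders(customer_id);
CREATE INDEX idx_orders_product  ON orders(product_id);
"""

JOIN_SQL = """
SELECT o.id, c.id, c.name, p.id, p.title
FROM orders o
INNER JOIN customers c ON c.id = o.customer_id
INNER JOIN products  p ON p.id = o.product_id
ORDER BY o.id
"""


def load_orders(conn):
    """Run one join and nest each lookup row inside its order."""
    return [
        {
            "id": order_id,
            "customer": {"id": cust_id, "name": cust_name},
            "product": {"id": prod_id, "title": prod_title},
        }
        for order_id, cust_id, cust_name, prod_id, prod_title
        in conn.execute(JOIN_SQL)
    ]


def demo():
    """Build a tiny in-memory database and load it with a single query."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO customers VALUES (?, ?)",
                     [(1, "Ada"), (2, "Bob")])
    conn.executemany("INSERT INTO products VALUES (?, ?)",
                     [(100, "Widget"), (101, "Gadget")])
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                     [(10, 1, 100), (11, 2, 101), (12, 1, 101)])
    return load_orders(conn)
```

Each additional lookup table would add one more INNER JOIN clause and one more nested dictionary; whether that stays fast with ten tables depends on the data and the DBMS, which is why measuring is worthwhile.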

Your best bet is to try the easiest approach (the big join) first, and see how well it performs. If it performs well, then great - you're done. If it performs poorly, profile the query and look for missing indexes on your tables.
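One way to look for a missing index is to ask the database for its query plan. The sketch below uses SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module; other systems have their own equivalents, and the table and index names here are hypothetical. It captures the plan for a lookup on orders.customer_id before and after an index is created.

```python
import sqlite3


def query_plan(conn, sql):
    """Return the planner's description of each step for `sql`."""
    # The fourth column of each EXPLAIN QUERY PLAN row is the readable detail.
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


conn = sqlite3.connect(":memory:")
# Hypothetical table for illustration.
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER)")

sql = "SELECT * FROM orders WHERE customer_id = 42"

# Without an index on customer_id the planner has to scan the whole table.
plan_before = query_plan(conn, sql)

conn.execute("CREATE INDEX idx_orders_customer ON orders(customer_id)")

# With the index in place the planner can search it instead.
plan_after = query_plan(conn, sql)
```

Printing plan_before and plan_after shows a table scan in the first case and a search that names idx_orders_customer in the second; the exact wording of the plan text varies between SQLite versions.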

Your option #1 is not likely to perform well, due to the number of network round-trips (as anijhaw mentioned). This is sometimes called the "select N+1" problem - you do one SELECT to get the list of N applications, and then do N SELECTs in a loop to get the customers. This record-at-a-time looping is natural to application programmers; but SQL works much better when you operate on whole sets of data at once.
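To make the difference concrete, the following sketch loads the same data both ways against an in-memory SQLite database and counts the statements each approach issues. The table contents and function names are made up for the example. With an in-process database there is no network, so this shows only the query count, not the round-trip latency a real client/server setup would add.

```python
import sqlite3


def make_db():
    """Create a small in-memory database with hypothetical sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
    """)
    conn.executemany("INSERT INTO customers VALUES (?, ?)",
                     [(1, "Ada"), (2, "Bob")])
    conn.executemany("INSERT INTO orders VALUES (?, ?)",
                     [(10, 1), (11, 2), (12, 1)])
    return conn


def load_n_plus_one(conn):
    """Option 1: one query for the orders, then one query per order."""
    queries = 1
    order_rows = conn.execute(
        "SELECT id, customer_id FROM orders ORDER BY id").fetchall()
    orders = []
    for order_id, customer_id in order_rows:
        name = conn.execute(
            "SELECT name FROM customers WHERE id = ?", (customer_id,)
        ).fetchone()[0]
        queries += 1
        orders.append({"id": order_id,
                       "customer": {"id": customer_id, "name": name}})
    return orders, queries


def load_with_join(conn):
    """Option 2: a single set-based query."""
    rows = conn.execute("""
        SELECT o.id, c.id, c.name
        FROM orders o
        INNER JOIN customers c ON c.id = o.customer_id
        ORDER BY o.id
    """)
    orders = [{"id": order_id, "customer": {"id": cust_id, "name": name}}
              for order_id, cust_id, name in rows]
    return orders, 1
```

Both functions return the same list of orders; the first issues N + 1 statements for N orders, the second always issues one.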

If option #2 is slow even with good indexing, you may want to look into caching. You can cache in the database (using a summary table or materialized/indexed view), in the application (if there is enough RAM), or in a dedicated caching server such as memcached. Of course, this depends on how up-to-date your query results need to be. If everything has to be fully up-to-date, then any cache would have to be updated whenever the underlying tables are updated - it gets complicated and becomes less useful.
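For the in-application case, a cache can be as small as the sketch below: it holds one query result in process memory and reloads it after a fixed time-to-live. The class name TTLCache and its parameters are hypothetical, and the approach only fits when results may be up to ttl_seconds stale, which is the trade-off described above.

```python
import time


class TTLCache:
    """Keep one loader result in process memory for `ttl_seconds`."""

    def __init__(self, loader, ttl_seconds, clock=time.monotonic):
        self._loader = loader        # callable that runs the expensive query
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so tests can control time
        self._value = None
        self._expires_at = None      # None means nothing has been cached yet

    def get(self):
        """Return the cached value, reloading it once it has expired."""
        now = self._clock()
        if self._expires_at is None or now >= self._expires_at:
            self._value = self._loader()
            self._expires_at = now + self._ttl
        return self._value
```

A caller would wrap the function that runs the big join, for example TTLCache(run_report_query, ttl_seconds=300) where run_report_query is a placeholder name, and call get() wherever the result is needed. Invalidating on every write to the underlying tables is deliberately left out, since that is the part that gets complicated.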

This sounds like a reporting query though, and reporting often doesn't need to be real-time. So caching might be able to help you.

Depending on your DBMS, another thing to think about is the impact of this query on other queries hitting the same database. If your DBMS allows readers to block writers, then this query could prevent updates to the tables if it takes a long time to run. That would be bad. Oracle doesn't have this problem, and neither does SQL Server when run in "read committed snapshot" mode. I don't know about MySQL though.
