加速大表和小表之间的内连接 [英] Speeding up inner joins between a large table and a small table

查看:169
本文介绍了加速大表和小表之间的内连接的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

这可能是一个愚蠢的问题,但它可能会阐明联接在内部是如何工作的.

This may be a silly question, but it may shed some light on how joins work internally.

假设我有一个大表 L 和一个小表 S(100K 行 vs. 100 行).

Let's say I have a large table L and a small table S (100K rows vs. 100 rows).

以下两个选项在速度方面会有什么不同吗?:

Would there be any difference in terms of speed between the following two options?:

OPTION 1:                 OPTION 2:
---------                 ---------
SELECT *                  SELECT *
FROM L INNER JOIN S       FROM S INNER JOIN L
ON L.id = S.id;           ON L.id = S.id;

请注意,唯一的区别是表的连接顺序.

Notice that the only difference is the order in which the tables are joined.

我意识到不同 SQL 语言的性能可能会有所不同.如果是这样,MySQL 与 Access 相比如何?

I realize performance may vary between different SQL languages. If so, how would MySQL compare to Access?

推荐答案

不,顺序无关紧要.

几乎所有 RDBMS(例如 MS Access、MySQL、SQL Server、ORACLE 等)都使用基于列统计信息的基于成本的优化器.在大多数情况下,优化器会选择正确的计划.在您提供的示例中,顺序无关紧要(前提是统计数据是最新的).

Almost all RDBMS's (such MS Access, MySQL, SQL Server, ORACLE etc) use a cost based optimiser based upon column statistics. In most situations, the optimiser will choose a correct plan. In the example you gave, the order will not matter (provided statistics are up to date).

要决定使用什么查询策略,Jet Engine 优化器使用统计数据.以下因素是这些因素中的一些统计数据基于:

To decide what query strategy to use, the Jet Engine optimizer uses statistics. The following factors are some of the factors that these statistics are based on:

  • 表中的记录数
  • 表中数据页的数量
  • 桌子的位置
  • 是否存在索引
  • 索引的唯一性

注意:您无法查看 Jet 数据库引擎优化方案,并且您无法指定如何优化一个询问.但是,您可以使用数据库记录器来确定索引是否存在以及如何索引是唯一的.

Note: You cannot view Jet database engine optimization schemes, and you cannot specify how to optimize a query. However, you can use the Database Documenter to determine whether indexes are present and how unique an index is.

根据这些统计数据,优化器然后选择最好的处理的内部查询策略使用特定查询.

Based on these statistics, the Optimizer then selects the best internal query strategy for dealing with a particular query.

统计数据会在每次更新时更新查询已编译.一个查询被标记用于在保存任何内容时进行编译对查询(或其基础表)以及当数据库被压缩.如果查询是标记为编译,编译并且发生统计更新下次运行查询时.编译通常需要从一个秒到四秒.

The statistics are updated whenever a query is compiled. A query is flagged for compiling when you save any changes to the query (or its underlying tables) and when the database is compacted. If a query is flagged for compiling, the compiling and the updating of statistics occurs the next time that the query is run. Compiling typically takes from one second to four seconds.

如果您添加大量记录到您的数据库,您必须打开然后将您的查询保存到重新编译查询.例如,如果你设计然后测试一个查询使用一小组样本数据,您之后必须重新编译查询额外的记录被添加到数据库.当你这样做时,你想要确保最佳查询当你的表现达到应用程序正在使用中.

If you add a significant number of records to your database, you must open and then save your queries to recompile the queries. For example, if you design and then test a query by using a small set of sample data, you must re-compile the query after additional records are added to the database. When you do this, you want to make sure that optimal query performance is achieved when your application is in use.

参考.

可能感兴趣:ACC:如何优化 Microsoft Access 2.0、Microsoft Access 95 和微软访问 97

Tony Toews 的 Microsoft Access 性能常见问题解答值得一读.

Tony Toews's Microsoft Access Performance FAQ is worth reading.

有一个警告JOIN 顺序无关紧要".

如果您的 RDBMS 基于成本的查询优化器在创建查询计划时超时,那么连接顺序可能很重要.基于成本的优化器具有用于构建查询计划的有限资源(CPU 时间和内存).如果它们在编译阶段超时,您将获得迄今为止找到的最佳计划.

If your RDBMS's cost based query optimiser times out creating the query plan then the join order COULD matter. Cost based optimisers have finite resources (both CPU time and memory) in which to construct a query plan. If they time out during the compilation stage, you will get the best plan found so far.

TLDR;如果您有收到计划编译超时(而不是查询执行超时)的复杂查询,则将最严格的连接放在首位.这样,在查询计划优化器超时时,它会增加找到更好"计划的机会.

TLDR; If you have complex queries that receive a plan compilation timeout (not query execution timeout), then put your most restrictive joins first. That way, at the point the query plan optimiser times out, it will increase the chance that a 'better' plan was found.

当然,如果您遇到查询计划编译超时,您可能应该简化您的查询.

Of course, if you are experiencing query plan compilation timeouts, you should probably simplify your query.

这篇关于加速大表和小表之间的内连接的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆