优化在Oracle上运行缓慢,在SQL Server上快速运行的SELECT查询 [英] Optimising a SELECT query that runs slow on Oracle which runs quickly on SQL Server

查看:279
本文介绍了优化在Oracle上运行缓慢,在SQL Server上快速运行的SELECT查询的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试在Oracle中运行以下SQL语句,并且运行起来需要一定的时间:

I'm trying to run the following SQL statement in Oracle, and it takes ages to run:

SELECT orderID FROM tasks WHERE orderID NOT IN 
(SELECT DISTINCT orderID FROM tasks WHERE
 engineer1 IS NOT NULL AND engineer2 IS NOT NULL)

如果我只运行IN子句中的子部分,那么它在Oracle中的运行速度非常快,即

If I run just the sub-part that is in the IN clause, that runs very quickly in Oracle, i.e.

SELECT DISTINCT orderID FROM tasks WHERE
engineer1 IS NOT NULL AND engineer2 IS NOT NULL

为什么在Oracle中整个语句要花这么长时间?在SQL Server中,整个语句运行很快.

Why does the whole statement take such a long time in Oracle? In SQL Server the whole statement runs quickly.

或者我应该使用更简单/不同/更好的SQL语句吗?

Alternatively is there a simpler/different/better SQL statement I should use?

有关该问题的更多详细信息:

Some more details about the problem:

  • 每个订单都由许多任务组成
  • 将分配每个订单(其一个或多个任务将设置engineer1和engineer2),或者可以取消分配订单(其所有任务的engineer字段都为空值)
  • 我正在尝试查找所有未分配的订单ID.

只要有任何区别,表中就有约12万行,每个订单有3个任务,因此约有40k个不同的订单.

Just in case it makes any difference, there are ~120k rows in the table, and 3 tasks per order, so ~40k different orders.

对答案的回应:

  • 我希望SQL语句在SQL Server和Oracle中都可以使用.
  • 任务仅在orderID和taskID上具有索引.
  • 我尝试了该语句的NOT EXISTS版本,但是该语句运行了3分钟以上,然后我才取消了它.也许需要该语句的JOIN版本?
  • 还有一个"orders"表以及orderID列.但是我试图通过不将其包含在原始SQL语句中来简化该问题.

我猜想在原始SQL语句中,子查询每次都针对SQL语句第一部分中的每一行运行-即使它是静态的,只需要运行一次?

I guess that in the original SQL statement the sub-query is run every time for each row in the first part of the SQL statement - even though it is static and should only need to be run once?

执行

ANALYZE TABLE tasks COMPUTE STATISTICS;

使我原来的SQL语句执行得更快.

made my original SQL statement execute much faster.

尽管我仍然很好奇为什么必须这样做,以及是否/何时需要再次运行它?

Although I'm still curious why I have to do this, and if/when I would need to run it again?

统计数据提供了Oracle的 基于成本的优化器信息 需要确定效率 不同的执行计划: 例如,表中的行数, 行的平均宽度,最高和 每列的最小值,数量 每列不同的值,聚类 指数等因素.

The statistics give Oracle's cost-based optimzer information that it needs to determine the efficiency of different execution plans: for example, the number of rowsin a table, the average width of rows, highest and lowest values per column, number of distinct values per column, clustering factor of indexes etc.

在小型数据库中,您可以进行设置 每天晚上收集统计数据的工作 别管它了.其实这是 默认为10克以下.对于更大的 您通常必须执行的实现 权衡执行的稳定性 针对数据的方式进行计划 变化,这是一个棘手的平衡.

In a small database you can just setup a job to gather statistics every night and leave it alone. In fact, this is the default under 10g. For larger implementations you usually have to weigh the stability of the execution plans against the way that the data changes, which is a tricky balance.

Oracle还具有一个称为 动态采样"用于 确定相关的样本表 执行时的统计信息.它是 更常与数据一起使用 仓库的间接费用 取样超过 潜在的性能提升 长期运行的查询.

Oracle also has a feature called "dynamic sampling" that is used to sample tables to determine relevant statistics at execution time. It's much more often used with data warehouses where the overhead of the sampling it outweighed by the potential performance increase for a long-running query.

推荐答案

如果您分析涉及的表,这种类型的问题通常就会消失(因此Oracle对数据的分布有更好的了解)

Often this type of problem goes away if you analyze the tables involved (so Oracle has a better idea of the distribution of the data)

ANALYZE TABLE tasks COMPUTE STATISTICS;

这篇关于优化在Oracle上运行缓慢,在SQL Server上快速运行的SELECT查询的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆