More efficient SQL than using "A UNION (B in A)"?

Problem description

Edit 1 (clarification): Thank you for the answers so far! The response is gratifying.
I want to clarify the question a little because based on the answers I think I did not describe one aspect of the problem correctly (and I'm sure that's my fault as I was having a difficult time defining it even for myself).
Here's the rub: The result set should contain ONLY the records with tstamp BETWEEN '2010-01-03' AND '2010-01-09', AND the one record where the tstamp IS NULL for each order_num in the first set (there will always be one with null tstamp for each order_num).
The answers given so far appear to include all records for a certain order_num if there are any with tstamp BETWEEN '2010-01-03' AND '2010-01-09'. For example, if there were another record with order_num = 2 and tstamp = 2010-01-12 00:00:00 it should not be included in the result.

Original question:
Consider an orders table containing id (unique), order_num, tstamp (a timestamp), and item_id (the single item included in an order). tstamp is NULL unless the order has been modified; each modification adds another record with the same order_num whose tstamp holds the time the change occurred.

Example...

id  order_num  tstamp               item_id
__  _________  ___________________  _______
 0          1                           100
 1          2                           101
 2          2  2010-01-05 12:34:56      102
 3          3                           113
 4          4                           124
 5          5                           135
 6          5  2010-01-07 01:23:45      136
 7          5  2010-01-07 02:46:00      137
 8          6                           100
 9          6  2010-01-13 08:33:55      105

What is the most efficient SQL statement to retrieve all of the orders (based on order_num) which have been modified one or more times during a certain date range? In other words, for each order we need all of the records with the same order_num (including the one with NULL tstamp), for each order_num WHERE at least one of the order_num's has tstamp NOT NULL AND tstamp BETWEEN '2010-01-03' AND '2010-01-09'. It's the "WHERE at least one of the order_num's has tstamp NOT NULL" that I'm having difficulty with.

The result set should look like this:

id  order_num  tstamp               item_id
__  _________  ___________________  _______
 1          2                           101
 2          2  2010-01-05 12:34:56      102
 5          5                           135
 6          5  2010-01-07 01:23:45      136
 7          5  2010-01-07 02:46:00      137

The SQL that I came up with is this, which is essentially "A UNION (B in A)", but it executes slowly and I hope there is a more efficient solution:

SELECT history_orders.order_id, history_orders.tstamp, history_orders.item_id
FROM
   (SELECT orders.order_id, orders.tstamp, orders.item_id
    FROM orders
    WHERE orders.tstamp BETWEEN '2010-01-03' AND '2010-01-09')
    AS history_orders
UNION
SELECT current_orders.order_id, current_orders.tstamp, current_orders.item_id
FROM
   (SELECT orders.order_id, orders.tstamp, orders.item_id
    FROM orders
    WHERE orders.tstamp IS NULL)
    AS current_orders
WHERE current_orders.order_id IN
   (SELECT orders.order_id
    FROM orders
    WHERE orders.tstamp BETWEEN '2010-01-03' AND '2010-01-09');
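
For anyone who wants to try this against the sample data, here is a minimal sketch in Python using the standard-library sqlite3 module. Two assumptions are mine: the order_num column of the sample table is what the query calls order_id, and SQLite with timestamps stored as text stands in for the real database. It is meant for checking the result set, not the speed.

```python
import sqlite3

# Sample rows from the question as (id, order_id, tstamp, item_id).
# Assumption: order_num in the sample table is the order_id used by the query.
SAMPLE_ROWS = [
    (0, 1, None, 100),
    (1, 2, None, 101),
    (2, 2, "2010-01-05 12:34:56", 102),
    (3, 3, None, 113),
    (4, 4, None, 124),
    (5, 5, None, 135),
    (6, 5, "2010-01-07 01:23:45", 136),
    (7, 5, "2010-01-07 02:46:00", 137),
    (8, 6, None, 100),
    (9, 6, "2010-01-13 08:33:55", 105),
]

# The "A UNION (B in A)" query, unchanged from the question.
UNION_QUERY = """
SELECT history_orders.order_id, history_orders.tstamp, history_orders.item_id
FROM
   (SELECT orders.order_id, orders.tstamp, orders.item_id
    FROM orders
    WHERE orders.tstamp BETWEEN '2010-01-03' AND '2010-01-09')
    AS history_orders
UNION
SELECT current_orders.order_id, current_orders.tstamp, current_orders.item_id
FROM
   (SELECT orders.order_id, orders.tstamp, orders.item_id
    FROM orders
    WHERE orders.tstamp IS NULL)
    AS current_orders
WHERE current_orders.order_id IN
   (SELECT orders.order_id
    FROM orders
    WHERE orders.tstamp BETWEEN '2010-01-03' AND '2010-01-09');
"""


def run_union_query(rows):
    """Load the rows into an in-memory SQLite table and run the UNION query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, order_id INTEGER,"
        " tstamp TEXT, item_id INTEGER)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", rows)
    fetched = conn.execute(UNION_QUERY).fetchall()
    conn.close()
    # Sort in Python so the output order does not depend on the engine;
    # NULL timestamps sort first within each order.
    return sorted(fetched, key=lambda r: (r[0], r[1] or ""))


union_rows = run_union_query(SAMPLE_ROWS)
for row in union_rows:
    print(row)
```

The printed rows should be the NULL-tstamp record plus the in-range records for orders 2 and 5, matching the result set shown above.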

Solution

Thank you again for all the suggestions. I found three solutions that work, including my original. At the end I've added some performance results, which are not as great as I had hoped. If anyone can improve on this I would be thrilled!

1) The best solution found so far seems to be:

SELECT history_orders.order_id, history_orders.tstamp, history_orders.item_id
FROM
   (SELECT orders.order_id, orders.tstamp, orders.item_id
    FROM orders
    WHERE orders.tstamp BETWEEN '2010-01-03' AND '2010-01-09'
    OR orders.tstamp IS NULL)
    AS history_orders
WHERE history_orders.order_id IN
   (SELECT orders.order_id
    FROM orders
    WHERE orders.tstamp BETWEEN '2010-01-03' AND '2010-01-09');
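
The case from Edit 1 (an extra record for order 2 with tstamp 2010-01-12 that must not appear) can be checked with a small sketch. It uses Python's standard-library sqlite3 module as a stand-in for the real database, assumes order_num in the sample table is the order_id of the query, and adds one hypothetical row whose id (10) and item_id (103) are made up for the test.

```python
import sqlite3

# Sample rows from the question as (id, order_id, tstamp, item_id).
# Assumption: order_num in the sample table is the order_id used by the query.
SAMPLE_ROWS = [
    (0, 1, None, 100),
    (1, 2, None, 101),
    (2, 2, "2010-01-05 12:34:56", 102),
    (3, 3, None, 113),
    (4, 4, None, 124),
    (5, 5, None, 135),
    (6, 5, "2010-01-07 01:23:45", 136),
    (7, 5, "2010-01-07 02:46:00", 137),
    (8, 6, None, 100),
    (9, 6, "2010-01-13 08:33:55", 105),
]

# Hypothetical row from Edit 1: order 2 modified again, outside the range.
# The id and item_id values are invented for this check.
EXTRA_ROW = (10, 2, "2010-01-12 00:00:00", 103)

# Solution 1, unchanged.
IN_QUERY = """
SELECT history_orders.order_id, history_orders.tstamp, history_orders.item_id
FROM
   (SELECT orders.order_id, orders.tstamp, orders.item_id
    FROM orders
    WHERE orders.tstamp BETWEEN '2010-01-03' AND '2010-01-09'
    OR orders.tstamp IS NULL)
    AS history_orders
WHERE history_orders.order_id IN
   (SELECT orders.order_id
    FROM orders
    WHERE orders.tstamp BETWEEN '2010-01-03' AND '2010-01-09');
"""


def run_in_query(rows):
    """Load the rows into an in-memory SQLite table and run solution 1."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, order_id INTEGER,"
        " tstamp TEXT, item_id INTEGER)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", rows)
    fetched = conn.execute(IN_QUERY).fetchall()
    conn.close()
    # Sort in Python; NULL timestamps sort first within each order.
    return sorted(fetched, key=lambda r: (r[0], r[1] or ""))


without_extra = run_in_query(SAMPLE_ROWS)
with_extra = run_in_query(SAMPLE_ROWS + [EXTRA_ROW])

# The out-of-range modification should leave the result set unchanged.
print(with_extra == without_extra)
```

The inner SELECT only ever passes along rows that are NULL or in range, so the 2010-01-12 row is dropped before the IN filter is applied.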

2) I also tried using EXISTS in place of IN, which requires an additional WHERE clause in the last SELECT:

SELECT history_orders.order_id, history_orders.tstamp, history_orders.item_id
FROM
   (SELECT orders.order_id, orders.tstamp, orders.item_id
    FROM orders
    WHERE orders.tstamp BETWEEN '2010-01-03' AND '2010-01-09'
    OR orders.tstamp IS NULL)
    AS history_orders
WHERE EXISTS
   (SELECT orders.order_id
    FROM orders
    WHERE history_orders.order_id = orders.order_id
    AND orders.tstamp BETWEEN '2010-01-03' AND '2010-01-09');

3) And finally there is my original solution, using UNION.

Comments:
To comment on the table size, my actual "real world" problem involves 4 tables (connected with inner joins) containing 98, 2189, 43897, 785656 records respectively.

Performance - I ran each solution three times and here are my real world results:
1: 52, 51, 51 seconds
2: 54, 54, 53 seconds
3: 56, 56, 56 seconds
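
For anyone repeating the comparison, a rough harness along these lines runs each solution three times and confirms they agree. It is only a sketch under my own assumptions: Python's standard-library sqlite3 module and the ten sample rows stand in for the real four-table database, and order_num in the sample table is taken to be the order_id of the queries.

```python
import sqlite3
import time

# Sample rows from the question as (id, order_id, tstamp, item_id).
# Assumption: order_num in the sample table is the order_id used by the queries.
SAMPLE_ROWS = [
    (0, 1, None, 100), (1, 2, None, 101),
    (2, 2, "2010-01-05 12:34:56", 102), (3, 3, None, 113),
    (4, 4, None, 124), (5, 5, None, 135),
    (6, 5, "2010-01-07 01:23:45", 136), (7, 5, "2010-01-07 02:46:00", 137),
    (8, 6, None, 100), (9, 6, "2010-01-13 08:33:55", 105),
]

# Shared pieces of the three solutions, kept textually the same as above.
IN_RANGE = "orders.tstamp BETWEEN '2010-01-03' AND '2010-01-09'"
COLS = "orders.order_id, orders.tstamp, orders.item_id"

QUERIES = {
    "1 (IN)": f"""
        SELECT h.order_id, h.tstamp, h.item_id
        FROM (SELECT {COLS} FROM orders
              WHERE {IN_RANGE} OR orders.tstamp IS NULL) AS h
        WHERE h.order_id IN
              (SELECT orders.order_id FROM orders WHERE {IN_RANGE})""",
    "2 (EXISTS)": f"""
        SELECT h.order_id, h.tstamp, h.item_id
        FROM (SELECT {COLS} FROM orders
              WHERE {IN_RANGE} OR orders.tstamp IS NULL) AS h
        WHERE EXISTS
              (SELECT orders.order_id FROM orders
               WHERE h.order_id = orders.order_id AND {IN_RANGE})""",
    "3 (UNION)": f"""
        SELECT h.order_id, h.tstamp, h.item_id
        FROM (SELECT {COLS} FROM orders WHERE {IN_RANGE}) AS h
        UNION
        SELECT c.order_id, c.tstamp, c.item_id
        FROM (SELECT {COLS} FROM orders WHERE orders.tstamp IS NULL) AS c
        WHERE c.order_id IN
              (SELECT orders.order_id FROM orders WHERE {IN_RANGE})""",
}


def time_queries(rows, runs=3):
    """Run every query `runs` times; return sorted results and timings."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, order_id INTEGER,"
        " tstamp TEXT, item_id INTEGER)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", rows)
    results, timings = {}, {}
    for name, sql in QUERIES.items():
        timings[name] = []
        for _ in range(runs):
            start = time.perf_counter()
            fetched = conn.execute(sql).fetchall()
            timings[name].append(time.perf_counter() - start)
        # Sort in Python; NULL timestamps sort first within each order.
        results[name] = sorted(fetched, key=lambda r: (r[0], r[1] or ""))
    conn.close()
    return results, timings


results, timings = time_queries(SAMPLE_ROWS)
for name in QUERIES:
    print(name, ["%.6f s" % t for t in timings[name]])
# All three solutions should return the same rows.
print(results["1 (IN)"] == results["2 (EXISTS)"] == results["3 (UNION)"])
```

On a ten-row in-memory table the timings say nothing useful; the harness only becomes meaningful when pointed at data of the real size, and the numbers above came from the actual database, not from this sketch.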
