SQL连接到相关的子查询,其中表通过重叠范围相关 [英] SQL join to correlated subquery where tables are related by overlapping ranges

查看:92
本文介绍了SQL连接到相关的子查询,其中表通过重叠范围相关的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我具有以下表格结构:

项目

ID | Name
--------
1  | Apple
2  | Pear 
3  | Banana
4  | Plum
5  | Tomato

事件

ItemStart | ItemEnd | EventType | EventDate
--------------------------------------------
     1    |    2    |  Planted  | 2014-01-01
     1    |    3    |  Picked   | 2014-01-02
     3    |    5    |  Eaten    | 2014-01-05

这两个表仅通过Item的主键以及Event中ItemStart和ItemEnd(包括)的范围进行链接.事件始终指的是连续的项目序列,但并非给定项目的所有事件都具有相同的范围.对于给定项目,事件永远不会在同一日期发生.

The two tables are linked only by the primary key of Item and the range of ItemStart and ItemEnd (inclusive) in Event. Events always refer to contiguous sequences of Items, but not all the Events for a given Item will have the same range. Events never occur on the same date for a given Item.

我想产生的查询如下:

List all the Items, and for each Item show the most recent Event

示例输出:

ID | Name   |   Event | Date
----------------------------
1  | Apple  |  Picked | 2014-01-02 (Planted then Picked)
2  | Pear   |  Picked | 2014-01-02 (Planted then Picked)
3  | Banana |  Eaten  | 2014-01-05 (Picked then Eaten)
4  | Plum   |  Eaten  | 2014-01-05 (Eaten)
5  | Tomato |  Eaten  | 2014-01-05 (Eaten)

从表面上看,这似乎足够合理,并且如果存在传统的外键关系(想象为ItemID而不是ItemStartItemEnd),我可能会加入相关子查询的联接像这样的东西:

This seems reasonable enough on the face of it, and if there were traditional foreign-key relationships in place (imagine ItemID instead of ItemStart and ItemEnd) I'd probably do a join to a correlated subquery something like this:

SELECT Name, EventType, EventDate
FROM Item i
    INNER JOIN (
        SELECT ItemID, EventType, EventDate
        FROM Event e
        WHERE EventDate = (SELECT MAX(EventDate) FROM Event e_max WHERE e_max.ItemID = e.ItemID)
    ) latest_events ON i.ID = latest_events.ItemID

但是,在保持范围关系的情况下,我想做更多类似的事情,但它不起作用:

However, with the range relationship in place I'm stuck, I want to do something more like this but it doesn't work:

SELECT Name, EventType, EventDate
FROM Item i
    INNER JOIN (
        SELECT ItemStart, ItemEnd, EventType, EventDate
        FROM Event e
        WHERE EventDate = (SELECT MAX(EventDate) FROM Event e_max WHERE i.ID >= e_max.ItemStart AND i.ID <= e_max.ItemEnd)
    ) latest_events ON i.ID >= latest_events.ItemStart AND i.ID <= latest_events.ItemEnd

第6行出现关于i.ID >= e_max.ItemStart AND i.ID <= e_max.ItemEnd的错误,因为您不能从联接的另一部分中引用i.我想这样做(在较简单的示例中是不必要的),因为在构造子查询时,我不再有要链接的单个ID-重叠范围意味着有很多可能的方式可以包含单个Item ,因此我想直接返回该项目,其ID仅在顶级项目"表中可用.

I get an error about i.ID >= e_max.ItemStart AND i.ID <= e_max.ItemEnd on line 6, because you can't reference i from within another part of the join. I wanted to do that (which isn't necessary in the simpler example) because when I'm constructing the subquery I no longer have a single ID to link to - the overlapping ranges mean that there are many possible ways of including a single Item, and so I want to refer directly back to that item, whose ID is only available in the top-level Item table.

我希望这是有道理的.

我正在使用SQL Server 2008 R2.这是一份将在一夜之间生成的报告,因此速度并不是那么重要,但是有很多项目(百万分之几).虽然每个项目都有多个事件,但使用大范围意味着事件记录要少得多.

I'm using SQL Server 2008 R2. This is for a report which will run overnight, so speed is not as important is it could be, but there are very many Items (100s of million); while there are multiple Events against each Item, the use of large ranges means there are much much fewer Event records.

我考虑过的事情:

  • 以某种方式扩展项目/事件关系,以便针对所有单个项目记录事件.这将大大增加正在考虑的数据量,但将允许使用更简单的查询方法.
  • 以某种方式处理事件以约束或合并范围-如果我知道对于给定项目,所有事件的起点和终点都相同,那么我也许可以简化一下事情.没有完全考虑到这一点.

如何产生此查询?预先感谢!

How can I produce this query? Thanks in advance!

推荐答案

您可以使用CTErow_number()来做到这一点.

You can do this utilizing a CTE and row_number().

SQL小提琴演示

SQL Fiddle Demo

;with cte as 
(
    SELECT *,
        ROW_NUMBER() OVER (PARTITION BY i.id ORDER BY e.EventDate DESC) as rNum
    FROM Item i
    JOIN Event e
        ON i.id between e.ItemStart and e.ItemEnd
)

SELECT ID,
  Name, 
  EventType,
  EventDate FROM cte
WHERE rNum = 1

基本上,CTE已加入item和event,并为rownumber添加了新列,并在item.ID上进行了分区.这是它的外观的屏幕截图.从这里,我只选择rNum = 1,它应该是每个item.id的最大活动日期.

Basically the CTE has joined item and event and added a new column for rownumber and is partitioned on item.ID. Here's a screenshot of what it looks like. From here I just select rNum = 1 which should be the max event date for each item.id.

这篇关于SQL连接到相关的子查询,其中表通过重叠范围相关的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆