关联独立事件的顺序-计算时间相交 [英] Correlate Sequences of Independent Events - Calculate Time Intersection

查看:166
本文介绍了关联独立事件的顺序-计算时间相交的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我们正在构建PowerBI报告解决方案,我(以及Stack)解决了 ,企业提出了新的报告思路。由于我对PowerBI知之甚少,因此尚不确定最好的方法。

We are building a PowerBI reporting solution and I (well Stack) solved one problem and the business came up with a new reporting idea. Not sure of the best way to approach it as I know very little about PowerBI and the business seems to want quite complex reports.

我们有来自不同数据的两个事件序列资料来源。它们都包含车辆发生的独立事件。一个描述车辆在什么位置,另一个描述具有事件原因代码的事件。该企业希望报告由于各种原因在每个位置花费的时间。车辆可以完全独立于发生的事件事件来更改位置-事件实际上是日期时间,并且在整个一天的随机时间发生。每种类型的事件都有一个Startime / Endtime和一个VehicleID。

We have two sequences of events from separate data sources. They both contain independent events occurring to vehicles. One describes what location a vehicle is within - the other describes incident events which have a reason code for the incident. The business wants to report on time spent in each location for each reason. Vehicles can change location totally independent of the incident events occurring - and events actually are datetime and occur at random points throughtout day. Each type of event has a startime/endtime and a vehicleID.

车辆位置事件

+------------------+-----------+------------+-----------------+----------------+
| LocationDetailID | VehicleID | LocationID |  StartDateTime  |  EndDateTime   |
+------------------+-----------+------------+-----------------+----------------+
|                1 |         1 |          1 |        2012-1-1 |      2016-1-1  |
|                2 |         1 |          2 |        2016-1-1 |      2016-4-1  |
|                3 |         1 |          1 |        2016-4-1 |      2016-11-1 |
|                4 |         2 |          1 |        2011-1-1 |      2016-11-1 |
+------------------+-----------+------------+-----------------+----------------+

车辆状态事件

+---------+---------------+-------------+-----------+--------------+
| EventID | StartDateTime | EndDateTime | VehicleID | ReasonCodeID |
+---------+---------------+-------------+-----------+--------------+
|       1 | 2012-1-1      | 2013-1-1    |         1 |            1 |
|       2 | 2013-1-1      | 2015-1-1    |         1 |            3 |
|       3 | 2015-1-1      | 2016-5-1    |         1 |            4 |
|       4 | 2016-5-1      | 2016-11-1   |         1 |            2 |
|       5 | 2015-9-1      | 2016-2-1    |         2 |            1 |
+---------+---------------+-------------+-----------+--------------+

无论如何,我可以将两个流关联在一起,并根据每个位置的ReasonCode计算每辆车的总时间吗?这似乎要求我能够关联这两个事件-因此,在给定的ReasonCode中可能会发生位置更改。

Is there anyway I can correlate the two streams together and calculate total time per Vehicle per ReasonCode per location? This would seem to require me to be able to relate the two events - so a change of location may occur part way through a given ReasonCode.

计算示例ReasonCodeID 4


  • VehicleID 1的位置ID为2012-1-1至2016-1-1和
    2016-4-1至2016-11-1

  • VehicleID 1位于位置ID 2中,从2016-1-1
    至2016-4-1

  • VehcileID 1具有从2015-1-1到
    2016-5-1的ReasonCodeID 4

因此,位置1中的第一个期间与365天的ReasonCodeID 4相交(2015-1-1至2016-1-1)。位置1中的第二个周期与30天(2016-4-1至2016-5-1)相交。
在位置2与91天的ReasonCodeID 4相交(2016-1-1至2016-4-1

Therefore first Period in location 1 intersects with 365 days of ReasonCodeID 4 (2015-1-1 to 2016-1-1). 2nd period in location 1 intersects with 30 days (2016-4-1 to 2016-5-1). In location 2 intersects with 91 days of ReasonCodeID 4(2016-1-1 to 2016-4-1

期望的输出如下)。 p>

Desired output would be the below.

+-----------+--------------+------------+------------+
| VehicleID | ReasonCodeID | LocationID | Total Days |
+-----------+--------------+------------+------------+
|         1 |            1 |          1 |        366 |
|         1 |            3 |          1 |        730 |
|         1 |            4 |          1 |        395 |
|         1 |            4 |          2 |         91 |
|         1 |            2 |          1 |        184 |
|         2 |            1 |          1 |        154 |
+-----------+--------------+------------+------------+

我创建了一个SQL提琴,该提琴显示了结构在这里

I have created a SQL fiddle that shows the structure here

车辆有相关表格,我敢肯定生意会nt按车辆类别等对它们进行分组,但是如果我能理解这种情况下的交点,那将为我提供其余报告的基础。

Vehicles have related tables and I'm sure the business will want them grouped by vehicle class etc but if I can understand how to calculate the intersection points in this case that would give me the basis for rest of reporting.

推荐答案

我认为此解决方案需要 CROSS JOIN 实现。两个表之间的关系是多对多,这意味着创建了第三个表,它们桥接了 LocationEvents VehicleStatusEvents 表,所以我认为在表达式中指定关系会更容易。

I think this solution requires a CROSS JOIN implementation. The relationship between both tables is Many to Many which implies the creation of a third table that bridges LocationEvents and VehicleStatusEvents tables so I think specifying the relationship in the expression could be easier.

我在两个表之间都使用了CROSS JOIN,然后仅过滤结果以获取两个表中VehicleID列相同的行。我还过滤了VehicleStatusEvents范围日期与LocationEvents范围日期相交的行。

I use a CROSS JOIN between both tables, then filter the results only to get those rows which VehicleID columns are the same in both tables. I am also filtering the rows that VehicleStatusEvents range dates intersects LocationEvents range dates.

一旦过滤完成,我将添加一列以计算每个交叉点之间的天数。最后,该度量汇总每个VehicleID,ReasonCodeID和LocationID的天数。

Once the filtering is done I am adding a column to calculate the count of days between each intersection. Finally, the measure sums up the days for each VehicleID, ReasonCodeID and LocationID.

要实施交叉加入,您必须重命名两个表中任何一个上的VehicleID StartDateTime EndDateTime 。为了避免歧义的列名错误,这是必要的。

In order to implement the CROSS JOIN you will have to rename the VehicleID, StartDateTime and EndDateTime on any of both tables. It is necessary for avoiding ambigous column names errors.

我按以下方式重命名列:

I rename the columns as follows:

VehicleID LocationVehicleID StatusVehicleID

StartDateTime LocationStartDateTime StatusStartDateTime

EndDateTime LocationEndDateTime StatusEndDateTime

VehicleID : LocationVehicleID and StatusVehicleID
StartDateTime : LocationStartDateTime and StatusStartDateTime
EndDateTime : LocationEndDateTime and StatusEndDateTime

之后,您可以在总天数度量中使用CROSSJOIN:

After this you can use CROSSJOIN in the Total Days measure:

Total Days =
SUMX (
    FILTER (
        ADDCOLUMNS (
            FILTER (
                CROSSJOIN ( LocationEvents, VehicleStatusEvents ),
                LocationEvents[LocationVehicleID] = VehicleStatusEvents[StatusVehicleID]
                    && LocationEvents[LocationStartDateTime] <= VehicleStatusEvents[StatusEndDateTime]
                    && LocationEvents[LocationEndDateTime] >= VehicleStatusEvents[StatusStartDateTime]
            ),
            "CountOfDays", IF (
                [LocationStartDateTime] <= [StatusStartDateTime]
                    && [LocationEndDateTime] >= [StatusEndDateTime],
                DATEDIFF ( [StatusStartDateTime], [StatusEndDateTime], DAY ),
                IF (
                    [LocationStartDateTime] > [StatusStartDateTime]
                        && [LocationEndDateTime] >= [StatusEndDateTime],
                    DATEDIFF ( [LocationStartDateTime], [StatusEndDateTime], DAY ),
                    IF (
                        [LocationStartDateTime] <= [StatusStartDateTime]
                            && [LocationEndDateTime] <= [StatusEndDateTime],
                        DATEDIFF ( [StatusStartDateTime], [LocationEndDateTime], DAY ),
                        IF (
                            [LocationStartDateTime] >= [StatusStartDateTime]
                                && [LocationEndDateTime] <= [StatusEndDateTime],
                            DATEDIFF ( [LocationStartDateTime], [LocationEndDateTime], DAY ),
                            BLANK ()
                        )
                    )
                )
            )
        ),
        LocationEvents[LocationID] = [LocationID]
            && VehicleStatusEvents[ReasonCodeID] = [ReasonCodeID]
    ),
    [CountOfDays]
)

然后在Power BI中,您可以使用此度量来构建矩阵(或任何其他可视化对象):

Then in Power BI you can build a matrix (or any other visualization) using this measure:

如果您不完全理解度量表达式,则为T-SQL翻译:

If you don't understand completely the measure expression, here is the T-SQL translation:

SELECT
    dt.VehicleID,
    dt.ReasonCodeID,
    dt.LocationID,
    SUM(dt.Diff) [Total Days]
FROM 
(
    SELECT
        CASE
            WHEN a.StartDateTime <= b.StartDateTime AND a.EndDateTime >= b.EndDateTime  -- Inside range
               THEN DATEDIFF(DAY, b.StartDateTime, b.EndDateTime)
            WHEN a.StartDateTime > b.StartDateTime AND a.EndDateTime >= b.EndDateTime  -- |-----|*****|....|
               THEN DATEDIFF(DAY, a.StartDateTime, b.EndDateTime)
            WHEN a.StartDateTime <= b.StartDateTime AND a.EndDateTime <= b.EndDateTime  -- |...|****|-----|
               THEN DATEDIFF(DAY, b.StartDateTime, a.EndDateTime)
            WHEN a.StartDateTime >= b.StartDateTime AND a.EndDateTime <= b.EndDateTime  -- |---|****|-----
               THEN DATEDIFF(DAY, a.StartDateTime, a.EndDateTime)
        END Diff,
        a.VehicleID,
        b.ReasonCodeID,
        a.LocationID --a.StartDateTime, a.EndDateTime, b.StartDateTime, b.EndDateTime
    FROM LocationEvents a
        CROSS JOIN VehicleStatusEvents b
    WHERE a.VehicleID = b.VehicleID
        AND 
        (
            (a.StartDateTime <= b.EndDateTime)
                AND (a.EndDateTime >= b.StartDateTime)
        )
) dt
GROUP BY dt.VehicleID,
         dt.ReasonCodeID,
         dt.LocationID

在T-SQL中请注意,您也可以使用 INNER JOIN 运算符。

Note in T-SQL you could use an INNER JOIN operator too.

让我知道如果有帮助。

这篇关于关联独立事件的顺序-计算时间相交的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆