Select first and last match by column from a timestamp-ordered table in MySQL


Question

Stack Overflow,

I need your help!

Say I have a table in MySQL that looks something like this:

---------------------------------------------------------------
 OWNER_ID | ENTRY_ID | VEHICLE | TIME                | LOCATION
---------------------------------------------------------------
 1        | 1        | 123456  | 2016-01-01 00:00:00 | A
 1        | 2        | 123456  | 2016-01-01 00:01:00 | B
 1        | 3        | 123456  | 2016-01-01 00:02:00 | C
 1        | 4        | 123456  | 2016-01-01 00:03:00 | C
 1        | 5        | 123456  | 2016-01-01 00:04:00 | B
 1        | 6        | 123456  | 2016-01-01 00:05:00 | A
 1        | 7        | 123456  | 2016-01-01 00:06:00 | A
 ...
 1        | 999      | 123456  | 2016-01-01 09:10:00 | A
 1        | 1000     | 123456  | 2016-01-01 09:11:00 | A
 1        | 1001     | 123456  | 2016-01-01 09:12:00 | B
 1        | 1002     | 123456  | 2016-01-01 09:13:00 | C
 1        | 1003     | 123456  | 2016-01-01 09:14:00 | C
 1        | 1004     | 123456  | 2016-01-01 09:15:00 | B
 ...

Please note that the table schema is just made up so I can explain what I'm trying to accomplish...

Imagine that from ENTRY_ID 6 through 999, the LOCATION column is "A". All I need for my application is basically rows 1-6, then row 1000 onwards. Everything from row 7 to 999 is unnecessary data that doesn't need to be processed further. What I am struggling to do is either disregard those lines without having to move the processing of the data into my application, or better yet, delete them.

I'm scratching my head with this because:

1) I can't sort by LOCATION then just take the first and last entries, because the time order is important to my application and this will become lost - for example, if I processed this data in this way, I would end up with row 1 and row 1000, losing row 6.

2) I'd prefer not to move the processing of this data into my application; this data is superfluous to my requirements, and there is simply no point keeping it if I can avoid it.
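To make point 1 concrete, here is a small Python sketch of that rejected approach, run on the rows listed above (the elided rows 8-998 are simply left out, which is an assumption of this sketch; the function name is illustrative). Grouping by LOCATION first merges the two separate visits to "A" into one group, so ENTRY_ID 6 never appears:

```python
from itertools import groupby

# Rows listed in the question as (ENTRY_ID, TIME, LOCATION); rows 8-998 are omitted.
ROWS = [
    (1, "2016-01-01 00:00:00", "A"), (2, "2016-01-01 00:01:00", "B"),
    (3, "2016-01-01 00:02:00", "C"), (4, "2016-01-01 00:03:00", "C"),
    (5, "2016-01-01 00:04:00", "B"), (6, "2016-01-01 00:05:00", "A"),
    (7, "2016-01-01 00:06:00", "A"), (999, "2016-01-01 09:10:00", "A"),
    (1000, "2016-01-01 09:11:00", "A"), (1001, "2016-01-01 09:12:00", "B"),
    (1002, "2016-01-01 09:13:00", "C"), (1003, "2016-01-01 09:14:00", "C"),
    (1004, "2016-01-01 09:15:00", "B"),
]


def first_last_per_location(rows):
    """Rejected approach: sort by LOCATION (then TIME) and keep only the
    earliest and latest ENTRY_ID seen at each location."""
    by_location = sorted(rows, key=lambda r: (r[2], r[1]))
    result = {}
    for location, group in groupby(by_location, key=lambda r: r[2]):
        group = list(group)
        result[location] = (group[0][0], group[-1][0])
    return result


print(first_last_per_location(ROWS))
# {'A': (1, 1000), 'B': (2, 1004), 'C': (3, 1003)}
```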

Given the above example data, what I want to end up with once I have a solution would be:

---------------------------------------------------------------
 OWNER_ID | ENTRY_ID | VEHICLE | TIME                | LOCATION
---------------------------------------------------------------
 1        | 1        | 123456  | 2016-01-01 00:00:00 | A
 1        | 2        | 123456  | 2016-01-01 00:01:00 | B
 1        | 3        | 123456  | 2016-01-01 00:02:00 | C
 1        | 4        | 123456  | 2016-01-01 00:03:00 | C
 1        | 5        | 123456  | 2016-01-01 00:04:00 | B
 1        | 6        | 123456  | 2016-01-01 00:05:00 | A
 1        | 1000     | 123456  | 2016-01-01 09:11:00 | A
 1        | 1001     | 123456  | 2016-01-01 09:12:00 | B
 1        | 1002     | 123456  | 2016-01-01 09:13:00 | C
 1        | 1003     | 123456  | 2016-01-01 09:14:00 | C
 1        | 1004     | 123456  | 2016-01-01 09:15:00 | B
 ...

Hopefully I'm making sense here and not missing something obvious!

@Aliester - Is there a way to determine that a row doesn't need to be processed from the data contained within that row?

Unfortunately not.

@O. Jones - It sounds like you're hoping to determine the earliest and latest timestamp in your table for each distinct value of ENTRY_ID, and then retrieve the detail rows from the table matching those timestamps. Is that correct? Are your ENTRY_ID values unique? Are they guaranteed to be in ascending time order? Your query can be made cheaper if that is true. Please, if you have time, edit your question to clarify these points.

I'm trying to find the arrival time at a location, followed by the departure time from that location. Yes, ENTRY_ID is a unique field, but you cannot take it as a given that an earlier ENTRY_ID means an earlier timestamp: the incoming data is sent from a GPS unit on a vehicle and, due to network limitations, is NOT necessarily processed in the order it was sent.
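Stated as code, the requirement is: order each vehicle's rows by TIME (not ENTRY_ID), split them into runs of consecutive identical LOCATION values, and keep the first row (arrival) and the last row (departure) of each run. The following is a minimal plain-Python sketch that can serve as a reference when checking any SQL version; the function name and tuple layout are illustrative only, and only the rows listed in the question are used:

```python
import random
from itertools import groupby

# (ENTRY_ID, VEHICLE, TIME, LOCATION) for the rows listed in the question.
ROWS = [
    (1, 123456, "2016-01-01 00:00:00", "A"), (2, 123456, "2016-01-01 00:01:00", "B"),
    (3, 123456, "2016-01-01 00:02:00", "C"), (4, 123456, "2016-01-01 00:03:00", "C"),
    (5, 123456, "2016-01-01 00:04:00", "B"), (6, 123456, "2016-01-01 00:05:00", "A"),
    (7, 123456, "2016-01-01 00:06:00", "A"), (999, 123456, "2016-01-01 09:10:00", "A"),
    (1000, 123456, "2016-01-01 09:11:00", "A"), (1001, 123456, "2016-01-01 09:12:00", "B"),
    (1002, 123456, "2016-01-01 09:13:00", "C"), (1003, 123456, "2016-01-01 09:14:00", "C"),
    (1004, 123456, "2016-01-01 09:15:00", "B"),
]


def arrivals_and_departures(rows):
    """Keep the first and last row of every run of consecutive identical
    LOCATION values, per VEHICLE, in TIME order."""
    # Sort by VEHICLE, then TIME; ENTRY_ID order is not trusted.
    ordered = sorted(rows, key=lambda r: (r[1], r[2]))
    kept = []
    # groupby only merges adjacent rows, so each group is one visit.
    for _, run in groupby(ordered, key=lambda r: (r[1], r[3])):
        run = list(run)
        kept.append(run[0])          # arrival
        if len(run) > 1:
            kept.append(run[-1])     # departure
    return kept


# Simulate rows arriving out of order from the GPS unit.
shuffled = ROWS[:]
random.Random(42).shuffle(shuffled)
print([r[0] for r in arrivals_and_departures(shuffled)])
# [1, 2, 3, 4, 5, 6, 1000, 1001, 1002, 1003, 1004]
```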

Solution

This is a tricky problem to solve in SQL because SQL is about sets of data, not sequences of data. It's extra tricky in MySQL because other SQL variants offer built-in row numbering (Oracle's ROWNUM pseudocolumn, or the ROW_NUMBER() window function), and MySQL doesn't as of late 2016.

You need the union of two sets of data here.

  1. the set of rows of your database immediately before, in time, a change in location.
  2. the set of rows immediately after a change in location.
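The two sets can be checked by hand with a short Python sketch that walks consecutive pairs (a, b) of rows already sorted by TIME, which is what the self-join does with b.rownum = a.rownum + 1. It assumes a single vehicle and uses only the rows listed in the question; the function name is illustrative:

```python
# (ENTRY_ID, LOCATION) for one vehicle, already in TIME order.
ORDERED = [(1, "A"), (2, "B"), (3, "C"), (4, "C"), (5, "B"), (6, "A"), (7, "A"),
           (999, "A"), (1000, "A"), (1001, "B"), (1002, "C"), (1003, "C"), (1004, "B")]


def change_sets(ordered):
    """Return (rows just before a location change, rows just after one)."""
    before, after = set(), set()
    # zip(ordered, ordered[1:]) pairs each row with its successor in time.
    for (a_id, a_loc), (b_id, b_loc) in zip(ordered, ordered[1:]):
        if a_loc != b_loc:
            before.add(a_id)
            after.add(b_id)
    return before, after


before, after = change_sets(ORDERED)
print(sorted(before))          # [1, 2, 4, 5, 1000, 1001, 1003]
print(sorted(after))           # [2, 3, 5, 6, 1001, 1002, 1004]
print(sorted(before | after))  # [1, 2, 3, 4, 5, 6, 1000, 1001, 1002, 1003, 1004]
```

Because a row is kept only when it sits next to a change, a vehicle's very first (or last) row survives only if the location changes right after (or right before) it; in the sample, row 1 is kept because row 2 is at B.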

To get that, you need to start with a subquery that generates all your rows, ordered by VEHICLE then TIME, with row numbers. (http://sqlfiddle.com/#!9/6c3bc7/2/0) Please note that the sample data in SQL Fiddle is different from your sample data.

       SELECT (@rowa := @rowa + 1) rownum,
               loc.*
          FROM loc
          JOIN (SELECT @rowa := 0) init
         ORDER BY VEHICLE, TIME
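SQLite has no MySQL-style user variables, so the @rowa := @rowa + 1 trick cannot run there, but the same numbering can be reproduced with ROW_NUMBER(), which SQLite supports from version 3.25. The Python sketch below uses the standard-library sqlite3 module as a stand-in for MySQL and inserts rows 1-6 out of order to show that the numbers follow TIME, not insertion order (TIME is stored as ISO-format text, an assumption that makes text order match time order). MySQL's manual also cautions that the evaluation order of expressions involving user variables is undefined, so the variable trick relies on behaviour that is not guaranteed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE loc (OWNER_ID INTEGER, ENTRY_ID INTEGER, VEHICLE INTEGER, "
             "TIME TEXT, LOCATION TEXT)")
# Rows 1-6 of the sample, inserted out of order on purpose (late GPS deliveries).
conn.executemany("INSERT INTO loc VALUES (?, ?, ?, ?, ?)", [
    (1, 4, 123456, "2016-01-01 00:03:00", "C"),
    (1, 1, 123456, "2016-01-01 00:00:00", "A"),
    (1, 6, 123456, "2016-01-01 00:05:00", "A"),
    (1, 2, 123456, "2016-01-01 00:01:00", "B"),
    (1, 5, 123456, "2016-01-01 00:04:00", "B"),
    (1, 3, 123456, "2016-01-01 00:02:00", "C"),
])

# ROW_NUMBER() stands in for MySQL's (@rowa := @rowa + 1) counter.
numbered = conn.execute("""
    SELECT ROW_NUMBER() OVER (ORDER BY VEHICLE, TIME) AS rownum, loc.*
      FROM loc
     ORDER BY VEHICLE, TIME
""").fetchall()

for rownum, owner_id, entry_id, vehicle, time, location in numbered:
    print(rownum, entry_id, time, location)
```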

Then you need to self-join that subquery, use the ON clause to exclude consecutive rows at the same location, and take the rows right before a change in location. Comparing consecutive rows is done by ON ... b.rownum = a.rownum+1. That is this query. (http://sqlfiddle.com/#!9/6c3bc7/1/0)

SELECT a.*
FROM (
            SELECT (@rowa := @rowa + 1) rownum,
                   loc.*
              FROM loc
              JOIN (SELECT @rowa := 0) init
             ORDER BY VEHICLE, TIME
) a 
 JOIN (
             SELECT (@rowb := @rowb + 1) rownum,
                   loc.*
              FROM loc
              JOIN (SELECT @rowb := 0) init
             ORDER BY VEHICLE, TIME
 ) b   ON a.VEHICLE = b.VEHICLE
      AND b.rownum = a.rownum + 1
      AND a.location <> b.location

A variant of this subquery, where you say SELECT b.* instead of SELECT a.*, gets the rows right after a change in location. (http://sqlfiddle.com/#!9/6c3bc7/3/0)
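The same self-join can be tried in SQLite through Python's sqlite3 module, again as a stand-in for MySQL (SQLite 3.25 or newer assumed). The numbered rows are materialised once into a temporary table, here called numbered (a name chosen for this sketch), using ROW_NUMBER() in place of @rowa/@rowb, and the a.* and b.* variants are run side by side on the rows listed in the question:

```python
import sqlite3

SAMPLE = [
    (1, 1, 123456, "2016-01-01 00:00:00", "A"),
    (1, 2, 123456, "2016-01-01 00:01:00", "B"),
    (1, 3, 123456, "2016-01-01 00:02:00", "C"),
    (1, 4, 123456, "2016-01-01 00:03:00", "C"),
    (1, 5, 123456, "2016-01-01 00:04:00", "B"),
    (1, 6, 123456, "2016-01-01 00:05:00", "A"),
    (1, 7, 123456, "2016-01-01 00:06:00", "A"),
    (1, 999, 123456, "2016-01-01 09:10:00", "A"),
    (1, 1000, 123456, "2016-01-01 09:11:00", "A"),
    (1, 1001, 123456, "2016-01-01 09:12:00", "B"),
    (1, 1002, 123456, "2016-01-01 09:13:00", "C"),
    (1, 1003, 123456, "2016-01-01 09:14:00", "C"),
    (1, 1004, 123456, "2016-01-01 09:15:00", "B"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE loc (OWNER_ID INTEGER, ENTRY_ID INTEGER, VEHICLE INTEGER, "
             "TIME TEXT, LOCATION TEXT)")
conn.executemany("INSERT INTO loc VALUES (?, ?, ?, ?, ?)", SAMPLE)

# Stand-in for the @rowa / @rowb derived tables: number the rows once.
conn.execute("""
    CREATE TEMP TABLE numbered AS
    SELECT ROW_NUMBER() OVER (ORDER BY VEHICLE, TIME) AS rownum, loc.*
      FROM loc
""")

# {side} is only ever the fixed alias "a" or "b", never user input.
SELF_JOIN = """
    SELECT {side}.ENTRY_ID
      FROM numbered a
      JOIN numbered b ON a.VEHICLE = b.VEHICLE
                     AND b.rownum = a.rownum + 1
                     AND a.LOCATION <> b.LOCATION
     ORDER BY {side}.rownum
"""
before = [r[0] for r in conn.execute(SELF_JOIN.format(side="a"))]  # SELECT a.*
after = [r[0] for r in conn.execute(SELF_JOIN.format(side="b"))]   # SELECT b.*
print(before)  # [1, 2, 4, 5, 1000, 1001, 1003]
print(after)   # [2, 3, 5, 6, 1001, 1002, 1004]
```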

Finally, you take the setwise UNION of those two queries, order it appropriately, and you have your set of rows with the duplicate consecutive positions removed. Please notice that this gets quite verbose in MySQL because the nasty @rowa := @rowa + 1 hack used to generate row numbers has to use a different variable (@rowa, @rowb, etc) in each copy of the subquery. (http://sqlfiddle.com/#!9/6c3bc7/4/0)

SELECT a.*
  FROM (
        SELECT (@rowa := @rowa + 1) rownum,
               loc.*
          FROM loc
          JOIN (SELECT @rowa := 0) init
         ORDER BY VEHICLE, TIME
) a 
 JOIN (
         SELECT (@rowb := @rowb + 1) rownum,
               loc.*
          FROM loc
          JOIN (SELECT @rowb := 0) init
         ORDER BY VEHICLE, TIME
 ) b ON a.VEHICLE = b.VEHICLE AND b.rownum = a.rownum + 1  AND a.location <> b.location

 UNION

 SELECT d.*
  FROM (
        SELECT (@rowc := @rowc + 1) rownum,
               loc.*
          FROM loc
          JOIN (SELECT @rowc := 0) init
         ORDER BY VEHICLE, TIME
) c 
 JOIN (
         SELECT (@rowd := @rowd + 1) rownum,
               loc.*
          FROM loc
          JOIN (SELECT @rowd := 0) init
         ORDER BY VEHICLE, TIME
 ) d ON c.VEHICLE = d.VEHICLE AND c.rownum = d.rownum - 1  AND c.location <> d.location
 order by VEHICLE, TIME
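A runnable SQLite sketch of this final UNION, reusing the ROW_NUMBER() stand-in (a temporary table named numbered, chosen for this sketch) in place of the four user-variable subqueries. One adjustment made for this sketch: the UNION is wrapped in an outer SELECT so that ORDER BY VEHICLE, TIME resolves against the combined result's columns in the simplest way:

```python
import sqlite3

SAMPLE = [
    (1, 1, 123456, "2016-01-01 00:00:00", "A"),
    (1, 2, 123456, "2016-01-01 00:01:00", "B"),
    (1, 3, 123456, "2016-01-01 00:02:00", "C"),
    (1, 4, 123456, "2016-01-01 00:03:00", "C"),
    (1, 5, 123456, "2016-01-01 00:04:00", "B"),
    (1, 6, 123456, "2016-01-01 00:05:00", "A"),
    (1, 7, 123456, "2016-01-01 00:06:00", "A"),
    (1, 999, 123456, "2016-01-01 09:10:00", "A"),
    (1, 1000, 123456, "2016-01-01 09:11:00", "A"),
    (1, 1001, 123456, "2016-01-01 09:12:00", "B"),
    (1, 1002, 123456, "2016-01-01 09:13:00", "C"),
    (1, 1003, 123456, "2016-01-01 09:14:00", "C"),
    (1, 1004, 123456, "2016-01-01 09:15:00", "B"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE loc (OWNER_ID INTEGER, ENTRY_ID INTEGER, VEHICLE INTEGER, "
             "TIME TEXT, LOCATION TEXT)")
conn.executemany("INSERT INTO loc VALUES (?, ?, ?, ?, ?)", SAMPLE)
conn.execute("""
    CREATE TEMP TABLE numbered AS
    SELECT ROW_NUMBER() OVER (ORDER BY VEHICLE, TIME) AS rownum, loc.*
      FROM loc
""")

# Rows just before a change (a.*) UNION rows just after a change (d.*).
UNION_QUERY = """
    SELECT ENTRY_ID, LOCATION
      FROM (
            SELECT a.*
              FROM numbered a
              JOIN numbered b ON a.VEHICLE = b.VEHICLE
                             AND b.rownum = a.rownum + 1
                             AND a.LOCATION <> b.LOCATION
            UNION
            SELECT d.*
              FROM numbered c
              JOIN numbered d ON c.VEHICLE = d.VEHICLE
                             AND c.rownum = d.rownum - 1
                             AND c.LOCATION <> d.LOCATION
           )
     ORDER BY VEHICLE, TIME
"""
result = conn.execute(UNION_QUERY).fetchall()
print([entry_id for entry_id, _ in result])
# [1, 2, 3, 4, 5, 6, 1000, 1001, 1002, 1003, 1004]
```

UNION (rather than UNION ALL) is what removes the rows that appear in both halves, such as row 2, which is both right after the A-to-B change and right before the B-to-C change.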

And in next-generation MySQL, available now in beta in MariaDB 10.2, this is much, much easier, because the new generation has common table expressions and row numbering. (MySQL 8.0 later added both features too.)

 -- The CTE is named differently from the base table, so FROM loc inside it
 -- unambiguously means the table.
 with numbered as
     (
            SELECT  ROW_NUMBER() OVER (PARTITION BY VEHICLE ORDER BY time) rownum,
                   loc.*
              FROM loc
)

select a.* 
 from numbered a
 join numbered b ON a.VEHICLE = b.VEHICLE
           AND b.rownum = a.rownum + 1
           AND a.location <> b.location
 union 
select b.* 
 from numbered a
 join numbered b ON a.VEHICLE = b.VEHICLE
           AND b.rownum = a.rownum + 1
           AND a.location <> b.location
order by vehicle, time
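A different technique, not part of the answer above: once window functions are available, LAG() and LEAD() can compare each row with its neighbours directly, with no self-join. The SQLite sketch below (sqlite3 standing in for a window-function-capable MySQL or MariaDB; SQLite 3.25 or newer assumed) keeps a row when its location differs from the previous or next row of the same vehicle, treating a missing neighbour as a difference, so each vehicle's first and last rows are always kept; that is a deliberate difference from the UNION above. It then deletes everything else, as the question asked. MySQL's DELETE documentation says you cannot delete from a table and select from the same table in a subquery, so a MySQL version of the DELETE may need restructuring (for example, collecting the kept ENTRY_IDs into a temporary table first).

```python
import sqlite3

SAMPLE = [
    (1, 1, 123456, "2016-01-01 00:00:00", "A"),
    (1, 2, 123456, "2016-01-01 00:01:00", "B"),
    (1, 3, 123456, "2016-01-01 00:02:00", "C"),
    (1, 4, 123456, "2016-01-01 00:03:00", "C"),
    (1, 5, 123456, "2016-01-01 00:04:00", "B"),
    (1, 6, 123456, "2016-01-01 00:05:00", "A"),
    (1, 7, 123456, "2016-01-01 00:06:00", "A"),
    (1, 999, 123456, "2016-01-01 09:10:00", "A"),
    (1, 1000, 123456, "2016-01-01 09:11:00", "A"),
    (1, 1001, 123456, "2016-01-01 09:12:00", "B"),
    (1, 1002, 123456, "2016-01-01 09:13:00", "C"),
    (1, 1003, 123456, "2016-01-01 09:14:00", "C"),
    (1, 1004, 123456, "2016-01-01 09:15:00", "B"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE loc (OWNER_ID INTEGER, ENTRY_ID INTEGER, VEHICLE INTEGER, "
             "TIME TEXT, LOCATION TEXT)")
conn.executemany("INSERT INTO loc VALUES (?, ?, ?, ?, ?)", SAMPLE)

# Each row plus the LOCATION of the previous and next row for the same vehicle.
FLAGGED = """
    SELECT ENTRY_ID, VEHICLE, TIME, LOCATION,
           LAG(LOCATION)  OVER (PARTITION BY VEHICLE ORDER BY TIME) AS prev_loc,
           LEAD(LOCATION) OVER (PARTITION BY VEHICLE ORDER BY TIME) AS next_loc
      FROM loc
"""
# Arrival: previous row elsewhere. Departure: next row elsewhere.
# A missing neighbour (NULL) counts as "elsewhere".
KEEP = """
    prev_loc IS NULL OR next_loc IS NULL
    OR prev_loc <> LOCATION OR next_loc <> LOCATION
"""

kept = [r[0] for r in conn.execute(
    f"SELECT ENTRY_ID FROM ({FLAGGED}) WHERE {KEEP} ORDER BY VEHICLE, TIME")]
print(kept)  # [1, 2, 3, 4, 5, 6, 1000, 1001, 1002, 1003, 1004]

# Delete every row that is neither an arrival nor a departure.
conn.execute(
    f"DELETE FROM loc WHERE ENTRY_ID NOT IN (SELECT ENTRY_ID FROM ({FLAGGED}) WHERE {KEEP})")
remaining = [r[0] for r in conn.execute("SELECT ENTRY_ID FROM loc ORDER BY VEHICLE, TIME")]
print(remaining)  # [1, 2, 3, 4, 5, 6, 1000, 1001, 1002, 1003, 1004]
```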
