如何在Sql Server的相邻记录中找到具有相同值的记录? (我相信正确的用语是地区?) [英] How do I find records that have the same value in adjacent records in Sql Server? (I believe the correct term for this is a region??)

查看:88
本文介绍了如何在Sql Server的相邻记录中找到具有相同值的记录? (我相信正确的用语是地区?)的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

查找具有相同值的相邻记录的开始和结束时间?

Finding the start and end time for adjacent records that have the same value?

我有一个表,其中包含心律读数(每分钟心跳数)和日期时间字段. (实际上是heartrate_idheartratedatetime.)数据是由设备生成的,该设备每6秒记录一次心率和时间.有时,心率监测器会给出错误的读数,并且记录下来的每分钟心跳会在一段时间内粘住".用木棍表示,每分钟的节拍值在相邻时间将是相同的.

I have a table that contains heart rate readings (in beats per minute) and datetime field. (Actually the fields are heartrate_id, heartrate, and datetime.) The data are generated by a device that records the heart rate and time every 6 seconds. Sometimes the heart rate monitor will give false readings and the recorded beats per minute will "stick" for an period of time. By sticks, I mean the beats per minute value will be identical in adjacent times.

基本上,我需要找到所有心率相同的记录(例如,每分钟5次心跳,每分钟100次心跳等),但仅在相邻记录中找到.如果设备连续3次(或100次连续读数)每分钟记录25个节拍,则需要定位这些事件.结果需要有心跳,心跳开始的时间和心跳结束的时间,理想情况下,结果看起来应该更像这样:

Basically I need to find all the records where the heart rate is the same (e.g. 5 beats per minute, 100 beats per minute, etc.) in but only on adjacent records. If the device records 25 beats per minute for 3 consecutive reading (or 100 consecutive readings) I need to locate these events. The results need to have the heartrate, time the heartrate started, and the time the heart rate ended and ideally the results would look more of less like this:

heartrate starttime endtime
--------- --------- --------
1.00      21:12:00  21:12:24
35.00     07:00:12  07:00:36

我已经尝试了几种不同的方法,但到目前为止,我还是很出色.任何帮助将不胜感激!

I've tried several different approaches but so far I'm striking out. Any help would be greatly appreciated!

推荐答案

经审查,我在该答案上的原始作品都没有很好.这实际上属于称为差距与岛屿,并且此修订后的答案将使用自我首次回答此问题以来从类似问题/中学到的信息.

Upon review, none of my original work on this answer was very good. This actually belongs to the class of problems known as gaps-and-islands, and this revised answer will use information I've gleaned from similar questions/learned since first answering this question.

事实证明,此查询可以比我最初想象的更简单地完成很多:

It turns out this query can be done a lot more simply than I originally thought:

WITH Grouped_Run AS (SELECT heartRate, dateTime,
                            ROW_NUMBER() OVER(ORDER BY dateTime) -  
                            ROW_NUMBER() OVER(PARTITION BY heartRate ORDER BY dateTime) AS groupingId
                     FROM HeartRate)

SELECT heartRate, MIN(dateTime), MAX(dateTime)
FROM Grouped_Run
GROUP BY heartRate, groupingId
HAVING COUNT(*) > 2

SQL小提琴演示

那么这是怎么回事?差距与岛屿问题的定义之一是需要连续值(或缺少值)的组".通常利用经常被忽略/过于直观的事实来生成序列来解决此问题:减去序列会得到一个恒定值.

So what's happening here? One of the definitions of gaps-and-islands problems is the need for "groups" of consecutive values (or lack thereof). Often sequences are generated to solve this, exploiting an often overlooked/too-intuitive fact: subtracting sequences yields a constant value.

例如,假设以下序列和减法(行中的值不重要):

For example, imagine the following sequences, and the subtraction (the values in the rows are unimportant):

position   positionInGroup  subtraction
=========================================
1          1                0
2          2                0
3          3                0
4          1                3
5          2                3
6          1                5
7          4                3
8          5                3

position是在所有记录上生成的简单序列.
positionInGroup是为每个不同记录集生成的简单序列.在这种情况下,实际上有3组不同的记录(从position = 1, 4, 6开始).
subtraction是其他两列之间的差异的结果.请注意,不同组的值可能会重复!
序列必须共享的关键属性之一是必须在相同顺序的数据行上生成 ,否则将中断.

position is a simple sequence generated over all records.
positionInGroup is a simple sequence generated for each set of different records. In this case, there's actually 3 different sets of records (starting at position = 1, 4, 6).
subtraction is the result of the difference between the other two columns. Note that values may repeat for different groups!
One of the key properties the sequences must share is they must be generated over the rows of data in the same order, or this breaks.

那么SQL如何做到这一点?通过使用ROW_NUMBER(),此函数将在记录的窗口"上生成一系列数字:

So how is SQL doing this? Through the use of ROW_NUMBER() this function will generate a sequence of numbers over a "window" of records:

ROW_NUMBER() OVER(ORDER BY dateTime)

将生成position序列.

ROW_NUMBER() OVER(PARTITION BY heartRate ORDER BY dateTime)

将生成positionInGroup序列,每个heartRate是一个不同的组.
在大多数这种类型的查询的情况下,两个序列的值并不重要,重要的是减法(获取序列组),因此我们只需要减法的结果即可.
我们还需要heartRate及其发生的时间来提供答案.

will generate the positionInGroup sequence, with each heartRate being a different group.
In the case of most queries of this type, the values of the two sequences is unimportant, it's the subtraction (to get the sequence group) that matters, so we just need the result of the subtraction.
We'll also need the heartRate and the times in which they occurred to provide the answer.

原始答案要求卡住心跳的每个运行"的开始和结束时间.这是标准的MIN(...)/MAX(...),表示GROUP BY.我们需要同时使用原始的heartRate列(因为这是非聚合列)我们生成的groupingId(用于标识每个卡住的当前运行"值).

The original answer asked for the start and end times of each of the "runs" of stuck heartbeats. That's a standard MIN(...)/MAX(...), which means a GROUP BY. We need to use both the original heartRate column (because that's a non-aggregate column) and our generated groupingId (which identifies the current "run" per stuck value).

询问的部分问题仅重复了3次或更多次. HAVING COUNT(*) > 2是忽略长度小于等于2的游程的指令;它按组计算行数.

Part of the question asked for only runs that repeated three or more times. The HAVING COUNT(*) > 2 is an instruction to ignore runs of length 2 or less; it counts rows per-group.

这篇关于如何在Sql Server的相邻记录中找到具有相同值的记录? (我相信正确的用语是地区?)的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆