Query Records and Group by a Block of Time

Question

I have an application that may run several times a day. Each run writes data to a table to report on the events that occurred. The main report table looks something like this:

Id    SourceId    SourceType    DateCreated
5048  433         FILE          5/17/2011 9:14:12 AM
5049  346         FILE          5/17/2011 9:14:22 AM
5050  444         FILE          5/17/2011 9:14:51 AM
5051  279         FILE          5/17/2011 9:15:02 AM
5052  433         FILE          5/17/2011 12:34:12 AM
5053  346         FILE          5/17/2011 12:34:22 AM
5054  444         FILE          5/17/2011 12:34:51 AM
5055  279         FILE          5/17/2011 12:35:02 AM

I can tell that there were two runs, but I would like a way to query, for a date range, the number of times the process ran. Ideally the query would return the time each run started and the number of files in that run. The query below gets me part of the way there: it shows the day, the hour, and how many files were processed. It is not exactly what I want, though. It would split a run that lasted from 8:58 to 9:04 across two hours, for example, and it would merge separate runs that started at 9:02 and 9:15.

Select dateadd(day,0,datediff(day,0,DateCreated)) as [Date], datepart(hour, DateCreated) as [Hour], Count(*) [File Count]
From   MyReportTable
Where DateCreated between '5/4/2011' and '5/18/2011'
    and SourceType = 'File'
Group By dateadd(day,0,datediff(day,0,DateCreated)), datepart(hour, DateCreated)
Order By dateadd(day,0,datediff(day,0,DateCreated)), datepart(hour, DateCreated)

I understand that runs that are close together will likely get grouped together, and I'm fine with that. I only expect a rough grouping.

Thanks!

Recommended Answer

If you're certain these runs are contiguous and don't overlap, you should be able to use the Id field to break up your groups. Look for pairs of rows whose Id values are exactly 1 apart and whose DateCreated values are more than some threshold apart. From your data, it looks like records within a run are entered within at most a minute of each other, so a safe threshold could be a minute or more.

This will give you the start times:

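SELECT mrtB.Id, mrtB.DateCreated
FROM MyReportTable AS mrtA
INNER JOIN MyReportTable AS mrtB
    ON (mrtA.Id + 1) = mrtB.Id
-- Compare in seconds: DateDiff(mi, ...) counts minute boundaries crossed,
-- so 9:14:51 -> 9:15:02 would already count as 1 minute.
WHERE DateDiff(ss, mrtA.DateCreated, mrtB.DateCreated) >= 60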
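As a quick way to try the run-start detection, here is a minimal sketch that runs the same self-join against an in-memory SQLite database from Python. Several things here are assumptions rather than part of the original answer: the use of SQLite and `strftime('%s', ...)` in place of `DateDiff`, the 60-second default threshold, the function name `find_run_starts`, and the sample rows, which are modeled on the question's table with the second run placed at 12:34 in the afternoon so that timestamps increase with Id.

```python
import sqlite3

# Hypothetical rows modeled on the question's table; the second run is placed
# at 12:34 in the afternoon (an assumption) so times increase with Id.
SAMPLE_ROWS = [
    (5048, 433, "FILE", "2011-05-17 09:14:12"),
    (5049, 346, "FILE", "2011-05-17 09:14:22"),
    (5050, 444, "FILE", "2011-05-17 09:14:51"),
    (5051, 279, "FILE", "2011-05-17 09:15:02"),
    (5052, 433, "FILE", "2011-05-17 12:34:12"),
    (5053, 346, "FILE", "2011-05-17 12:34:22"),
    (5054, 444, "FILE", "2011-05-17 12:34:51"),
    (5055, 279, "FILE", "2011-05-17 12:35:02"),
]

# Same self-join as the answer: pair each row with the row whose Id is one
# lower, and keep it when the gap in whole seconds reaches the threshold.
RUN_STARTS_SQL = """
SELECT mrtB.Id, mrtB.DateCreated
FROM MyReportTable AS mrtA
INNER JOIN MyReportTable AS mrtB
    ON mrtA.Id + 1 = mrtB.Id
WHERE CAST(strftime('%s', mrtB.DateCreated) AS INTEGER)
    - CAST(strftime('%s', mrtA.DateCreated) AS INTEGER) >= ?
ORDER BY mrtB.Id
"""


def find_run_starts(rows, threshold_seconds=60):
    """Return (Id, DateCreated) for rows that follow a gap of at least the threshold."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE MyReportTable ("
            "Id INTEGER PRIMARY KEY, SourceId INTEGER, "
            "SourceType TEXT, DateCreated TEXT)"
        )
        conn.executemany("INSERT INTO MyReportTable VALUES (?, ?, ?, ?)", rows)
        return conn.execute(RUN_STARTS_SQL, (threshold_seconds,)).fetchall()
    finally:
        conn.close()


print(find_run_starts(SAMPLE_ROWS))
# prints [(5052, '2011-05-17 12:34:12')]
```

Only the second run shows up, because the first row in the table has no predecessor to compare against.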

I'll call that DataRunStarts.

Now you can use that to get info about where the groups start and end:

SELECT drsA.Id AS StartID, drsA.DateCreated, Min(drsB.Id) AS ExcludedEndId
FROM DataRunStarts AS drsA, DataRunStarts AS drsB
WHERE drsB.Id > drsA.Id
GROUP BY drsA.Id, drsA.DateCreated

I'll call that DataRunGroups. I named the last field ExcludedEndId because the Id it holds is only used to define the end boundary for the set of Ids that will be pulled; that Id itself is not part of the group.

Now we can use DataRunGroups and MyReportTable to get the counts:

SELECT DataRunGroups.StartID, Count(MyReportTable.Id) AS CountOfRecords
FROM DataRunGroups, MyReportTable
WHERE MyReportTable.Id >= DataRunGroups.StartID
    AND MyReportTable.Id < DataRunGroups.ExcludedEndId
GROUP BY DataRunGroups.StartID;

I'll call that DataRunCounts.

Now we can put DataRunGroups and DataRunCounts together to get start times and counts:

SELECT DataRunGroups.DateCreated, DataRunCounts.CountOfRecords
FROM DataRunGroups
INNER JOIN DataRunCounts
    ON DataRunGroups.StartID = DataRunCounts.StartID;

Depending on your setup, you may need to do all of this in one query, but you get the idea. Also, the very first and very last runs won't be included, because there is no start Id to go by for the very first run and no end Id to go by for the very last run. To include them, write queries for just those two ranges and union them with the old DataRunGroups query to create a new DataRunGroups. The other queries that use DataRunGroups will work just as described above.
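One possible way to fold everything into a single query, including the first and last runs, is sketched below. It is a variation on the approach above, not the same set of queries: instead of unions, a LEFT JOIN treats any row with no predecessor as a run start, and each row is assigned to the latest run start at or before it, so the last run needs no end Id. The sketch assumes SQLite run from Python, a 60-second threshold, the hypothetical name `count_runs`, and sample rows modeled on the question's table with the second run placed at 12:34 in the afternoon. Note that any gap in the Id sequence is also treated as the start of a run.

```python
import sqlite3

# Hypothetical rows modeled on the question's table; the second run is placed
# at 12:34 in the afternoon (an assumption) so times increase with Id.
SAMPLE_ROWS = [
    (5048, 433, "FILE", "2011-05-17 09:14:12"),
    (5049, 346, "FILE", "2011-05-17 09:14:22"),
    (5050, 444, "FILE", "2011-05-17 09:14:51"),
    (5051, 279, "FILE", "2011-05-17 09:15:02"),
    (5052, 433, "FILE", "2011-05-17 12:34:12"),
    (5053, 346, "FILE", "2011-05-17 12:34:22"),
    (5054, 444, "FILE", "2011-05-17 12:34:51"),
    (5055, 279, "FILE", "2011-05-17 12:35:02"),
]

RUN_COUNTS_SQL = """
WITH RunStarts AS (
    -- A row starts a run if it has no predecessor (Id - 1 is missing)
    -- or if the gap to its predecessor reaches the threshold in seconds.
    SELECT cur.Id, cur.DateCreated
    FROM MyReportTable AS cur
    LEFT JOIN MyReportTable AS prev
        ON prev.Id + 1 = cur.Id
    WHERE prev.Id IS NULL
       OR CAST(strftime('%s', cur.DateCreated) AS INTEGER)
        - CAST(strftime('%s', prev.DateCreated) AS INTEGER) >= ?
)
-- Assign every row to the latest run start at or before it, then count.
SELECT s.DateCreated AS RunStart, COUNT(*) AS FileCount
FROM MyReportTable AS r
INNER JOIN RunStarts AS s
    ON s.Id = (SELECT MAX(s2.Id) FROM RunStarts AS s2 WHERE s2.Id <= r.Id)
GROUP BY s.Id, s.DateCreated
ORDER BY s.Id
"""


def count_runs(rows, threshold_seconds=60):
    """Return (run start time, file count) for every run, first and last included."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE MyReportTable ("
            "Id INTEGER PRIMARY KEY, SourceId INTEGER, "
            "SourceType TEXT, DateCreated TEXT)"
        )
        conn.executemany("INSERT INTO MyReportTable VALUES (?, ?, ?, ?)", rows)
        return conn.execute(RUN_COUNTS_SQL, (threshold_seconds,)).fetchall()
    finally:
        conn.close()


print(count_runs(SAMPLE_ROWS))
# prints [('2011-05-17 09:14:12', 4), ('2011-05-17 12:34:12', 4)]
```

On SQL Server the same shape should carry over with `DateDiff(ss, ...)` in place of the `strftime` arithmetic, though that port is untested here.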
