Checking timestamp intervals in MATLAB


Question

I was wondering whether there is a way to compare many timestamps to see if any are missing. At the moment I am looking at 365 days of the year, with 48 readings taken every day (in an Excel document), so I have over 17,000 points to analyse. The timestamps are currently formatted like this:

1/01/2011 12:30 AM
1/01/2011 1:00 AM
1/01/2011 1:30 AM
1/01/2011 2:00 AM
1/01/2011 2:30 AM

I need to go through them and check whether any of the 30-minute readings are missing. I have thought of using

datenum('')

and then comparing successive values, throwing an error when a value does not follow the trend and returning the previous value. But I am not sure how to go about it.

Any help would be appreciated!

Recommended answer

You can use datenum and pass in date strings formatted exactly like the ones in your example. If your readings are half an hour apart, then the difference between successive datenum values should always be the same. For example, let's place your dates into a cell array like so:

C = {'1/01/2011 12:30 AM',
'1/01/2011 1:00 AM',
'1/01/2011 1:30 AM',
'1/01/2011 2:00 AM',
'1/01/2011 2:30 AM'};

We can take the difference between successive elements using diff. Given an input vector x, the ith element of the output vector y is:

y_i = x_{i+1} - x_i

Therefore, this returns a vector that is one element shorter than the original, and each difference is associated with the second element of its pair, so we are effectively looking at the dates from the second element onwards. Applying diff to the datenum of every element in this cell array, we get:

format long
diffs = diff(datenum(C))

diffs =

   0.020833333255723
   0.020833333372138
   0.020833333372138
   0.020833333255723
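Since datenum counts time in days, these differences are easier to read once converted to minutes. The following is a minimal sketch rather than part of the original answer: it builds the serial date numbers numerically (an assumption made to keep the example independent of string parsing), and the names t, diffMinutes and isRegular are illustrative.

```matlab
% Five half-hourly readings starting at 12:30 AM on 1 Jan 2011,
% built numerically: one step is 1/48 of a day.
t = datenum(2011, 1, 1) + (1:5)' / 48;

% Convert the differences from days to minutes.
diffMinutes = diff(t) * 24 * 60;

% Compare against 30 minutes with a small tolerance instead of
% testing for exact equality, because of floating-point round-off.
isRegular = abs(diffMinutes - 30) < 1e-3;
```

Working in minutes makes the tolerance easier to reason about than a raw fraction of a day.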

Only the first 7 or so significant digits matter. The remaining digits differ because of floating-point round-off, but let's shelve that for now. datenum measures time in days, so half an hour is 1/48 of a day, which is about 0.0208333. You therefore need to check whether each element of the difference is about 0.0208333. If it isn't, then you're missing an interval. Let's try fudging a few of the times:

C = {'1/01/2011 12:30 AM',
'1/01/2011 1:30 AM',
'1/01/2011 2:30 AM',
'1/01/2011 3:00 AM',
'1/01/2011 4:30 AM'};

format long
diffs = diff(datenum(C))

diffs =

   0.041666666627862
   0.041666666627862
   0.020833333372138
   0.062500000000000

Therefore, just before the second, third and last elements of C, we are missing measurements at the half-hour interval. I'm assuming that your readings are spaced half an hour apart. As such, the smallest possible gap when a measurement is missing is one hour, which is a difference of about 0.0416 instead of the usual 0.0208. We therefore need to find the locations in this array where the difference is noticeably bigger than 0.0208, and any threshold between 0.0208 and 0.0416 will do. To be safe, let's set it to 0.03. If you want to do this programmatically, you could do this:

diffs = diff(datenum(C));
locs = find(diffs > 0.03) + 1;

find returns the locations in a matrix or array that satisfy a particular Boolean condition. In this case, we want the locations whose differences are greater than 0.03. We also offset by 1 because, as discussed before, each difference belongs to the second element of its pair. Doing this with our modified C array, we get:

locs =

     2
     3
     5

This tells us that the elements at locations 2, 3 and 5 of our modified dates array (C) each come right after a gap, so at least one half-hour measurement is missing just before each of them.
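The threshold only says that a gap exists. If it also helps to know how many readings each gap swallowed, one possible sketch is to divide every difference by the nominal step and round. The timestamps here are built numerically to mirror the modified C (an assumption made so the example does not depend on string parsing), and step and nMissing are illustrative names.

```matlab
step = 1/48;                                     % 30 minutes, in days

% Same times as the modified C: 12:30, 1:30, 2:30, 3:00 and 4:30 AM.
t = datenum(2011, 1, 1) + [1 3 5 6 9]' * step;

diffs = diff(t);

% Number of half-hour readings missing inside each gap.
nMissing = round(diffs / step) - 1;

% Index of the reading that follows each gap (same meaning as locs above).
locs = find(nMissing > 0) + 1;
```

Rounding absorbs the floating-point noise in the differences, so no hand-picked threshold such as 0.03 is needed in this variant.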

As a double check, if we apply this to the first example, where there are no skips, we get the empty array as expected:

locs = 

[]


As a little bonus, we can display the timestamps that come right after a missing interval. Specifically:

missingTimes = C(locs)

For our fudged time example, we get:

missingTimes = 

    '1/01/2011 1:30 AM'
    '1/01/2011 2:30 AM'
    '1/01/2011 4:30 AM'
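Note that these are the readings that follow each gap, not the absent readings themselves. If the absent timestamps are wanted, one way to sketch it is to map every reading onto an integer half-hour slot and compare against the full range of slots. This assumes the first and last readings are present and that every reading falls on a half-hour boundary. The variable names are illustrative, and the timestamps are again built numerically.

```matlab
step = 1/48;                                     % 30 minutes, in days
t = datenum(2011, 1, 1) + [1 3 5 6 9]' * step;   % times of the modified C

% Integer slot index of each reading, counted from the first reading.
slots = round((t - t(1)) / step);

% Every slot that should exist between the first and last reading.
expectedSlots = (0:slots(end))';

% Slots with no reading, converted back to readable strings.
missingSlots = setdiff(expectedSlots, slots);
missingStamps = datestr(t(1) + missingSlots * step, 'mm/dd/yyyy HH:MM PM');
```

For this data the absent slots are 1, 3, 6 and 7, which correspond to 1:00, 2:00, 3:30 and 4:00 AM.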


Edit

Following our conversation in the comments, this breaks as soon as you have an entry with a date but no time. Specifically, when you call datenum on a cell array that contains at least one such entry, we no longer get the fractional part. We only get whole numbers. (I am not sure why. A likely explanation is that datenum expects all strings passed in one call to share the same format. I should probably make a StackOverflow post about this.) In other words, if we did:

C = {'1/01/2011',
'1/01/2011 12:30 AM',
'1/01/2011 1:30 AM',
'1/01/2011 2:30 AM',
'1/01/2011 3:00 AM',
'1/01/2011 4:30 AM'};

and then ran:

diff(datenum(C))

we would get:

ans =

  0
  0
  0
  0
  0


To get around this, I had to implement my own version of diff and convert the elements of the dates array individually. Therefore, do this instead:

format long;
diffs = arrayfun(@(x) datenum(C{x}) - datenum(C{x-1}), (2:numel(C)).');

I used arrayfun with an input array that goes from 2 up to the number of elements in C. For each element of the output, we take the datenum representation of the ith element and subtract the datenum of the (i-1)th element from it. This essentially implements the diff operation manually, and sidesteps the problem that appears when you include a date with no time on it. I honestly have no idea why the fractional part gets dropped in that case, but this works for now.

Either way, we get:

diffs =

   0.020833333372138
   0.041666666627862
   0.041666666627862
   0.020833333372138
   0.062500000000000
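A variation on the same idea, offered only as a sketch: convert each string on its own with cellfun and then hand the numeric vector to the built-in diff. Like the arrayfun version, it relies on datenum parsing each string individually, and it has the small advantage of parsing every string once rather than twice.

```matlab
C = {'1/01/2011';
     '1/01/2011 12:30 AM';
     '1/01/2011 1:30 AM';
     '1/01/2011 2:30 AM';
     '1/01/2011 3:00 AM';
     '1/01/2011 4:30 AM'};

% Parse each entry separately so that the time-less date
% does not affect how the other entries are read.
t = cellfun(@datenum, C);

% The ordinary diff now works on the numeric vector.
diffs = diff(t);
```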


Edit #2

It looks like you're still having trouble. Another suggestion is to find the entries that are missing the 12:00 AM time stamp and insert it manually. We can use regular expressions to do that through regexp. Regular expressions find where patterns occur in strings. What we're going to do is find the entries that don't contain a time stamp at the end, then use some additional code to insert it. Let's consider a toy example:

C = {'1/01/2011',
'1/01/2011 12:30 AM',
'1/01/2011 1:30 AM',
'1/01/2011 2:30 AM',
'1/01/2011 3:00 AM',
'1/01/2011 4:30 AM',
'1/02/2011',
'1/02/2011 12:30 AM',
'1/02/2011 1:30 AM',
'1/02/2011 2:30 AM',
'1/02/2011 3:00 AM',
'1/02/2011 4:30 AM',
'1/03/2011',
'1/03/2011 12:30 AM',
'1/03/2011 1:30 AM',
'1/03/2011 2:30 AM',
'1/03/2011 3:00 AM',
'1/03/2011 4:30 AM'};

Here we have various dates and times, with some missing the 12:00 AM time stamp. Here's how I am going to insert the time stamps:

missingTimeStampsLocs = cellfun(@(x) isempty(regexp(x,'[0-9]{1,2}\/[0-9]{2}\/[0-9]{4} [0-9]{1,2}:[0-9]{2} [AaPp][Mm]')), C);
missingTimeStamps = C(missingTimeStampsLocs);
filledInTimeStamps = cellfun(@(x) [x ' 12:00 AM'], missingTimeStamps, 'uni', 0);
C(missingTimeStampsLocs) = filledInTimeStamps;

This looks like an intimidating piece of code, but it can certainly be explained. Let's start with the first line. We call regexp, whose first argument is the string we want to search and whose second argument describes the pattern we are looking for. Here, I am looking for all dates that have the following format:

 #/##/#### ##:## xx
       OR
##/##/#### ##:## xx

# denotes a digit and x denotes a letter. We are going to search for all dates that follow this exact format. Any dates that don't follow it get flagged, which means they are missing their time stamps. Take a look at this statement:

regexp(x,'[0-9]{1,2}\/[0-9]{2}\/[0-9]{4} [0-9]{1,2}:[0-9]{2} [AaPp][Mm]')

What this says is that, within a string x, we look for a substring made of 1 or 2 digits, followed by a /, followed by exactly 2 digits, followed by a /, followed by exactly 4 digits, followed by a space, then 1 or 2 digits, followed by a :, then exactly 2 digits, followed by a space, then either AM or PM. The character classes [AaPp] and [Mm] make that last part case-insensitive, so the AM or PM can be in upper or lower case.

What regexp returns are the locations in your string where the pattern is found. In our case, it will either return 1, meaning that the pattern was found at the start of the string, or empty, meaning that no match was found. If regexp returns empty, then this date has a missing timestamp. This is why I wrapped the call in isempty. I then wrap this in cellfun so that we iterate over all elements of our date cell array. The output (stored in missingTimeStampsLocs) is a Boolean array where 1 denotes that the timestamp is missing and 0 denotes that it is present.
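As a hedged aside, the pattern above is not anchored, so it would also match a timestamp embedded in a longer string. A slightly stricter sketch adds ^ and $ so the whole string has to match, and passes the entire cell array to regexp in one call. The pattern variable, the shorthand \d for [0-9] and the sample entries are choices made for this illustration, not part of the original code.

```matlab
C = {'1/01/2011';
     '1/01/2011 12:30 AM';
     '11/15/2011 3:00 pm'};

% ^ and $ force the pattern to cover the whole string.
pattern = '^\d{1,2}/\d{2}/\d{4} \d{1,2}:\d{2} [AaPp][Mm]$';

% regexp accepts a cell array of strings; with 'match' and 'once'
% each cell holds the matched text, or is empty when nothing matched.
matches = regexp(C, pattern, 'match', 'once');

% True where the entry has no time stamp.
missingTimeStampsLocs = cellfun(@isempty, matches);
```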

The next line of code extracts from the original cell array those dates that are missing their times. I then run cellfun one more time to iterate through these cells and concatenate the 12:00 AM timestamp onto the end of each string in the extracted cell array. Note that I also specify two additional parameters ('uni' and 0, short for 'UniformOutput' and false) because the output is no longer a single value, but a string. These strings are placed inside a cell array, which is perfect because they were extracted from a cell array anyway. We didn't have to specify this in the first cellfun call because its output is a single value, in that case a Boolean value of 0 or 1. Once we are done, we replace the dates that have missing timestamps with the ones we just completed with the 12:00 AM time stamp. The result is written back into C. Running the above code with our C, this is what we get:

C =  

'1/01/2011 12:00 AM'
'1/01/2011 12:30 AM'
'1/01/2011 1:30 AM'
'1/01/2011 2:30 AM'
'1/01/2011 3:00 AM'
'1/01/2011 4:30 AM'
'1/02/2011 12:00 AM'
'1/02/2011 12:30 AM'
'1/02/2011 1:30 AM'
'1/02/2011 2:30 AM'
'1/02/2011 3:00 AM'
'1/02/2011 4:30 AM'
'1/03/2011 12:00 AM'
'1/03/2011 12:30 AM'
'1/03/2011 1:30 AM'
'1/03/2011 2:30 AM'
'1/03/2011 3:00 AM'
'1/03/2011 4:30 AM'
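For comparison, the same fill-in can probably be done in a single call with regexprep, which accepts a cell array of strings and lets the replacement refer to a captured token as $1. This is an alternative sketch, not the approach used above, and the anchored date-only pattern is an assumption about how the time-less entries look.

```matlab
C = {'1/01/2011';
     '1/01/2011 12:30 AM';
     '1/02/2011';
     '1/02/2011 12:30 AM'};

% Match entries that consist of a date and nothing else,
% capture the date, and append the midnight time stamp.
C = regexprep(C, '^(\d{1,2}/\d{2}/\d{4})$', '$1 12:00 AM');
```

Entries that already carry a time do not match the anchored pattern and are left untouched.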

We can then run this through our detection code and see which dates come after a jump of more than half an hour:

diffs = diff(datenum(C));
locs = find(diffs > 0.03) + 1;
missingTimes = C(locs)

This gives:

missingTimes = 

'1/01/2011 1:30 AM'
'1/01/2011 2:30 AM'
'1/01/2011 4:30 AM'
'1/02/2011 12:00 AM'
'1/02/2011 1:30 AM'
'1/02/2011 2:30 AM'
'1/02/2011 4:30 AM'
'1/03/2011 12:00 AM'
'1/03/2011 1:30 AM'
'1/03/2011 2:30 AM'
'1/03/2011 4:30 AM'


I really hope this is the last time I work on this problem (LOL), as I'm quite sure I've covered all contingencies. I'm also assuming that your dates are formatted in a specific way, and I'm hoping this will solve your problem. We also no longer need the custom diff that we wrote, because every date now carries a time stamp, including 12:00 AM.

Good luck!
