Slice according to an irregular pattern

Question

Below is an excerpt from a time series. I would like to slice out the rows set apart by white space (the white space is only there for clarity; the data contains no empty rows). There is a recurring pattern: 21 times the value X, four different values A, B, C, D, 21 times the value Y, four different values E, F, G, H, 21 times the value Z, and so on. I am interested in obtaining BCDY, FGHZ, and so on.

The problem is that this pattern is sometimes interrupted by missing data, which makes it irregular. As a result, the number of values I want to discard (the values X, Y, Z) is sometimes lower than 21. For example, the values between 2014-01-20 00:05:00 and 2014-01-20 00:11:00 could just as well be missing.

I can think of looping over the series, but I would prefer a vectorized approach. I would like to implement it in R, but Python or Matlab will do as well.

Any ideas? Thanks.

2014-01-20 00:00:00    197021
2014-01-20 00:01:00    197021
2014-01-20 00:02:00    197021
2014-01-20 00:03:00    197021
2014-01-20 00:04:00    197021
2014-01-20 00:05:00    197021
2014-01-20 00:06:00    197021
2014-01-20 00:07:00    197021
2014-01-20 00:08:00    197021
2014-01-20 00:09:00    197021
2014-01-20 00:10:00    197021
2014-01-20 00:11:00    197021
2014-01-20 00:12:00    197021
2014-01-20 00:13:00    197021
2014-01-20 00:14:00    197021
2014-01-20 00:15:00    197021
2014-01-20 00:16:00    196836

2014-01-20 00:17:00    196865
2014-01-20 00:18:00    196787
2014-01-20 00:19:00    196915
2014-01-20 00:20:00    196902

2014-01-20 00:21:00    196902
2014-01-20 00:22:00    196902
2014-01-20 00:23:00    196902
2014-01-20 00:24:00    196902
2014-01-20 00:25:00    196902
2014-01-20 00:26:00    196902
2014-01-20 00:27:00    196902
2014-01-20 00:28:00    196902
2014-01-20 00:29:00    196902

Recommended answer

If I understand you correctly, you want to remove all data rows where the last column is unchanged from the previous row. In Matlab, you can do this using the diff() function and logical indexing. Assuming your data is in a two-column matrix, the expression

data([true; diff(data(:,2))~=0],:)

returns a two-column matrix containing only the rows that meet the requirement. You might need to handle the first row separately: it is not quite clear from your description whether you always want the first row. The expression above always keeps it; change the true to false to always drop it.
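Since Python was also mentioned as acceptable, here is a rough plain-Python sketch of the same idea, for illustration only. The function name changed_rows, the keep_first parameter, and the shortened timestamps are assumptions made for this example; it uses list comprehensions rather than a truly vectorized library call.

```python
def changed_rows(rows, keep_first=True):
    """Keep rows whose value differs from the previous row's value.

    rows is a list of (timestamp, value) pairs. keep_first plays the role
    of the leading true/false in the Matlab expression (hypothetical name).
    """
    values = [value for _, value in rows]
    # Counterpart of [keep_first; diff(values) ~= 0]
    mask = [keep_first] + [cur != prev for prev, cur in zip(values, values[1:])]
    return [row for row, keep in zip(rows, mask) if keep]


# Shortened sample based on the excerpt: end of the X run, A, B, C, D, start of the Y run.
sample = [
    ("00:14", 197021), ("00:15", 197021),
    ("00:16", 196836), ("00:17", 196865), ("00:18", 196787),
    ("00:19", 196915), ("00:20", 196902), ("00:21", 196902),
]
kept = changed_rows(sample, keep_first=False)
```

With keep_first=False, kept holds the rows from 00:16 to 00:20, i.e. A, B, C, D and the first Y. Like the Matlab expression, this relies on neighbouring values in the A, B, C, D, Y stretch actually being different from each other.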

Edit (in response to the first comment)

Replacing true with false in the above expression discards the first row. This leaves you with blocks of 5 rows, and you want to discard the first row of each block. This can also be done with logical indexing. It is fairly simple, except that you need to guard against the case where the last block contains fewer than 5 rows:

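% Keep only rows whose value differs from the previous row (first row dropped)
pData = data([false; diff(data(:,2))~=0],:);
% One false followed by four trues per block of 5, repeated to cover all rows
selector = repmat([false; true; true; true; true], ceil(size(pData, 1)/5), 1);
pData = pData(selector(1:size(pData,1)),:);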
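The block-wise selection can be sketched in plain Python as well. This is only an illustration under the same assumption of blocks of 5 rows; the name drop_block_heads, the block_size parameter, and the letter labels are made up for the example.

```python
import math


def drop_block_heads(rows, block_size=5):
    """Drop the first row of every block of block_size rows.

    The last block may be shorter than block_size; the selector is
    truncated to the number of rows to handle that case.
    """
    pattern = [False] + [True] * (block_size - 1)
    repeats = math.ceil(len(rows) / block_size)
    selector = (pattern * repeats)[:len(rows)]
    return [row for row, keep in zip(rows, selector) if keep]


# One full block (A, B, C, D, Y) followed by an incomplete block (E, F, G).
filtered = ["A", "B", "C", "D", "Y", "E", "F", "G"]
wanted = drop_block_heads(filtered)
```

Here wanted is B, C, D, Y, F, G: the leading row of each block is dropped, and the incomplete last block does not cause an indexing error.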

I hope this helps!
