How to accumulate data-sets?


Question

I have a vector with values between 1 and N, where N > 1. Some values may occur multiple times consecutively. I want to collapse each run of consecutive identical entries into a single row and add a second column that counts how many times the value occurred in that run, e.g.:

A = [1 2 1 1 3 2 4 4 1 1 1 2]'

should result in:

B = [1 1;
     2 1;
     1 2;
     3 1;
     2 1;
     4 2;
     1 3;
     2 1]

(As you can see, the second column contains the number of consecutive entries.) I came across accumarray() in MATLAB recently, but I can't find a solution with it for this task, since it always considers the whole vector rather than only consecutive entries.

Any ideas?

Recommended answer

This probably isn't the most readable or elegant way of doing it, but if you have large vectors and speed is an issue, this vectorisation may help:

A = [1 2 1 1 3 2 4 4 1 1 1 2];

First, I'm going to pad A with a leading and trailing zero to capture the first and final transitions. This works because every value in A is at least 1, so the padded zeros always differ from their neighbours:

>>  A = [0, A, 0];

The transition locations can be found where the difference between neighbouring values is not equal to zero:

>> locations = find(diff(A)~=0);
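
To make the intermediate values concrete, here is a small self-contained sketch of the padding and transition steps. The variable names `padded` and `d` are introduced only for illustration; the answer itself overwrites A in place.

```matlab
A = [1 2 1 1 3 2 4 4 1 1 1 2];

% Pad with zeros so the first and last runs also produce a transition.
% Assumes A contains no zeros (the question says values lie in 1..N).
padded = [0, A, 0];

% d(i) = padded(i+1) - padded(i); nonzero wherever neighbours differ.
d = diff(padded);

% Indices i at which padded(i) differs from padded(i+1).
locations = find(d ~= 0);

disp(d)          % 1  1 -1  0  2 -1  2  0 -3  0  0  1 -2
disp(locations)  % 1  2  3  5  6  7  9 12 13
```

The first entry of `locations` (index 1) belongs to the leading pad, which is why it is discarded when the segment values are read out.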

Because we padded the start of A with a zero, the first transition is meaningless, so we only take the locations from 2:end. Each remaining location is the index of the last element of a segment, so the values of A at those indices are the values of the segments:

>> first_column = A(locations(2:end))

first_column =

     1     2     1     3     2     4     1     2

That's the first column. Now we need the count of each number, which is the difference between successive locations. This is where padding A at both ends becomes important, because it supplies the boundaries of the first and last segments:

>> second_column = diff(locations)

second_column =

 1     1     2     1     1     2     3     1

Finally, combine the two:

B = [first_column', second_column']

B =

 1     1
 2     1
 1     2
 3     1
 2     1
 4     2
 1     3
 2     1
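
The zero padding relies on A never containing a zero. As a hedged alternative, and not part of the original answer, the same result can be sketched with a logical mask that marks the last element of each run, which avoids that assumption. The names `isRunEnd`, `runEnds`, `values` and `counts` are hypothetical, and the sketch assumes A is non-empty.

```matlab
A = [1 2 1 1 3 2 4 4 1 1 1 2];
A = A(:);                           % force a column vector

% True at the last element of every run; the final element always ends a run.
isRunEnd = [diff(A) ~= 0; true];

runEnds = find(isRunEnd);           % index of the last element of each run
values  = A(isRunEnd);              % value of each run
counts  = diff([0; runEnds]);       % run length = gap between successive run ends

B = [values, counts];
disp(B)
```

Prepending 0 to `runEnds` plays the same role as the leading pad: it gives the first run a boundary to measure from.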

This can all be combined into one less readable line (note that A is a column vector here):

>> A = [1 2 1 1 3 2 4 4 1 1 1 2]';
>> B = [A(find(diff([A; 0]) ~= 0)), diff(find(diff([0; A; 0])))]

B =

 1     1
 2     1
 1     2
 3     1
 2     1
 4     2
 1     3
 2     1
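
Since the question asks about accumarray(), here is one way it might be used for this task. This is a sketch rather than the answer's method: the idea is to give every run its own integer label with cumsum, so that accumarray groups by run instead of by value. The names `isRunStart` and `runId` are hypothetical, and A is assumed to be a non-empty column vector.

```matlab
A = [1 2 1 1 3 2 4 4 1 1 1 2]';

% True at the first element of every run.
isRunStart = [true; diff(A) ~= 0];

% Label each element with the index of the run it belongs to:
% runId = [1 2 3 3 4 5 6 6 7 7 7 8]'
runId = cumsum(isRunStart);

% Summing a constant 1 per element counts the members of each run.
counts = accumarray(runId, 1);
values = A(isRunStart);

B = [values, counts];
disp(B)
```

Because the labels come from run boundaries rather than from the values themselves, repeated values in separate runs (such as the 1s here) are counted separately.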
