Mean of columns with same label

Problem description

I have a data matrix and a label vector:

data matrix:  A = [1 2 2 1 2 6; 2 3 2 3 3 5]
label vector: B = [1 2 1 2 3 NaN]

I want to take the mean of all columns that have the same label and output these as a matrix sorted by label number, ignoring NaNs. So, in this example I would want:

labelmean(A,B) = [1.5 1.5 2; 2 3 3]

This can be done with a for-loop like this.

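function out = labelmean(data, label)
% Mean of the columns of data that share a label, ordered by label value.
% Columns whose label is NaN are skipped.
out = [];
for i = unique(label(:)).'   % row vector, so the loop visits one label per pass
    if isnan(i); continue; end
    out = [out, mean(data(:, label == i), 2)]; %#ok<AGROW>
end
end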

However, I'm dealing with huge arrays containing many datapoints and labels. Additionally, this code snippet will be executed often. I'm wondering if there is a more efficient way to do this without looping over every individual label.

Recommended answer

Here is one way:

  1. Get the indices of the labels that are not NaN.
  2. Create a sparse matrix of zeros and ones such that multiplying A by it gives the desired per-label row sums.
  3. Divide that matrix by the sum of each of its columns, so that the sums become averages.
  4. Apply matrix multiplication to get the result, and convert to a full matrix.

Code:

I = find(~isnan(B));                                 % step 1
t = sparse(I, B(I), 1, size(A,2), max(B(I)));        % step 2
t = bsxfun(@rdivide, t, sum(t,1));                   % step 3
result = full(A*t);                                  % step 4
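
To make steps 2 and 3 more concrete, here is a small sketch that runs the same four lines on the example data from the question, prints the intermediate matrix, and compares the result with the expected output. The variable name `expected` and the tolerance `1e-12` are my own additions for illustration, not part of the original answer.

```matlab
% Example data from the question
A = [1 2 2 1 2 6; 2 3 2 3 3 5];
B = [1 2 1 2 3 NaN];

I = find(~isnan(B));                            % columns that have a valid label
t = sparse(I, B(I), 1, size(A,2), max(B(I)));   % t(j,k) = 1 if column j has label k
disp(full(t))                                   % 6-by-3 indicator matrix; row 6 (the NaN label) is all zeros
t = bsxfun(@rdivide, t, sum(t,1));              % scale each column of t so that it sums to 1
result = full(A*t);                             % column k is the mean of the columns of A with label k

expected = [1.5 1.5 2; 2 3 3];                  % value stated in the question
assert(max(abs(result(:) - expected(:))) < 1e-12)
```

Note that this approach assumes the labels are positive integers, because they are used directly as column indices of `t`. If some integer between 1 and `max(B)` never occurs as a label, its column of `t` sums to zero and the corresponding output column would come out as NaN, whereas the loop version simply skips that label. On MATLAB R2016b or later, `t ./ sum(t,1)` should be equivalent to the `bsxfun` call thanks to implicit expansion.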
