Create a new variable based on existing columns of a cell array in MATLAB


Problem description

I have a cell array with 600,000 rows and 5 columns. The following example shows only 3 different codes over a period of 5 years. Input:

c1   c2        c3        c4  c5 

1   2006    20060425    559 'IA'
1   2007    20070129    559 'LO'
1   2007    20070826    559 'VC'
1   2008    20080825     34 'VP'
1   2009    20090116     34 'ZO'
4   2007    20070725     42 'OI'
4   2008    20080712     42 'TF'
4   2008    20080428     42 'XU'
11  2007    20070730    118 'AM'
11  2008    20080912    118 'HK'
11  2009    20090318      2 'VT'
11  2010    20100121      2 'ZZ'

I would like to obtain a new variable that gives, for each code (c1), the years in which that code appears in the sample and the corresponding c4 value. For instance:

Output:

x  2006 2007 2008 2009 2010
1  559  559  34   34   - 
4   -   42   42   -    -
11  -   118  118  2    2 

This is the code I have used so far to build my cell array:

a1=T_ANNDAT3;
a2=I{:,7};
a3=I{:,6};
a4=I{:,16};
a5=I{:,1};
TRACK_AN = [num2cell([a2 a1 a4 a3]) a5];
TRACK_AN(cell2mat(TRACK_AN(:,1))==0,:)=[];
[~,indTA,~] = unique(strcat(TRACK_AN(:,1),TRACK_AN(:,2),TRACK_AN(:,4),TRACK_AN(:,5)));
TRACK_AN = TRACK_AN(indTA,:);

Can anyone help?

Recommended answer

You can compute this quite easily with unique, which you are already using. The key is to pass the 'rows' flag as the second argument to unique so that it returns the unique rows of a matrix. Only the first, second and fourth columns are needed for this, so we extract just those columns. You also need the additional outputs of unique: the third output tells you, for each input element, which unique value it corresponds to. This is the key property we need for the next part of the algorithm.
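To make the role of the 'rows' flag and the extra outputs concrete, here is a minimal sketch on a small made-up matrix (the values in M are purely illustrative and are not taken from your data):

```matlab
% Illustrative matrix: rows 2 and 3 are identical
M = [1 2006 559; 1 2007 559; 1 2007 559; 4 2007 42];

% With 'rows', unique compares whole rows instead of single elements
[U, ia, ic] = unique(M, 'rows');

% U  holds the distinct rows, sorted
% ia indexes into M such that M(ia,:) reproduces U
% ic indexes into U such that U(ic,:) reproduces M
disp(U);
disp(isequal(U(ic,:), M));
```

The third output (ic here) is the one that matters later: it turns arbitrary values such as codes or years into consecutive integers 1, 2, 3, ... that can serve as row or column subscripts.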

Once the first unique call has produced the unique rows, we apply unique two more times: once to the c1 column and once to the c2 column, so that we can index the ID and the year. We use the third output of unique to map each value in a column to an integer index identifying which unique value it is. We then use accumarray to build the final matrix shown above, placing the c4 values (the third column of the reduced matrix) with the c1 index as the row and the c2 index as the column. In other words:

%// Create cell array as per your example
C = {1   2006    20060425    559 'IA'
1   2007    20070129    559 'LO'
1   2007    20070826    559 'VC'
1   2008    20080825     34 'VP'
1   2009    20090116     34 'ZO'
4   2007    20070725     42 'OI'
4   2008    20080712     42 'TF'
4   2008    20080428     42 'XU'
11  2007    20070730    118 'AM'
11  2008    20080912    118 'HK'
11  2009    20090318      2 'VT'
11  2010    20100121      2 'ZZ'};

%// Get only those columns that are relevant
%// These are the first, second and fourth columns
Cmat = unique(cell2mat(C(:,[1 2 4])), 'rows');

%// Bin each of the first and second columns
%// Give them a unique ID per unique number    
[~,~,ind] = unique(Cmat(:,1));
[~,~,ind2] = unique(Cmat(:,2));

%// Use accumarray to create your matrix    
%// Edit - Thanks to Amro
%// Any values that are missing replace with NaN
finalMat = accumarray([ind ind2], Cmat(:,3), [], [], NaN);

The output is thus:

finalMat =

559   559    34    34   NaN
NaN    42    42   NaN   NaN
NaN   118   118     2     2

I filled the entries that have no data for a given code and year with NaN to mark them as missing.
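If you also want the code and year labels from your desired output, the first output of each unique call already contains them. The following is only a sketch under a couple of assumptions: the names codes, years and labelled are made up for illustration, and @max is my own choice for the case where one code has several different c4 values in the same year (the default of accumarray would sum them, which is probably not what you want).

```matlab
% Relevant numeric columns (c1, c2, c4) of the example, duplicates removed
Cmat = unique([ 1 2006 559;  1 2007 559;  1 2007 559;  1 2008  34
                1 2009  34;  4 2007  42;  4 2008  42;  4 2008  42
               11 2007 118; 11 2008 118; 11 2009   2; 11 2010   2], 'rows');

% Keep the first output this time: it holds the actual codes and years
[codes, ~, ind]  = unique(Cmat(:,1));
[years, ~, ind2] = unique(Cmat(:,2));

% Assumption: if a (code, year) pair has several c4 values, keep the largest
finalMat = accumarray([ind(:) ind2(:)], Cmat(:,3), ...
                      [numel(codes) numel(years)], @max, NaN);

% Years as header row, codes as first column, NaN in the unused corner
labelled = [NaN years(:).'; codes(:) finalMat];
disp(labelled);
```

For this example no code has two different c4 values in the same year, so finalMat is the same matrix as above; the choice of @max only matters when such collisions exist in the full data set.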

Hope this helps!
