Create new variable based on existing columns of a cell in Matlab
Problem description
I have a cell-array with 600 000 rows and 5 columns. In the following example I only present 3 different codes and a period of 5 years. Input:
c1 c2 c3 c4 c5
1 2006 20060425 559 'IA'
1 2007 20070129 559 'LO'
1 2007 20070826 559 'VC'
1 2008 20080825 34 'VP'
1 2009 20090116 34 'ZO'
4 2007 20070725 42 'OI'
4 2008 20080712 42 'TF'
4 2008 20080428 42 'XU'
11 2007 20070730 118 'AM'
11 2008 20080912 118 'HK'
11 2009 20090318 2 'VT'
11 2010 20100121 2 'ZZ'
I would like to obtain a new variable that gives, for each code (c1), the years in which c1 appears in the sample and the corresponding c4 value. For instance:
Output:
x 2006 2007 2008 2009 2010
1 559 559 34 34 -
4 - 42 42 - -
11 - 118 118 2 2
To get to my cell-array, this is the code I used so far:
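a1=T_ANNDAT3;   % becomes column c2
a2=I{:,7};      % becomes column c1
a3=I{:,6};      % becomes column c4
a4=I{:,16};     % becomes column c3
a5=I{:,1};      % becomes column c5
% Assemble the 5-column cell array in the order c1..c5
TRACK_AN = [num2cell([a2 a1 a4 a3]) a5];
% Drop rows whose code (column 1) is 0
TRACK_AN(cell2mat(TRACK_AN(:,1))==0,:)=[];
% Intended to keep one row per combination of columns 1, 2, 4 and 5
[~,indTA,~] = unique(strcat(TRACK_AN(:,1),TRACK_AN(:,2),TRACK_AN(:,4),TRACK_AN(:,5)));
TRACK_AN = TRACK_AN(indTA,:);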
Can anyone help?
Recommended answer
You can calculate this very easily using unique, as you have already seen. The key is to pass the 'rows' flag as the second parameter to unique so that you can find the unique row entries of the matrix. We only need the first, second and fourth columns of the cell array for this process, so we can subset those columns out first. You will also need the additional output parameters of unique, in particular the third one, which tells you for every entry which unique value it maps to. This is the key property we need for the next part of the algorithm.
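To make the extra outputs concrete, here is a minimal sketch on a small hypothetical matrix A (not the question's data). The names U, ia and ic are illustrative only. The third output ic is what the rest of the answer relies on.

```matlab
% Hypothetical 4x3 matrix (code, year, c4); rows 1 and 3 are identical
A = [1 2006 559
     4 2007  42
     1 2006 559
     1 2008  34];

% U:  the unique rows of A, sorted
% ia: row indices into A such that U == A(ia,:)
% ic: for every row of A, the row of U it maps to, so A == U(ic,:)
[U, ia, ic] = unique(A, 'rows');

disp(U)    % three rows: [1 2006 559], [1 2008 34], [4 2007 42]
disp(ic.') % 1 3 1 2
```

The answer applies the same idea to single columns. The third output of unique(Cmat(:,1)) turns the codes 1, 4 and 11 into row numbers 1, 2 and 3. The third output of unique(Cmat(:,2)) turns the years 2006 to 2010 into column numbers 1 to 5.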
After you find the unique rows with the first unique call, we apply unique two more times: once for the c1 column and once for the c2 column, so that we can index the IDs and the years. We use the third output parameter of unique so that each unique number within each column is assigned its own consecutive ID (1, 2, 3, ...). We then use accumarray to build the final matrix that you see above. Each c4 value is placed at the row given by its code's ID and at the column given by its year's ID. In other words:
%// Create cell array as per your example
C = {1 2006 20060425 559 'IA'
1 2007 20070129 559 'LO'
1 2007 20070826 559 'VC'
1 2008 20080825 34 'VP'
1 2009 20090116 34 'ZO'
4 2007 20070725 42 'OI'
4 2008 20080712 42 'TF'
4 2008 20080428 42 'XU'
11 2007 20070730 118 'AM'
11 2008 20080912 118 'HK'
11 2009 20090318 2 'VT'
11 2010 20100121 2 'ZZ'};
%// Get only those columns that are relevant
%// These are the first, second and fourth columns
Cmat = unique(cell2mat(C(:,[1 2 4])), 'rows');
%// Bin each of the first and second columns
%// Give them a unique ID per unique number
[~,~,ind] = unique(Cmat(:,1));
[~,~,ind2] = unique(Cmat(:,2));
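%// Use accumarray to create your matrix: ind picks the row (code),
%// ind2 picks the column (year) and Cmat(:,3) supplies the c4 value
%// Edit - Thanks to Amro
%// Fill any missing (code, year) combinations with NaN
finalMat = accumarray([ind ind2], Cmat(:,3), [], [], NaN);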
The output is therefore:
finalMat =
559 559 34 34 NaN
NaN 42 42 NaN NaN
NaN 118 118 2 2
I replaced the missing values with NaN to signify that they are missing.
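The desired output in the question also labels the rows with the codes and the columns with the years. Below is a minimal variant of the answer's code, not part of the original answer, that keeps the first outputs of unique (the sorted codes and years) for those labels. The names codes, years and labelled are hypothetical.

It also passes @max explicitly instead of relying on accumarray's default sum. This is an assumption about how to handle a code with two different c4 values in the same year. Such a conflict does not occur in the example data, where duplicate (code, year) rows share the same c4 and are already collapsed by unique(..., 'rows').

```matlab
% Same example data as in the answer
C = {1 2006 20060425 559 'IA'
     1 2007 20070129 559 'LO'
     1 2007 20070826 559 'VC'
     1 2008 20080825  34 'VP'
     1 2009 20090116  34 'ZO'
     4 2007 20070725  42 'OI'
     4 2008 20080712  42 'TF'
     4 2008 20080428  42 'XU'
    11 2007 20070730 118 'AM'
    11 2008 20080912 118 'HK'
    11 2009 20090318   2 'VT'
    11 2010 20100121   2 'ZZ'};

% Unique (code, year, c4) triples, as in the answer
Cmat = unique(cell2mat(C(:,[1 2 4])), 'rows');

% Keep the first outputs too: the sorted unique codes and years
[codes, ~, ind]  = unique(Cmat(:,1));
[years, ~, ind2] = unique(Cmat(:,2));

% Assumption: if one (code, year) pair had two different c4 values,
% keep the larger one (@max) rather than adding them (default sum)
finalMat = accumarray([ind ind2], Cmat(:,3), [], @max, NaN);

% Prepend the years as a header row and the codes as a first column;
% the top-left corner (the "x" cell in the question) is NaN
labelled = [NaN, years.'; codes, finalMat];
disp(labelled)
```

Which value should win in a genuine conflict depends on the data. @max is only one illustrative choice; @min or a custom function handle would slot into the same argument.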
Hope this helps!