Nested double sort in Matlab
Problem description
Suppose I have 3 vectors: vector A, vector B and vector C, each of which is (n x 1).
I want to sort the elements of A into 5 groups and then, within each of those groups, sort the corresponding elements of B into 5 groups as well. I then want to take the average of the corresponding elements of C in each resulting group, so I will have 25 averages.
In other words:
- Sort the elements of A into 5 quintiles;
- Pick the first group of elements in A and get the corresponding values in B;
- Sort the picked elements of B into 5 groups;
- Take the average of the corresponding elements of C for each group;
- Pick the second group of elements in A and get the corresponding values in B;
- Sort the picked elements of B into 5 groups;
- Take the average of the corresponding elements of C for each group;
- And so on and so forth.
Here is my dummy code:
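minimum = 50;
maximum = 100;
% Three random (1000 x 1) vectors with values between minimum and maximum
A = (maximum-minimum).*rand(1000,1) + minimum;
B = (maximum-minimum).*rand(1000,1) + minimum;
C = (maximum-minimum).*rand(1000,1) + minimum;
nbins1 = 5;
nbins2 = 5;
output = zeros(nbins1, nbins2);   % preallocate the table of averages
% Group index (1..nbins1) of each element of A
bins1 = ceil(nbins1 * tiedrank(A) / length(A));
for i = 1:nbins1
    B1 = B(bins1==i);
    C1 = C(bins1==i);
    % Group index (1..nbins2) of B within the i-th group of A
    bins2 = ceil(nbins2 * tiedrank(B1) / length(B1));
    for j = 1:nbins2
        C2 = C1(bins2==j);
        output(i,j) = mean(C2);
        clearvars C2
    end
    clearvars B1 C1
end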
The issue is that this does not seem very elegant or efficient. Is there a better way of doing this? For people in finance, this problem is analogous to the Fama-French (1993) double sorting of portfolios.
Recommended answer
First of all, sort everything by column A:
sortedByA = sortrows([A,B,C], 1);
Create a dummy vector holding the group index (from 1 to nbins1) of each row of the data sorted by A. This assumes the number of elements (1000 here) is divisible by nbins1:
groupsA = repmat(1:nbins1, 1000/nbins1, 1); groupsA = groupsA(:);
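When the number of elements is not a multiple of nbins1, the repetition count passed to repmat above is not an integer, so that line does not work as written. A minimal sketch of one alternative, assuming near-equal group sizes are acceptable, derives the group from the row position in the sorted data using the same ceil formula as the question. The names `n`, `pos` and `counts` are illustrative and are not part of the original answer.

```matlab
n = 1003;                          % example length that is not divisible by nbins1
nbins1 = 5;
pos = (1:n)';                      % row position after sorting by A
groupsA = ceil(nbins1 * pos / n);  % group index 1..nbins1 for each sorted row
counts = accumarray(groupsA, 1);   % number of rows that fall in each group
```

With this assignment the group sizes differ by at most one element.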
Then sort again by the first two columns, with the actual column A replaced by the group indices. In effect, this sorts B within each group of values in A:
sorted = sortrows([groupsA, sortedByA(:,[2,3])], [1,2]);
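To see what the two-column sortrows call does, here is a tiny sketch on made-up data. The values in `groups`, `vals` and `payload` are invented for illustration only. Rows are ordered by group first, then by value within each group, and the third column travels with its row.

```matlab
groups  = [1; 1; 1; 2; 2; 2];      % stand-in for the group indices of A
vals    = [3; 1; 2; 9; 7; 8];      % stand-in for column B
payload = [30; 10; 20; 90; 70; 80];% stand-in for column C
sorted = sortrows([groups, vals, payload], [1,2]);
% sorted is:
%   1 1 10
%   1 2 20
%   1 3 30
%   2 7 70
%   2 8 80
%   2 9 90
```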
Create indices for the groups in column C (from 1 to nbins1*nbins2):
groupsC = repmat(1:(nbins1*nbins2), 1000/(nbins1*nbins2), 1); groupsC = groupsC(:);
Finally, compute the mean within each group:
averages = accumarray(groupsC, sorted(:,3), [], @mean);
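The result is a column vector of nbins1*nbins2 means, with the B groups varying fastest within each A group. The sketch below puts the steps together, reshapes the vector into the (i,j) layout used by the loop in the question, and compares it against a plain loop that needs no toolbox functions. It assumes there are no ties in A or B and that the number of elements is divisible by nbins1*nbins2. The names `averagesMatrix`, `check`, `orderA`, `orderB`, `idxA`, `idxAB`, `m` and `maxDiff` are illustrative and do not come from the original answer.

```matlab
rng(0);                                   % fixed seed so the run is repeatable
n = 1000; nbins1 = 5; nbins2 = 5;
A = 50 + 50*rand(n,1);
B = 50 + 50*rand(n,1);
C = 50 + 50*rand(n,1);

% Vectorized steps from the answer, with 1000 replaced by n
sortedByA = sortrows([A,B,C], 1);
groupsA = repmat(1:nbins1, n/nbins1, 1); groupsA = groupsA(:);
sorted = sortrows([groupsA, sortedByA(:,[2,3])], [1,2]);
groupsC = repmat(1:(nbins1*nbins2), n/(nbins1*nbins2), 1); groupsC = groupsC(:);
averages = accumarray(groupsC, sorted(:,3), [], @mean);

% Row i = group of A, column j = group of B within that group of A
averagesMatrix = reshape(averages, nbins2, nbins1)';

% Plain loop for comparison
[~, orderA] = sort(A);
check = zeros(nbins1, nbins2);
for i = 1:nbins1
    idxA = orderA((i-1)*n/nbins1 + 1 : i*n/nbins1);   % rows in the i-th group of A
    [~, orderB] = sort(B(idxA));
    idxAB = idxA(orderB);                             % same rows, ordered by B
    m = numel(idxAB) / nbins2;                        % rows per group of B
    for j = 1:nbins2
        check(i,j) = mean(C(idxAB((j-1)*m + 1 : j*m)));
    end
end
maxDiff = max(abs(averagesMatrix(:) - check(:)));     % should be at rounding level
```

The transpose after reshape is needed because reshape fills column by column, while the group index in `groupsC` advances through the B groups before moving to the next A group.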