Determining "how good" a correlation is in MATLAB?


Question

I'm working with a set of data and I've obtained certain correlations (using Pearson's correlation coefficient). I've been asked to determine the "quality of the correlation". By that, my supervisor means he wants to see what the correlations would be if I permuted all the y values of my ordered pairs, and to compare those coefficients with the one I actually obtained. Does anyone know a nice way of doing this? Is there a MATLAB function that determines how good a correlation is compared with the correlations obtained from random permutations of the data?

Recommended answer

You can permute one vector's labels N times and calculate the correlation coefficient (cc) for each iteration. Then you can compare the distribution of those values with the real correlation.

Something like this:

%# random data
n = 20;
x = (1:n)';
y = x + randn(n,1)*3;

%# real correlation
cc = corr(x,y);

%# do permutations
n_iter = 100; %# number of permutations
cc_iter = zeros(n_iter,1); %# preallocate the vector
for k = 1:n_iter
    ind = randperm(n); %# random permutation of the indices 1..n
    cc_iter(k) = corr(x,y(ind));
end

%# calculate statistics
cc_mean = mean(cc_iter);
cc_std = std(cc_iter);
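zval = (cc - cc_mean) ./ cc_std; %# parentheses are required: ./ binds tighter than -
%# two-sided probability that the real cc belongs to the same distribution as cc from
%# permuted data (assumes the permuted cc values are roughly normal)
pv = 2 * normcdf(-abs(zval)); %# zval is already standardized, so use the standard normal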

%# plot
hist(cc_iter,20)
line([cc cc],ylim,'color','r') %# real value
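
The code above summarizes the permuted coefficients with a z-score and a normal approximation. A common alternative is to read the p-value straight off the permutation distribution by counting how many permuted coefficients are at least as extreme as the real one. The following is only a minimal sketch of that idea: the seed, the number of permutations (10000) and the names n_extreme and pv_perm are assumptions chosen for illustration, and it needs corr from the Statistics Toolbox.

```matlab
% Sketch: empirical two-sided permutation p-value (no normality assumption).
rng(0);                          % fixed seed (assumption) so the run is repeatable
n = 20;
x = (1:n)';
y = x + randn(n,1)*3;

cc = corr(x,y);                  % observed Pearson correlation

n_iter = 10000;                  % assumed value; more permutations give a finer p-value
cc_iter = zeros(n_iter,1);       % preallocate
for k = 1:n_iter
    cc_iter(k) = corr(x, y(randperm(n)));
end

% Count permuted coefficients at least as extreme as the observed one.
% The +1 terms treat the observed ordering as one of the permutations,
% so the estimate can never be exactly zero.
n_extreme = sum(abs(cc_iter) >= abs(cc));
pv_perm = (n_extreme + 1) / (n_iter + 1);
fprintf('cc = %.3f, permutation p = %.4g\n', cc, pv_perm);
```

The smallest p-value this can report is 1/(n_iter + 1), so the number of permutations limits the resolution of the test.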

In addition, if you compute the correlation with [cc pv] = corr(x,y), you also get a p-value for testing whether your correlation differs from no correlation. For Pearson's correlation, this p-value relies on the assumption that your data are normally distributed (it is computed from a Student's t distribution). However, if you calculate not Pearson but Spearman or Kendall correlation (non-parametric), the p-values are based on the permutation distribution of the statistic (exact for small samples; as far as I can tell from the corr documentation, large-sample approximations are used otherwise):

[cc pv] = corr(x,y,'type','Spearman')
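
To see how much the choice of coefficient matters for a given data set, the three types that corr supports can be run side by side. This is just a sketch on the same kind of random data as above; the seed and the variable names (types, cc_all, pv_all) are assumptions for illustration.

```matlab
% Sketch: compare coefficients and p-values across correlation types.
rng(0);                          % fixed seed (assumption) so the run is repeatable
n = 20;
x = (1:n)';
y = x + randn(n,1)*3;

types  = {'Pearson','Spearman','Kendall'};
cc_all = zeros(numel(types),1);  % coefficient for each type
pv_all = zeros(numel(types),1);  % p-value for each type
for t = 1:numel(types)
    [cc_all(t), pv_all(t)] = corr(x, y, 'type', types{t});
    fprintf('%-8s cc = %.3f, p = %.4g\n', types{t}, cc_all(t), pv_all(t));
end
```

The coefficients are not on the same scale (Kendall's tau is typically smaller in magnitude than Spearman's rho for the same data), so it is the p-values rather than the raw coefficients that are worth comparing here.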
