R: Calculating Pearson correlation and R-squared by group


Problem description

I am trying to extend the answer to the question "R: filtering data and calculating correlation".

To obtain the correlation between temperature and humidity for each month of the year (1 = January), we would have to run the same command once per month (12 times):

cor(airquality[airquality$Month == 1, c("Temp", "Humidity")])

Is there any way to do each month automatically?

In my case I have more than 30 groups (not months but species) for which I would like to test correlations, so I want to know whether there is a faster way than doing it one by one.

Thank you!

Solution

cor(airquality[airquality$Month == 1, c("Temp", "Humidity")])

gives you a 2 × 2 correlation matrix rather than a single number. I bet you want a single number for each Month, so use

## cor(Temp, Humidity | Month)
with(airquality, mapply(cor, split(Temp, Month), split(Humidity, Month)))

and you will obtain a vector.
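The same pattern scales to any number of groups, because split() creates one piece per level of the grouping variable. As a sketch for the species case in the question, assuming the built-in iris data is an acceptable stand-in (Sepal.Length and Petal.Length play the role of your two variables, Species the role of your groups):

```r
# Sketch: per-species correlation using the same split()/mapply() pattern.
# iris and its columns are stand-ins for the asker's own species data (assumption).
r_by_species <- with(iris,
  mapply(cor, split(Sepal.Length, Species), split(Petal.Length, Species)))

# One correlation per species, named by the factor levels
print(r_by_species)
```

With 30 or more species the code is unchanged; only the number of factor levels grows.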

Read ?split and ?mapply; they are very useful for "by group" operations, although they are not the only option. Also read ?cor, and compare the difference between

a <- rnorm(10)
b <- rnorm(10)
cor(a, b)
cor(cbind(a, b))

The answer you linked in your question is doing something similar to cor(cbind(a, b)).
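To see the difference concretely, here is a minimal sketch (the seed is an arbitrary choice for reproducibility). The matrix form returns a 2 × 2 matrix with 1s on the diagonal, and its off-diagonal entry is the same number that cor(a, b) returns, so the single correlation can also be pulled out of the matrix by index or by name:

```r
set.seed(1)                    # fixed seed so the comparison is reproducible
a <- rnorm(10)
b <- rnorm(10)

r_scalar <- cor(a, b)          # a single number
r_matrix <- cor(cbind(a, b))   # 2 x 2 matrix, 1s on the diagonal

print(r_matrix)
print(r_matrix["a", "b"])      # off-diagonal entry, same value as r_scalar
```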


Reproducible example

The airquality dataset in R does not have a Humidity column, so I will use Wind for testing:

## cor(Temp, Wind | Month)
x <- with(airquality, mapply(cor, split(Temp, Month), split(Wind, Month)))

#         5          6          7          8          9 
#-0.3732760 -0.1210353 -0.3052355 -0.5076146 -0.5704701 

We get a named vector, where names(x) gives the months and unname(x) gives the correlations.
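Since split() and mapply() are not the only option, here is one alternative as a sketch (not the answer's own method): split the whole data frame by Month and apply a small function to each piece with sapply(). It yields the same named vector, which can then be turned into a two-column table if that is more convenient:

```r
# Alternative sketch: split the whole data frame by Month, then call cor() on each piece.
# Temp and Wind have no missing values in airquality, so no `use =` argument is needed.
by_month <- split(airquality, airquality$Month)
x2 <- sapply(by_month, function(d) cor(d$Temp, d$Wind))

# The mapply() version from the answer, for comparison
x <- with(airquality, mapply(cor, split(Temp, Month), split(Wind, Month)))

# Named vector -> table with one row per month
res <- data.frame(Month = as.integer(names(x)), r = unname(x))
print(res)
```

The data-frame version is handy when the per-group function needs more than two columns.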


Thank you very much! It worked just perfectly! I was trying to figure out how to obtain a vector with the R^2 for each correlation too, but I can't... Any ideas?

cor(x, y) is like fitting a standardised linear regression model:

coef(lm(scale(y) ~ scale(x) - 1))  ## remember to drop intercept
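A quick numerical check of this claim, as a sketch on simulated data (the seed, sample size and the 2 * x relationship are arbitrary choices): after standardising both variables, the slope of the no-intercept regression equals the Pearson correlation.

```r
set.seed(42)                   # arbitrary seed for reproducibility
x <- rnorm(50)
y <- 2 * x + rnorm(50)         # simulated data with a linear relationship

# Slope of the standardised regression without intercept
slope <- unname(coef(lm(scale(y) ~ scale(x) - 1)))
r <- cor(x, y)

print(c(slope = slope, r = r)) # the two numbers coincide
```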

The R-squared in this simple linear regression is just the square of the slope. Previously we had x storing the correlation per group; now the R-squared per group is just x ^ 2.
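Putting it together for the airquality example, a minimal sketch: square the per-month correlations and, as a cross-check, fit lm(Wind ~ Temp) within each month and read summary()$r.squared. For a simple linear regression with an intercept the two agree, whichever variable is the response.

```r
# Per-month correlation, as in the answer
x <- with(airquality, mapply(cor, split(Temp, Month), split(Wind, Month)))
r2 <- x^2                      # R-squared per month

# Cross-check: R-squared reported by a per-month simple regression
r2_lm <- sapply(split(airquality, airquality$Month),
                function(d) summary(lm(Wind ~ Temp, data = d))$r.squared)

print(rbind(r2 = r2, r2_lm = r2_lm))
```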
