R: Calculating Pearson correlation and R-squared by group
Question
I am trying to extend the answer to the question "R: filtering data and calculating correlation".
To obtain the correlation of temperature and humidity for each month of the year (1 = January), we would have to do the same for each month (12 times).
cor(airquality[airquality$Month == 1, c("Temp", "Humidity")])
Is there any way to do each month automatically?
In my case I have more than 30 groups (not months but species) for which I would like to test correlations. I just want to know whether there is a faster way than doing it one by one.
Thank you!
Solution

    cor(airquality[airquality$Month == 1, c("Temp", "Humidity")])

gives you a 2 * 2 correlation matrix rather than a number. I bet you want a single number for each Month, so use

    ## cor(Temp, Humidity | Month)
    with(airquality, mapply(cor, split(Temp, Month), split(Humidity, Month)))

and you will obtain a vector.
Have a read around ?split and ?mapply; they are very useful for "by group" operations, although they are not the only option. Also read around ?cor, and compare the difference between

    a <- rnorm(10)
    b <- rnorm(10)
    cor(a, b)
    cor(cbind(a, b))

The answer you linked in your question is doing something similar to cor(cbind(a, b)).
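To make the difference concrete, here is a small sketch of that comparison. The seed and the variable names r and m are my own choices for illustration, not part of the original answer.

```r
set.seed(1)  # arbitrary seed, only so the run is repeatable
a <- rnorm(10)
b <- rnorm(10)

r <- cor(a, b)          # two vectors in: a single number out
m <- cor(cbind(a, b))   # one matrix in: a 2 x 2 correlation matrix out

dim(m)                      # 2 2
all.equal(m["a", "b"], r)   # the off-diagonal entry is the same value as r
unname(diag(m))             # each variable correlates perfectly with itself
```

So the matrix form carries the number you want, but you would have to pick out the off-diagonal entry for every group.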
Reproducible example
The airquality dataset in R does not have a Humidity column, so I will use Wind for testing:

    ## cor(Temp, Wind | Month)
    x <- with(airquality, mapply(cor, split(Temp, Month), split(Wind, Month)))
    #         5          6          7          8          9
    #-0.3732760 -0.1210353 -0.3052355 -0.5076146 -0.5704701

We get a named vector, where names(x) gives Month, and unname(x) gives correlation.
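Since split and mapply are not the only option, here is one possible alternative sketch: split the whole data frame by group and apply cor to each piece. The names by_month, r and res are hypothetical, chosen only for this example.

```r
# Split the whole data frame by Month, then compute one correlation per piece.
by_month <- split(airquality, airquality$Month)
r <- sapply(by_month, function(d) cor(d$Temp, d$Wind))

# Collect the result in a data frame with one row per group.
res <- data.frame(Month = names(r), r = unname(r))
res
```

The same pattern should carry over to 30+ species by replacing Month with the grouping column. Temp and Wind have no missing values; for columns that do (Ozone, for instance), cor returns NA unless its use argument is set, e.g. use = "complete.obs".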
Thank you very much! It worked just perfectly! I was trying to figure out how to obtain a vector with the R^2 for each correlation too, but I can't... Any ideas?
cor(x, y) is like fitting a standardised linear regression model:

    coef(lm(scale(y) ~ scale(x) - 1))  ## remember to drop intercept

The R-squared in this simple linear regression is just the square of the slope, and the slope here equals the correlation. Earlier, x stored the correlation per group, so the R-squared per group is just x ^ 2.
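As a sanity check, the sketch below compares the squared correlations with the R-squared that lm reports for each month. It assumes the Temp/Wind example from above; r2, r2_lm, may and slope are names introduced only for this illustration.

```r
# Correlation per month, as in the reproducible example
x <- with(airquality, mapply(cor, split(Temp, Month), split(Wind, Month)))
r2 <- x^2   # R-squared per month

# Cross-check: fit a simple linear regression per month and read off R-squared
r2_lm <- sapply(split(airquality, airquality$Month),
                function(d) summary(lm(Temp ~ Wind, data = d))$r.squared)
all.equal(r2, r2_lm)

# The standardised, no-intercept slope for one month (May) matches its correlation
may <- airquality[airquality$Month == 5, ]
slope <- coef(lm(scale(Temp) ~ scale(Wind) - 1, data = may))
all.equal(as.numeric(slope), as.numeric(x["5"]))
```

Squaring x is much cheaper than fitting one model per group, so the lm calls are only worth running as a check.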