Extract p values and r values for all pairwise variables

Problem description

I have multiple variables for multiple countries over multiple years. I would like to generate a dataframe containing both an R^2 value and a P value for each pair of variables. I'm somewhat close: I have a minimum working example and an idea of what the end product should look like, but I'm having some difficulty actually implementing it. If anyone could help, that would be most appreciated.

Please note, I would like to do this manually rather than with packages like Hmisc, as that has created a number of other issues. I've had a look around for similar solutions as well, but haven't had much luck.

# Code to generate minimum working example (country year pairs).  

# dplyr is only needed for bind_rows() in the answer below; the rest is base R
library(dplyr)
 
# Function to generate minimum working example data 

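simulateCountryData = function(N=200, NEACH = 20, SEED=100){

        # Use SEED so the simulated data are reproducible
        set.seed(SEED)

        # Each variable is drawn around NEACH random means (recycled over the
        # N rows), and negative values are set to 0
        variableOne<-rnorm(N,sample(1:100, NEACH),0.5)
        variableOne[variableOne<0]<-0

        variableTwo<-rnorm(N,sample(1:100, NEACH),0.5)
        variableTwo[variableTwo<0]<-0
        
        variableThree<-rnorm(N,sample(1:100, NEACH),0.5)
        variableThree[variableThree<0]<-0
        
        # N/NEACH countries, each observed for NEACH consecutive years
        geocodeNum<-factor(rep(seq(1,N/NEACH),each=NEACH))
        
        year<-rep(seq(2000,2000+NEACH-1,1),N/NEACH)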
        
        # Putting it all together
        AllData<-data.frame(geocodeNum,
                            year,
                            variableOne,
                            variableTwo,
                            variableThree)
        
        return(AllData)
}

 
# This runs the function and generates the data 
mySimData = simulateCountryData()

I have a reasonable idea of how to get correlations (both p values and r values) between 2 manually selected variables, but am having some trouble implementing it on the entire dataset and on a country level (rather than all at once).

# Example p value
corrP = cor.test(mySimData$variableOne,mySimData$variableTwo)$p.value
# Example r value
corrEst = cor(mySimData$variableOne,mySimData$variableTwo) 
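Since the question asks for R^2 as well as p: a single cor.test() call already returns both r (as its estimate) and the p value, and for a Pearson correlation r^2 equals the R^2 of a simple linear regression of one variable on the other. A minimal sketch, where corPair is a hypothetical helper name and two random vectors stand in for columns of mySimData:

```r
# Hypothetical helper: one cor.test() call gives r and its p value;
# for a Pearson correlation, r^2 is the R^2 of a simple linear regression.
corPair <- function(x, y) {
  htest <- cor.test(x, y)            # Pearson correlation by default
  r <- unname(htest$estimate)        # estimate is named "cor"; drop the name
  c(Rval = r, R2 = r^2, Pval = htest$p.value)
}

# Stand-in data (assumption: use e.g. mySimData$variableOne and
# mySimData$variableTwo in practice)
set.seed(1)
x <- rnorm(50)
y <- 0.5 * x + rnorm(50)

res <- corPair(x, y)
res
```

With the question's data this would be corPair(mySimData$variableOne, mySimData$variableTwo).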

Finally, the end result should look something like this:

myVariables = colnames(mySimData)[3:ncol(mySimData)]
myMatrix = expand.grid(myVariables,myVariables)

# I'm having trouble actually getting the r values and p values into the dataframe;
# the Pval and Rval columns below are random placeholders showing the layout only
myMatrix = as.data.frame(myMatrix)
myMatrix$Pval = runif(9,0.01,1) 
myMatrix$Rval = runif(9,0.2,1) 
myMatrix
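One way to fill that nine-row expand.grid() layout with real values instead of runif() placeholders is to run one cor.test() per row of the grid. A minimal sketch, assuming df is a data frame shaped like mySimData; pairGrid and the small demo data frame are hypothetical stand-ins. The diagonal rows pair a variable with itself, so they are trivially r = 1 and can be dropped if not wanted.

```r
# Hypothetical helper: r and p for every ordered pair in an expand.grid() layout.
# Assumes `df` is a data frame and `vars` names its numeric columns.
pairGrid <- function(df, vars) {
  grid <- expand.grid(Var1 = vars, Var2 = vars, stringsAsFactors = FALSE)
  # One cor.test() per row of the grid
  tests <- mapply(function(a, b) cor.test(df[[a]], df[[b]]),
                  grid$Var1, grid$Var2,
                  SIMPLIFY = FALSE, USE.NAMES = FALSE)
  grid$Pval <- vapply(tests, function(h) h$p.value, numeric(1))
  grid$Rval <- vapply(tests, function(h) unname(h$estimate), numeric(1))
  grid
}

# Stand-in data shaped like mySimData (2 countries x 10 years)
set.seed(2)
demo <- data.frame(geocodeNum = factor(rep(1:2, each = 10)),
                   year = rep(2000:2009, 2),
                   variableOne = rnorm(20),
                   variableTwo = rnorm(20),
                   variableThree = rnorm(20))
vars <- names(demo)[3:ncol(demo)]

out <- pairGrid(demo, vars)
out
```

With the question's objects, pairGrid(mySimData, myVariables) would produce the same layout as myMatrix, with computed Pval and Rval columns.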

Thanks again :)

Recommended answer

This computes r and p for every unique pair of variables, with all countries pooled together.

# matrix of unique pairs coded as numeric
mx_combos <- combn(1:length(myVariables), 2)
# list of unique pairs coded as numeric
ls_combos <- split(mx_combos, rep(1:ncol(mx_combos), each = nrow(mx_combos)))
# for each pair in the list, create a 1 x 4 dataframe
ls_rows <- lapply(ls_combos, function(p) {
  # lookup names of variables
  v1 <- myVariables[p[1]]
  v2 <- myVariables[p[2]]
  # perform the cor.test()
  htest <- cor.test(mySimData[[v1]], mySimData[[v2]])
  # record pertinent info in a dataframe
  data.frame(Var1 = v1, 
             Var2 = v2, 
             Pval = htest$p.value, 
             Rval = unname(htest$estimate))
  })
# row bind the list of dataframes
dplyr::bind_rows(ls_rows)
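The question also asks for results at the country level. One option is to split the data by geocodeNum and repeat the same unique-pairs calculation within each country. A minimal base-R sketch, assuming a grouping column named geocodeNum; pairCorrByCountry and the demo data are hypothetical, and do.call(rbind, ...) is used in place of dplyr::bind_rows() only to keep the block dependency-free.

```r
# Hypothetical helper: unique-pair correlations computed separately per group.
# Assumes `df` has a grouping column (default "geocodeNum") and that `vars`
# names numeric columns; cor.test() needs at least 3 complete observations per group.
pairCorrByCountry <- function(df, vars, group = "geocodeNum") {
  pairs <- combn(vars, 2, simplify = FALSE)          # list of c(var1, var2)
  by_group <- split(df, df[[group]], drop = TRUE)    # one data frame per country
  rows <- lapply(names(by_group), function(g) {
    d <- by_group[[g]]
    do.call(rbind, lapply(pairs, function(p) {
      htest <- cor.test(d[[p[1]]], d[[p[2]]])
      data.frame(country = g, Var1 = p[1], Var2 = p[2],
                 Pval = htest$p.value, Rval = unname(htest$estimate),
                 stringsAsFactors = FALSE)
    }))
  })
  do.call(rbind, rows)
}

# Stand-in data shaped like mySimData (2 countries x 10 years)
set.seed(3)
demo <- data.frame(geocodeNum = factor(rep(1:2, each = 10)),
                   year = rep(2000:2009, 2),
                   variableOne = rnorm(20),
                   variableTwo = rnorm(20),
                   variableThree = rnorm(20))
vars <- names(demo)[3:ncol(demo)]

out2 <- pairCorrByCountry(demo, vars)
out2
```

With the question's data, pairCorrByCountry(mySimData, myVariables) would give 3 pairs for each of the 10 simulated countries, i.e. 30 rows, each based on that country's 20 years.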
