R: How to apply a function that outputs a dataframe for multiple columns (using dplyr)?


Problem description

I want to find the correlation, p-value and 95% CI between one specific column and every other column in a dataframe. The 'broom' package shows how to do this for two columns using cor.test with dplyr and pipes. For mtcars and, say, the mpg column, we can run a correlation with one other column:

library(dplyr)
library(broom)
mtcars %>% do(tidy(cor.test(.$mpg, .$cyl)))

   estimate statistic      p.value parameter   conf.low  conf.high
1 -0.852162 -8.919699 6.112687e-10        30 -0.9257694 -0.7163171

The output is a single-row dataframe. I'd like to run cor.test for mpg with each column and send the output to a separate row. When mpg column is paired with every other column, the desired output would look like this:

       estimate statistic      p.value parameter   conf.low  conf.high
cyl   -0.852162 -8.919699 6.112687e-10        30 -0.9257694 -0.7163171
disp -0.8475514 -8.747152 9.380327e-10        30 -0.9233594 -0.7081376
hp   -0.7761684 -6.742389 1.787835e-07        30 -0.8852686 -0.5860994
drat  0.6811719  5.096042  1.77624e-05        30  0.4360484   0.832201
wt   -0.8676594 -9.559044 1.293959e-10        30 -0.9338264 -0.7440872
qsec   0.418684  2.525213   0.01708199        30 0.08195487  0.6696186
vs    0.6640389  4.864385 3.415937e-05        30   0.410363  0.8223262
am    0.5998324  4.106127 0.0002850207        30  0.3175583   0.784452
gear  0.4802848  2.999191  0.005400948        30  0.1580618  0.7100628
carb -0.5509251  -3.61575  0.001084446        30  -0.754648 -0.2503183

Note the added row names in the first column. They show which column was paired with mpg for the cor.test. Ideally, I'd like to do this with dplyr and pipes.

Recommended answer

Here's a solution that sticks with the do approach. The step you're missing is to gather your data and then group by the variable.

library(dplyr)
library(tidyr)
library(broom)

mtcars %>%
  gather(var, value, -mpg) %>%
  group_by(var) %>%
  do(tidy(cor.test(.$mpg, .$value))) %>%
  ungroup() %>%
  mutate(var = factor(var, names(mtcars)[-1])) %>%
  arrange(var)
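
For readers on recent package versions, where gather() and do() are marked as superseded, the same gather-then-group idea can be sketched with pivot_longer() and group_modify(). This is only one possible variant, not part of the original answer; it assumes tidyr >= 1.0 and dplyr >= 0.8.1, and the name result is chosen here for illustration.

```r
library(dplyr)
library(tidyr)
library(broom)

# Hypothetical name `result`; a sketch assuming tidyr >= 1.0 and dplyr >= 0.8.1
result <- mtcars %>%
  # Reshape to long form: one row per (mpg, other column, value) triple
  pivot_longer(-mpg, names_to = "var", values_to = "value") %>%
  # Make var a factor first so groups keep the original column order
  mutate(var = factor(var, levels = names(mtcars)[-1])) %>%
  group_by(var) %>%
  # Run cor.test once per group; each call returns a one-row tibble
  group_modify(~ tidy(cor.test(.x$mpg, .x$value))) %>%
  ungroup()

result
```

Converting var to a factor before grouping means the rows come out in the order of the columns in mtcars, so no separate arrange() step is needed.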


And here's an example that's closer to a base R approach (I used pipes for convenience, but it's easily adapted to work without them):

library(dplyr)
library(broom)

xvar <- "mpg"
yvar <- names(mtcars)[!names(mtcars) %in% xvar]

lapply(yvar,
       function(yvar, xvar, DF)
       {
         cor.test(DF[[xvar]], DF[[yvar]]) %>%
           tidy()
       },
       xvar,
       mtcars) %>%
  bind_rows() %>%
  mutate(yvar = yvar) %>%
  select(yvar, estimate:conf.high)
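
As a rough illustration of that adaptation, the lapply approach can be written without pipes or dplyr, and the paired column names can be attached as row names, as in the desired output in the question. This is a sketch rather than the answerer's code; the names yvars and res are chosen here for illustration.

```r
library(broom)

xvar  <- "mpg"
yvars <- setdiff(names(mtcars), xvar)  # every column except mpg

# One tidy cor.test result per column, converted to a plain data frame
rows <- lapply(yvars, function(y) {
  as.data.frame(tidy(cor.test(mtcars[[xvar]], mtcars[[y]])))
})

# Stack the one-row results and label each row with its paired column
res <- do.call(rbind, rows)
rownames(res) <- yvars

res
```

Row names only survive on a plain data frame; a tibble would drop them, which is why each result is passed through as.data.frame() before binding.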
