Mutate multiple / consecutive columns (with dplyr or base R)

Views: 105 Published: 2017/7/13 22:00:26 r dplyr

Problem description

I'm trying to create "waves" of variables that represent repeated measures. Specifically, I'm trying to create consecutive variables that represent the mean values for variables 1 - 10, 11 - 20 ... 91-100. Note that the "..." symbolizes the variables for waves 3 through 9, as avoiding typing these is my goal!

Here is an example data frame, df, with 10 rows and 100 columns:

mat <- matrix(runif(1000, 1, 10), ncol = 100)
df <- data.frame(mat)
dim(df)
#> [1]  10 100

I've used the dplyr function mutate which works once all the variables are typed, but is time-intensive and prone to mistakes. I have not been able to find a way to do so without resorting to manually typing the names of the columns, as I started doing below (note that "..." symbolizes waves 3 through 9):

df <- df %>% 
      mutate(wave_1 = (X1 + X2 + X3 + X4 + X5 + X6 + X7 + X8 + X9 + X10) / 10,
             wave_2 = (X11 + X12 + X13 + X14 + X15 + X16 + X17 + X18 + X19 + X20) / 10,
             ...
             wave_10 = (X91 + X92 + X93 + X94 + X95 + X96 + X97 + X98 + X99 + X100) / 10)

Can you mutate multiple / consecutive columns with 'dplyr'? Other approaches are also welcome.

Solution

Here is one way with the package zoo:

library(zoo)
t(rollapply(t(df), width = 10, by = 10, function(x) sum(x)/10))

Here is one way to do it with base R:

splits <- 1:100
dim(splits) <- c(10, 10)
splits <- split(splits, col(splits))
results <- do.call("cbind", lapply(splits, function(x) data.frame(rowSums(df[,x] / 10))))
names(results) <- paste0("wave_", 1:10)
results
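The same block-wise averaging can also be written with `rowMeans()` over column ranges, which makes it easy to append the new columns to `df` as the question asks. This is only a sketch: the names `starts`, `waves`, and `df_waves` are made up for illustration, and the `set.seed()` call is an assumption added for reproducibility.

```r
# Rebuild the example data (seed is an assumption, added for reproducibility)
set.seed(1)
df <- data.frame(matrix(runif(1000, 1, 10), ncol = 100))

# First column index of each block of ten: 1, 11, ..., 91
starts <- seq(1, 100, by = 10)

# One column of row means per block of ten consecutive columns
waves <- sapply(starts, function(s) rowMeans(df[, s:(s + 9)]))
colnames(waves) <- paste0("wave_", seq_along(starts))

# Append the ten wave columns to the original hundred
df_waves <- cbind(df, waves)
dim(df_waves)
```

Because the blocks are defined by position rather than by name, this assumes the columns are already in the order X1 to X100.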

Another very succinct way with base R (courtesy of G.Grothendieck):

t(apply(df, 1, tapply, gl(10, 10), mean))
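To see why this one-liner works, it may help to look at the grouping factor on its own. `gl(10, 10)` builds a factor with 10 levels, each repeated 10 times, so `tapply` averages every row in blocks of ten consecutive values. A smaller sketch of the same idea, using a toy data frame (`toy`, `grp`, and `res` are hypothetical names chosen for illustration):

```r
# Toy data: 2 rows, 6 columns, to be averaged in blocks of 3 columns
toy <- data.frame(matrix(1:12, nrow = 2))

# Grouping factor with 2 levels, each repeated 3 times: 1 1 1 2 2 2
grp <- gl(2, 3)

# For each row, take the mean within each group of columns;
# apply() returns the groups in rows, so transpose back
res <- t(apply(toy, 1, tapply, grp, mean))
colnames(res) <- paste0("wave_", 1:2)
res
# row 1 is (1, 3, 5, 7, 9, 11)  -> wave means 3 and 9
# row 2 is (2, 4, 6, 8, 10, 12) -> wave means 4 and 10
```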

And here is a solution with dplyr and tidyr:

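library(dplyr)
library(tidyr)
# Keep a row id so values can be regrouped after reshaping to long format
df$row <- 1:nrow(df)
df2 <- df %>% gather(column, value, -row)
# Strip the "X" prefix and bin the column numbers into blocks of ten
df2$column <- cut(as.numeric(gsub("X", "", df2$column)), breaks = c(0:10*10))
df2 <- df2 %>% group_by(row, column) %>% summarise(value = sum(value)/10)
# ungroup() first, otherwise select(-row) keeps the grouping variable
df2 %>% ungroup() %>% spread(column, value) %>% select(-row)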
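`gather()` and `spread()` still work, but tidyr's documentation marks them as superseded by `pivot_longer()` and `pivot_wider()`. A sketch of the same reshape-and-summarise idea with the newer functions follows. It assumes tidyr 1.0.0 or later and dplyr 1.0.0 or later (for the `.groups` argument); the `waves` and `wave` names and the `set.seed()` call are additions for illustration, not part of the original answer.

```r
library(dplyr)
library(tidyr)

# Rebuild the example data (seed is an assumption, added for reproducibility)
set.seed(1)
df <- data.frame(matrix(runif(1000, 1, 10), ncol = 100))

waves <- df %>%
  mutate(row = row_number()) %>%
  # Long format: one line per (row, original column) pair
  pivot_longer(-row, names_to = "column", values_to = "value") %>%
  # X1..X10 -> wave 1, X11..X20 -> wave 2, ..., X91..X100 -> wave 10
  mutate(wave = (as.integer(sub("X", "", column)) - 1) %/% 10 + 1) %>%
  group_by(row, wave) %>%
  summarise(value = mean(value), .groups = "drop") %>%
  # Back to wide format: one wave_* column per block of ten
  pivot_wider(names_from = wave, values_from = value, names_prefix = "wave_") %>%
  select(-row)

waves
```

Computing the wave number with integer division avoids the interval labels that `cut()` produces, so the resulting columns come out named `wave_1` to `wave_10` directly.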
