Mutate multiple / consecutive columns (with dplyr or base R)


Question

I'm trying to create "waves" of variables that represent repeated measures. Specifically, I'm trying to create consecutive variables that hold the row-wise means of variables 1-10, 11-20, ..., 91-100. Note that the "..." stands for the variables of waves 3 through 9, since avoiding typing these out is my goal!
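Stated as a formula (an editorial restatement, assuming the 100 columns are ordered X1, ..., X100), wave k for row i is the mean of ten consecutive columns:

```latex
\mathrm{wave}_k(i) = \frac{1}{10} \sum_{j = 10(k-1)+1}^{10k} X_j(i), \qquad k = 1, \dots, 10
```

For k = 1 this covers X1 to X10, and for k = 10 it covers X91 to X100.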

Here is an example data frame, df, with 10 rows and 100 columns:

mat <- matrix(runif(1000, 1, 10), ncol = 100)
df <- data.frame(mat)
dim(df)
# [1]  10 100

I've used the dplyr function mutate, which works once all the variables are typed out, but this is time-consuming and error-prone. I have not found a way to do it without manually typing the column names, as I started doing below (note that "..." stands for waves 3 through 9):

df <- df %>% 
      mutate(wave_1 = (X1 + X2 + X3 + X4 + X5 + X6 + X7 + X8 + X9 + X10) / 10,
             wave_2 = (X11 + X12 + X13 + X14 + X15 + X16 + X17 + X18 + X19 + X20) / 10,
             ...
             wave_10 = (X91 + X92 + X93 + X94 + X95 + X96 + X97 + X98 + X99 + X100) / 10)

Can I use mutate to create multiple / consecutive columns like this with dplyr? Other approaches are also welcome.

Solution

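Here is one way with the zoo package:

library(zoo)
# rollapply() applies the function to each column of a matrix separately, so transpose df
# (each original row becomes a column), average non-overlapping windows of 10 values
# (width = 10, by = 10), then transpose back so that columns are waves 1-10
t(rollapply(t(df), width = 10, by = 10, function(x) sum(x)/10))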

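Here is one way to do it with base R:

# Reshape the column indices 1:100 into a 10 x 10 matrix: column k holds the indices of wave k
splits <- 1:100
dim(splits) <- c(10, 10)
# Turn that matrix into a list of 10 index vectors: 1:10, 11:20, ..., 91:100
splits <- split(splits, col(splits))
# For each block, sum the 10 columns row-wise after dividing by 10 (i.e. the mean), then bind side by side
results <- do.call("cbind", lapply(splits, function(x) data.frame(rowSums(df[,x] / 10))))
names(results) <- paste0("wave_", 1:10)
results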
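A more compact base R variant, offered as a sketch rather than as part of the original answer: because a data frame is a list of columns, `split.default()` can split `df` column-wise by a wave index, and `rowMeans()` then averages each block. It assumes the 100 columns are already in wave order; `wave_id` and `results2` are illustrative names, and the example data is regenerated with `set.seed()` so the snippet runs on its own.

```r
# Example data with the same shape as in the question (10 rows x 100 columns)
set.seed(1)
df <- data.frame(matrix(runif(1000, 1, 10), ncol = 100))

# Assign each column to a wave: columns 1-10 -> wave 1, 11-20 -> wave 2, ...
wave_id <- rep(1:10, each = 10)

# split.default() splits the data frame column-wise into 10 data frames of 10 columns;
# rowMeans() then averages each block row by row, and sapply() binds the results
results2 <- as.data.frame(sapply(split.default(df, wave_id), rowMeans))
names(results2) <- paste0("wave_", 1:10)
results2
```

Compared with building index vectors by hand, this only needs the wave assignment vector, so a different wave size would only change the `rep()` call.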

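Another very succinct way with base R (courtesy of G. Grothendieck):

# For each row, average the 100 values within the 10 groups defined by gl(10, 10);
# t() turns the result back into one row per observation and one column per wave
t(apply(df, 1, tapply, gl(10, 10), mean))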
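In the one-liner above, `gl(10, 10)` is a factor with levels 1 to 10, each repeated 10 times, so `tapply()` averages each run of 10 consecutive values within a row. The same grouping can also be expressed by reshaping the data into a three-dimensional array and averaging over the middle dimension. The sketch below is an illustrative alternative, not part of the original answer (`m` and `waves` are hypothetical names); it relies on R storing matrices and arrays in column-major order and assumes the columns are in wave order.

```r
# Example data with the same shape as in the question (10 rows x 100 columns)
set.seed(1)
df <- data.frame(matrix(runif(1000, 1, 10), ncol = 100))

# View the 10 x 100 matrix as a 10 x 10 x 10 array [row, position within wave, wave].
# Column-major storage means columns 1-10 land in wave 1, 11-20 in wave 2, and so on.
# Assigning dim() also drops the matrix dimnames.
m <- as.matrix(df)
dim(m) <- c(nrow(m), 10, 10)

# Average over the middle dimension, keeping rows (dim 1) and waves (dim 3)
waves <- apply(m, c(1, 3), mean)
colnames(waves) <- paste0("wave_", 1:10)
waves
```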

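And here is a solution with dplyr and tidyr:

library(dplyr)
library(tidyr)
# Add a row id so each observation can be reassembled after reshaping
df$row <- seq_len(nrow(df))
# Long format, one line per (row, column, value); gather()/spread() still work but are
# superseded by pivot_longer()/pivot_wider() in tidyr >= 1.0.0
df2 <- df %>% gather(column, value, -row)
# Map X1..X100 to the intervals (0,10], (10,20], ..., (90,100], i.e. the wave of each column
df2$column <- cut(as.numeric(gsub("X", "", df2$column)), breaks = c(0:10*10))
# Mean of each wave within each row; ungroup() so that row can be dropped below
df2 <- df2 %>% group_by(row, column) %>% summarise(value = sum(value)/10) %>% ungroup()
# Back to wide format, one column per wave
df2 %>% spread(column, value) %>% select(-row)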
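Since the question asks specifically about `mutate`, here is a sketch (not part of the original answers) that generates the ten `wave_k` expressions programmatically with rlang's `expr()`, `syms()` and `!!!` splicing, then passes them all to a single `mutate()` call. It assumes the columns are named X1 to X100 in order, as in the example, and that rlang (a dplyr dependency) is installed; `wave_groups`, `wave_exprs` and `df_waves` are illustrative names, and the data is regenerated so the snippet runs on its own.

```r
library(dplyr)

# Example data with the same shape as in the question (10 rows x 100 columns)
set.seed(1)
df <- data.frame(matrix(runif(1000, 1, 10), ncol = 100))

# Group the column names X1..X100 into 10 consecutive blocks of 10,
# named wave_1 .. wave_10 (assumes the columns are already in order)
wave_groups <- split(names(df), rep(1:10, each = 10))
names(wave_groups) <- paste0("wave_", 1:10)

# Build one expression per wave, e.g. rowMeans(cbind(X1, X2, ..., X10)),
# instead of typing each one by hand
wave_exprs <- lapply(wave_groups, function(cols) {
  rlang::expr(rowMeans(cbind(!!!rlang::syms(cols))))
})

# Splice the named list of expressions into one mutate() call
df_waves <- df %>% mutate(!!!wave_exprs)
```

If the original X columns are not needed afterwards, `transmute()` could be used in place of `mutate()` to keep only the wave columns.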
