Mutate multiple / consecutive columns (with dplyr or base R)
Problem description
I'm trying to create "waves" of variables that represent repeated measures. Specifically, I'm trying to create consecutive variables that represent the mean values for variables 1-10, 11-20, ..., 91-100. Note that the "..." symbolizes the variables for waves 3 through 9, because avoiding typing these is my goal!
Here is an example data frame, df, with 10 rows and 100 columns:
mat <- matrix(runif(1000, 1, 10), ncol = 100)
df <- data.frame(mat)
dim(df)
[1]  10 100
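The grouping rule behind these waves is plain integer arithmetic: column k belongs to wave (k - 1) %/% 10 + 1. The base R sketch below makes that mapping explicit. It assumes the X1 to X100 names that data.frame() gives an unnamed matrix. The helper `wave_of` is a hypothetical name introduced only for illustration, and the example data is recreated with a fixed seed.

```r
# Recreate the example data (seeded so results are reproducible)
set.seed(42)
df <- data.frame(matrix(runif(1000, 1, 10), ncol = 100))

# Hypothetical helper: map a column position k to its wave number
# (columns 1-10 -> wave 1, 11-20 -> wave 2, and so on up to 91-100 -> wave 10)
wave_of <- function(k, size = 10) (k - 1) %/% size + 1

wave_of(c(1, 10, 11, 20, 91, 100))
# [1]  1  1  2  2 10 10

# Apply the rule to the actual column names (X1 to X100)
col_index <- as.integer(sub("^X", "", names(df)))
wave_id <- wave_of(col_index)
table(wave_id)  # every wave should contain exactly 10 columns
```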
I've used the dplyr function mutate, which works once all the variables are typed, but it is time-intensive and prone to mistakes. I have not been able to find a way to do this without manually typing the column names, as I started doing below (note that "..." symbolizes waves 3 through 9):
df <- df %>%
mutate(wave_1 = (X1 + X2 + X3 + X4 + X5 + X6 + X7 + X8 + X9 + X10) / 10,
wave_2 = (X11 + X12 + X13 + X14 + X15 + X16 + X17 + X18 + X19 + X20) / 10,
...
wave_10 = (X91 + X92 + X93 + X94 + X95 + X96 + X97 + X98 + X99 + X100) / 10)
How can I mutate multiple / consecutive columns with dplyr? Other approaches are also welcome.
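One way to stay inside dplyr without typing column names is to generate the ten groups of names programmatically and then splice the resulting row means into a single mutate() call with `!!!`. The following is a minimal sketch that assumes dplyr 1.0 or later, which provides all_of(), and assumes the X1 to X100 naming shown above. The names `wave_cols`, `wave_vals` and `df_waves` are hypothetical. The result goes to a new object so that the original df keeps its 100 columns.

```r
library(dplyr)

set.seed(42)
df <- data.frame(matrix(runif(1000, 1, 10), ncol = 100))

# Column names for each wave: X1-X10, X11-X20, and so on up to X91-X100
wave_cols <- lapply(1:10, function(i) paste0("X", ((i - 1) * 10 + 1):(i * 10)))
names(wave_cols) <- paste0("wave_", 1:10)

# Row means for each group of columns, kept as a named list of vectors
wave_vals <- lapply(wave_cols, function(cols) rowMeans(select(df, all_of(cols))))

# Splice the named list into mutate() to add wave_1 to wave_10 in one call
df_waves <- df %>% mutate(!!!wave_vals)
```

The same idea works for any block size, because only the lapply() that builds `wave_cols` depends on the grouping.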
Here is one way with the zoo package:
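library(zoo)
# Transpose so columns become rows, average non-overlapping windows of 10 (width = 10, by = 10), then transpose back
t(rollapply(t(df), width = 10, by = 10, function(x) sum(x)/10))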
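The windows used with rollapply() here do not overlap (width = 10, by = 10). That means the same averages can also be obtained without zoo. One option is to reshape the data into a three-dimensional array with dimensions rows x position within wave x wave, and then average over the middle dimension. The base R sketch below assumes every column is numeric and that the columns are already in X1 to X100 order.

```r
set.seed(42)
df <- data.frame(matrix(runif(1000, 1, 10), ncol = 100))

m <- as.matrix(df)
# Column-major storage: original column c = j + 10 * (k - 1) lands at arr[, j, k],
# so the third dimension indexes the wave and the second the position within it
arr <- array(m, dim = c(nrow(m), 10, 10))

# Average over the 10 positions inside each wave -> 10 rows x 10 waves
wave_means <- apply(arr, c(1, 3), mean)
colnames(wave_means) <- paste0("wave_", 1:10)
```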
Here is one way to do it with base R:
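splits <- 1:100
# Reshape the column indices into a 10 x 10 matrix: column j holds 10*(j-1)+1 to 10*j
dim(splits) <- c(10, 10)
# Turn each matrix column into one list element of column indices
splits <- split(splits, col(splits))
# rowSums over each block of 10 columns, divided by 10, gives the row means per wave
results <- do.call("cbind", lapply(splits, function(x) data.frame(rowSums(df[,x] / 10))))
names(results) <- paste0("wave_", 1:10)
results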
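If the block size or the number of columns changes, the splitting step above can be wrapped in a small function. The sketch below uses a hypothetical name, `block_row_means`. It assumes all columns are numeric and that the column count is a multiple of the block size; it stops with an error otherwise.

```r
# Hypothetical helper: row means over consecutive blocks of `size` columns
block_row_means <- function(data, size = 10, prefix = "wave_") {
  n <- ncol(data)
  if (n %% size != 0) stop("ncol(data) must be a multiple of size")
  # Group column positions 1..size, size+1..2*size, and so on
  groups <- split(seq_len(n), ceiling(seq_len(n) / size))
  out <- as.data.frame(lapply(groups, function(idx) rowMeans(data[, idx, drop = FALSE])))
  names(out) <- paste0(prefix, seq_along(groups))
  out
}

set.seed(42)
df <- data.frame(matrix(runif(1000, 1, 10), ncol = 100))
res <- block_row_means(df)       # 10 waves of 10 columns
res5 <- block_row_means(df, 20)  # 5 waves of 20 columns
```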
Another very succinct way with base R (courtesy of G. Grothendieck):
# For each row, average the values within the groups defined by gl(10, 10)
t(apply(df, 1, tapply, gl(10, 10), mean))
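The one-liner relies on gl(10, 10), which builds a factor with levels 1 to 10, each repeated 10 times. tapply() then averages each row's 100 values within those levels. The short base R check below shows this behavior on a single row, and also explains why t() is needed: apply() returns one column per input row.

```r
set.seed(42)
df <- data.frame(matrix(runif(1000, 1, 10), ncol = 100))

g <- gl(10, 10)  # factor: level 1 ten times, then level 2 ten times, and so on
head(as.integer(g), 12)
# [1] 1 1 1 1 1 1 1 1 1 1 2 2

# tapply() on one row gives that row's 10 wave means
row1 <- unlist(df[1, ])
tapply(row1, g, mean)

# apply() returns one column per row, hence the t() in the one-liner
res <- t(apply(df, 1, tapply, g, mean))
dim(res)
# [1] 10 10
```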
And here is a solution with dplyr and tidyr:
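library(dplyr)
library(tidyr)
# Add a row id so each observation can be reassembled after reshaping
df$row <- 1:nrow(df)
# Long format: one line per (row, column) pair
df2 <- df %>% gather(column, value, -row)
# Map X1..X100 to the intervals (0,10], (10,20], ..., (90,100]
df2$column <- cut(as.numeric(gsub("X", "", df2$column)), breaks = c(0:10*10))
# Average within each row and wave; ungroup so `row` can be dropped cleanly afterwards
df2 <- df2 %>% group_by(row, column) %>% summarise(value = sum(value)/10) %>% ungroup()
df2 %>% spread(column, value) %>% select(-row)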
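gather() and spread() still work, but tidyr 1.0 superseded them with pivot_longer() and pivot_wider(). The sketch below performs the same reshaping with the newer verbs. It assumes tidyr 1.0 or later and dplyr 1.0 or later (for summarise(.groups = ...)). Here the wave number is computed arithmetically instead of with cut(), so the output columns come out directly as wave_1 to wave_10.

```r
library(dplyr)
library(tidyr)

set.seed(42)
df <- data.frame(matrix(runif(1000, 1, 10), ncol = 100))

res <- df %>%
  mutate(row = row_number()) %>%
  # Long format: one line per (row, column) pair
  pivot_longer(-row, names_to = "column", values_to = "value") %>%
  # X1-X10 -> wave 1, X11-X20 -> wave 2, and so on
  mutate(wave = (as.integer(sub("X", "", column)) - 1) %/% 10 + 1) %>%
  group_by(row, wave) %>%
  summarise(value = mean(value), .groups = "drop") %>%
  # Back to wide: one column per wave, named wave_1 to wave_10
  pivot_wider(names_from = wave, values_from = value, names_prefix = "wave_") %>%
  select(-row)
```

Because summarise() returns the groups sorted by row and then by the numeric wave, the new columns appear in wave order without any extra sorting step.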