How can I refer to multiple databases using the same looping vector in R?
Problem description
I need to perform a variety of actions by combining, aggregating and splitting data frames. These actions need to be repeated for several years in a row (2000, 2001, 2002 etc.). However, I can't find a way to refer to multiple data frames based on a looping string with the years.
An example: I want to combine 3 data frames from the same year. My current code:
Stake_2000 <- combine(A2000, B2000, C2000)
Stake_2001 <- combine(A2001, B2001, C2001)
Stake_2002 <- combine(A2002, B2002, C2002)
Stake_2003 <- combine(A2003, B2003, C2003)
Stake_2004 <- combine(A2004, B2004, C2004)
Stake_2005 <- combine(A2005, B2005, C2005)
I would like to simplify this by replacing the years with a variable in a loop. However, I cannot get R to read from the appropriate data frames. I have gotten stuck after multiple attempts:
names <- c("2000", "2001", "2002", "2003", "2004", "2005")
for (n in names) {
  Temp <- combine(c("A", n, sep = ""), c("B", n, sep = ""), c("C", n, sep = ""))
  assign(paste("Stake_", n, sep = ""), Temp)
}
or replacing the combine call with combine(An, Bn, Cn) or combine(A+n, B+n, C+n).
Besides these actions, I need to aggregate and match data from different databases, with the same problem of looping over the years. For example, I want to replace every "2000" below with each subsequent year in a loop:
Temp <- aggregate(VarA~VarB, data=A_2000, FUN=length)
S_2000$VarC <- Temp[match(S_2000$ID, Temp$ID), "VarA"]
I presume there is a pretty straightforward answer to this, but I haven't been able to find it.
You could try
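library(dplyr)
names <- c("2000", "2001", "2002", "2003", "2004", "2005")
for(n in names){
  # get() fetches the object whose name is built by paste0(), e.g. "A2000"
  Temp <- bind_cols( get(paste0('A', n)), get(paste0('B', n)),
                     get(paste0('C', n)))
  # assign() stores the result under a constructed name, e.g. "Stake_2000"
  assign(paste0('Stake_', n), Temp)
}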
identical(cbind(A2000, B2000, C2000), Stake_2000)
#[1] TRUE
identical(cbind(A2005, B2005, C2005), Stake_2005)
#[1] TRUE
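Instead of creating Stake_2000, Stake_2001, ... one at a time with assign(), the per-year results can also be kept in one named list. This is a minimal sketch, not part of the original answer: it assumes the A/B/C data frames already exist in the global environment (small toy versions are generated here and would overwrite objects of the same names), uses base R's mget() to look up each year's three tables by name, and uses cbind() in place of combine()/bind_cols().

```r
years <- 2000:2001

# Hypothetical toy tables A2000, B2000, C2000, A2001, B2001, C2001 (5 x 2 each)
obj_names <- paste0(rep(c("A", "B", "C"), times = length(years)),
                    rep(years, each = 3))
invisible(list2env(setNames(lapply(seq_along(obj_names), function(i)
  as.data.frame(matrix(i * (1:10), ncol = 2))), obj_names), envir = .GlobalEnv))

# For each year, fetch A<year>, B<year>, C<year> by name and bind their columns
stake <- lapply(setNames(years, paste0("Stake_", years)), function(yr) {
  parts <- mget(paste0(c("A", "B", "C"), yr), envir = .GlobalEnv)
  do.call(cbind, unname(parts))
})

str(stake$Stake_2000)
```

If separate objects are still needed afterwards, list2env(stake, envir = .GlobalEnv) creates Stake_2000, Stake_2001, ... from the list in a single call.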
For the aggregate, you could do
lapply(mget(paste0('A', 2000:2005)), function(x)
aggregate(V1~V2, x, FUN=length))
Similarly for B and C, though it is not clear what S_2000 is.
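The aggregate-and-match step from the question can be looped the same way with get() and assign(). Because S_2000 is not defined in the question, this sketch rests on assumptions: the A_<year> and S_<year> tables below are tiny hypothetical stand-ins, and since the lookup uses Temp$ID, the aggregated table needs an ID column, so the sketch groups by ID rather than by VarB.

```r
years <- 2000:2001

# Hypothetical inputs (column names are assumptions): A_<year> holds a value
# column VarA and a key ID; S_<year> holds one row per ID
A_2000 <- data.frame(ID = c(1, 1, 2, 3, 3, 3), VarA = 1:6)
A_2001 <- data.frame(ID = c(2, 2, 3), VarA = 1:3)
S_2000 <- data.frame(ID = 1:3)
S_2001 <- data.frame(ID = 1:3)

for (yr in years) {
  a <- get(paste0("A_", yr))
  s <- get(paste0("S_", yr))
  # Count the rows of A_<year> per ID (FUN = length), as in the question
  Temp <- aggregate(VarA ~ ID, data = a, FUN = length)
  # Look up each ID of S_<year>; IDs missing from A_<year> get NA
  s$VarC <- Temp[match(s$ID, Temp$ID), "VarA"]
  assign(paste0("S_", yr), s)
}

S_2000
S_2001
```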
Update
If the number of rows differs, we could perhaps use combine with stri_list2matrix from stringi:
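# Give A2000 a 6th row so the data frames no longer have equal row counts
A2000 <- rbind(A2000, c(4,8, 9 , 15, 25))
library(stringi)
for(n in names){
  # stri_list2matrix() pads shorter vectors with NA and returns a character matrix
  Temp <- as.data.frame(stri_list2matrix(combine( get(paste0('A', n)),
    get(paste0('B', n)), get(paste0('C', n)))), stringsAsFactors=FALSE)
  # Convert the character columns back to numeric
  Temp[] <- lapply(Temp, as.numeric)
  assign(paste0('Stake_', n), Temp)
}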
Stake_2000
# V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15
#1 6 19 12 18 1 1 18 5 7 17 9 19 12 18 8
#2 4 5 7 4 11 12 9 1 2 5 4 13 18 5 6
#3 14 16 14 0 15 3 7 13 20 0 4 3 0 0 6
#4 10 16 14 10 2 4 10 6 13 16 4 2 6 8 15
#5 13 5 6 2 4 12 11 0 10 16 9 17 12 7 6
#6 4 8 9 15 25 NA NA NA NA NA NA NA NA NA NA
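A base R alternative for unequal row counts, which avoids the stringi dependency, is to pad every column with NA up to the longest length before building the data frame. This is a sketch under assumptions: pad_and_bind() is a hypothetical helper, the columns are assumed to be numeric, and the toy A2000/B2000/C2000 below are not the answer's data.

```r
# Hypothetical helper: bind data frames column-wise, padding short columns with NA
pad_and_bind <- function(...) {
  # Flatten all data frames into a single list of column vectors
  cols <- do.call(c, lapply(list(...), as.list))
  n <- max(lengths(cols))
  # `length<-` extends a vector with NA up to n elements
  cols <- lapply(cols, function(x) { length(x) <- n; x })
  names(cols) <- paste0("V", seq_along(cols))
  as.data.frame(cols)
}

# Hypothetical toy data: A2000 has 6 rows, B2000 and C2000 have 5
A2000 <- data.frame(V1 = 1:6, V2 = 7:12)
B2000 <- data.frame(V1 = 1:5, V2 = 6:10)
C2000 <- data.frame(V1 = 11:15, V2 = 16:20)

for (n in "2000") {
  parts <- mget(paste0(c("A", "B", "C"), n), envir = .GlobalEnv)
  assign(paste0("Stake_", n), do.call(pad_and_bind, unname(parts)))
}

Stake_2000
```

Unlike stri_list2matrix(), which returns a character matrix and therefore needs the as.numeric() step, this keeps the original numeric column types; the trade-off is that the V1, V2, ... names are regenerated rather than taken from the inputs.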
Data
set.seed(24)
list2env(setNames(lapply(1:6, function(i)
as.data.frame(matrix(sample(0:20, 5*5, replace=TRUE), ncol=5))),
paste0('A', 2000:2005)), envir=.GlobalEnv)
list2env(setNames(lapply(1:6, function(i)
as.data.frame(matrix(sample(0:20, 5*5, replace=TRUE), ncol=5))),
paste0('B', 2000:2005)), envir=.GlobalEnv)
list2env(setNames(lapply(1:6, function(i)
as.data.frame(matrix(sample(0:20, 5*5, replace=TRUE), ncol=5))),
paste0('C', 2000:2005)), envir=.GlobalEnv)