Efficiency in assigning programmatically in R
Question
In summary, I have a script that imports a lot of data stored in several txt files. Not all the rows of a single file go into the same table (a data frame, DF, which I am now switching to a data.table, DT), so for each file I select all the rows belonging to the same DF, get the DF, and assign the rows to it.
The first time I create a DF named, say, table1, I do:
name <- "table1" # in my code the value of name will depend on different factors
# and **not** known in advance
assign(name, someRows)
Then, during execution, my code may find (in other files) more rows that belong in the table1 data frame, so:
name <- "table1"
assign(name, rbind.fill(get(name), someRows)) # rbind.fill() comes from plyr
My question is: is assign(get(string), anyObject) the best way to do assignment programmatically? Thanks
Here is a simplified version of my code (each item in dataSource is the result of read.table(), so it corresponds to one text file):
set.seed(1)

dataSource <- list(data.frame(fileType = rep(letters[1:2], each=4),
                              id = rep(LETTERS[1:4], each=2),
                              var1 = as.integer(rnorm(8))),
                   data.frame(fileType = rep(letters[1:2], each=4),
                              id = rep(LETTERS[1:4], each=2),
                              var1 = as.integer(rnorm(8))))

library(plyr)

tablesnames <- unique(unlist(lapply(dataSource,function(x) as.character(unique(x[,1])))))
for(l in tablesnames){
  temp <- lapply(dataSource, function(x) x[x[,1]==l, -1])
  if(exists(l)) assign(l, rbind.fill(get(l), rbind.fill(temp))) else assign(l, rbind.fill(temp))
}

# now two data frames a and b are created

# different method using rbindlist in place of rbind.fill (faster and, so far,
# I have no missing columns to fill)

rm(a,b)
library(data.table)

tablesnames <- unique(unlist(lapply(dataSource,function(x) as.character(unique(x[,1])))))
for(l in tablesnames){
  temp <- lapply(dataSource, function(x) x[x[,1]==l, -1])
  if(exists(l)) assign(l, rbindlist(list(get(l), rbindlist(temp)))) else assign(l, rbindlist(temp))
}
Recommended answer
I would recommend using a named list and skipping assign and get. Many of R's most useful features (lapply, for example) work very well on lists but do not work with a set of separate variables managed through assign and get. In addition, you can easily pass a list to a function, whereas passing a group of variables handled with assign and get is somewhat cumbersome.
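As a rough illustration of the named-list idea applied to the question's setup, here is a minimal sketch in base R. The dataSource below is a hypothetical stand-in for the question's data (fixed integers instead of random values), and the name tables is an assumption introduced for this example. Plain rbind is used, which assumes every piece has the same columns; rbind.fill or rbindlist could be swapped in.

```r
# Hypothetical input mirroring the question: each element stands for one file,
# and the first column (fileType) says which table a row belongs to.
dataSource <- list(
  data.frame(fileType = rep(c("a", "b"), each = 4),
             id       = rep(LETTERS[1:4], each = 2),
             var1     = 1:8),
  data.frame(fileType = rep(c("a", "b"), each = 4),
             id       = rep(LETTERS[1:4], each = 2),
             var1     = 9:16)
)

tables <- list()  # named list that replaces the free-floating variables

for (src in dataSource) {
  # split() yields one data frame per fileType value found in this file
  pieces <- split(src[, -1], src$fileType)
  for (name in names(pieces)) {
    # tables[[name]] is NULL the first time; rbind(NULL, df) returns df
    tables[[name]] <- rbind(tables[[name]], pieces[[name]])
  }
}

names(tables)         # the table names discovered at run time
sapply(tables, nrow)  # rows collected per table
```

Here tables[[name]] takes over the role of both get(name) and assign(name, ...), and the whole collection can be handed to lapply or to a function as a single object.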
If you want to read a set of files into one big data.frame, I'd use something like this (assuming CSV-like text files):
library(plyr)
# pattern is a regular expression, not a glob
list_of_files = list.files(pattern = "\\.csv$")
big_dataframe = ldply(list_of_files, read.csv)
Or, if you want to keep the result in a list:
big_list = lapply(list_of_files, read.csv)
and then possibly combine it with rbind.fill:
big_dataframe = do.call("rbind.fill", big_list)
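Since the question needs one table per file type rather than a single big table, one possible follow-up is to bind everything once and then split on the key column. The sketch below uses base R only. The big_list shown is a hypothetical stand-in for the result of lapply(list_of_files, read.csv), and the column names fileType and var1 are borrowed from the question. Plain rbind assumes all files share the same columns; with differing columns, rbind.fill as above would be the substitute.

```r
# Hypothetical stand-in for the list returned by lapply(list_of_files, read.csv)
big_list <- list(
  data.frame(fileType = c("a", "a", "b"), var1 = 1:3),
  data.frame(fileType = c("b", "a"),      var1 = 4:5)
)

# Bind all files in a single call instead of growing tables inside a loop
all_rows <- do.call(rbind, big_list)

# Split once on the key column; the result is a named list of data frames
tables <- split(all_rows[, -1, drop = FALSE], all_rows$fileType)

names(tables)
tables$a
```

Binding once avoids repeatedly copying an ever-growing data frame, which is usually where an rbind-in-a-loop approach spends most of its time.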