Efficiency in assigning programmatically in R

Views: 67 · Published: 2020/9/13 4:13:42 · Tags: r, assign


Question

In summary, I have a script for importing lots of data stored in several txt files. Within a single file, not all the rows belong in the same table (a data frame, DF, which I am now switching to a data.table, DT), so for each file I select all the rows belonging to the same DF, get that DF, and assign the rows to it.

The first time I create a DF named, say, table1, I do:

name <- "table1" # in my code the value of name will depend on different factors
                 # and **not** known in advance
assign(name, someRows)

Then, during execution, my code may find (in other files) other rows to be put in the table1 data frame, so:

name <- "table1"
assign(name, rbind.fill(get(name), someRows))  # rbind.fill() comes from plyr

My question is: is combining assign() and get() on a name held in a string the best way to do assignment programmatically? Thanks.

Here is a simplified version of my code (each item in dataSource is the result of read.table(), i.e. one single text file):

set.seed(1)
#
dataSource <- list(data.frame(fileType = rep(letters[1:2], each=4),
                              id       = rep(LETTERS[1:4], each=2),
                              var1     = as.integer(rnorm(8))),
                   data.frame(fileType = rep(letters[1:2], each=4),
                              id       = rep(LETTERS[1:4], each=2),
                              var1     = as.integer(rnorm(8))))
#
library(plyr)
#
tablesnames <- unique(unlist(lapply(dataSource,function(x) as.character(unique(x[,1])))))
for(l in tablesnames){
  temp <- lapply(dataSource, function(x) x[x[,1]==l, -1])
  if(exists(l)) assign(l, rbind.fill(get(l), rbind.fill(temp))) else assign(l, rbind.fill(temp))
}
#
# now two data frames, a and b, have been created
#
# different method using rbindlist in place of rbind.fill (faster and, until now,
# I haven't had any missing columns to fill)
#
rm(a,b)
library(data.table)
#
tablesnames <- unique(unlist(lapply(dataSource,function(x) as.character(unique(x[,1])))))
for(l in tablesnames){
  temp <- lapply(dataSource, function(x) x[x[,1]==l, -1])
  if(exists(l)) assign(l, rbindlist(list(get(l), rbindlist(temp)))) else assign(l, rbindlist(temp))
}

Recommended answer

I would recommend using a named list and skipping assign and get. Many of R's most useful features (lapply, for example) work very well on lists but cannot be applied directly to a set of variables managed through assign and get. In addition, you can easily pass a list into a function, whereas passing a group of variables created with assign and get is cumbersome.
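As a minimal sketch of the named-list idea applied to the question's data: the name `tables` and the loop variables are illustrative assumptions, and base `rbind` is used here because the columns match. If columns could differ between files, `rbind.fill` from plyr could be substituted.

```r
# Toy stand-in for the question's dataSource: two "files", each with rows for tables a and b
set.seed(1)
dataSource <- list(
  data.frame(fileType = rep(letters[1:2], each = 4),
             id       = rep(LETTERS[1:4], each = 2),
             var1     = as.integer(rnorm(8))),
  data.frame(fileType = rep(letters[1:2], each = 4),
             id       = rep(LETTERS[1:4], each = 2),
             var1     = as.integer(rnorm(8)))
)

# A named list takes the place of the free-standing variables a and b
tables <- list()
for (src in dataSource) {
  # split() groups the rows by the first column, giving one data frame per table name
  pieces <- split(src[, -1], src$fileType)
  for (name in names(pieces)) {
    # tables[[name]] is NULL the first time; rbind() ignores NULL arguments
    tables[[name]] <- rbind(tables[[name]], pieces[[name]])
  }
}

# Individual tables are reached by name, and the whole set works with lapply()
row_counts <- lapply(tables, nrow)
```

No `exists()` check is needed with this layout, because a missing list element is simply `NULL`, and the table name never has to become a variable name.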

If you want to read a set of files into one big data.frame, I'd use something like this (assuming CSV-like text files):

library(plyr)
list_of_files = list.files(pattern = "\\.csv$")  # pattern is a regular expression, not a glob
big_dataframe = ldply(list_of_files, read.csv)

or, if you want to keep the result in a list:

big_list = lapply(list_of_files, read.csv)

and then, possibly, combine the list elements using rbind.fill:

big_dataframe = do.call("rbind.fill", big_list)
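The read-then-combine steps above can be sketched end to end as follows. This is only an illustration under stated assumptions: the two CSV files and their contents are made up, they are written to a temporary directory so the example is self-contained, and base `rbind` stands in for `rbind.fill` because both files have the same columns.

```r
# Write two small, made-up CSV files into a fresh temporary directory
tmp_dir <- tempfile("csv_demo")
dir.create(tmp_dir)
write.csv(data.frame(id = 1:2, var1 = c(10, 20)),
          file.path(tmp_dir, "file1.csv"), row.names = FALSE)
write.csv(data.frame(id = 3:4, var1 = c(30, 40)),
          file.path(tmp_dir, "file2.csv"), row.names = FALSE)

# List the files; pattern is a regular expression, full.names keeps the directory part
list_of_files <- list.files(tmp_dir, pattern = "\\.csv$", full.names = TRUE)

# Read every file into a list and name each element after its file
big_list <- lapply(list_of_files, read.csv)
names(big_list) <- basename(list_of_files)

# Stack all elements into one data frame
big_dataframe <- do.call(rbind, big_list)
```

Naming the list elements after the files keeps track of where each piece came from, so a single file's data can still be retrieved with `big_list[["file1.csv"]]` after combining.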
