Using rbind() to combine multiple data frames into one larger data.frame within lapply()

Problem description

I'm using R-Studio 0.99.491 and R version 3.2.3 (2015-12-10). I'm a relative newbie to R, and I'd appreciate some help. I'm doing a project where I'm trying to use the server logs on an old media server to identify which folders/files within the server are still being accessed and which aren't, so that my team knows which files to migrate. Each log is for a 24 hour period, and I have approximately a year's worth of logs, so in theory, I should be able to see all of the access over the past year.

My ideal output is to get a tree structure or plot that will show me the folders on our server that are being used. I've figured out how to read one log (one day) into R as a data.frame and then use the data.tree package in R to turn that into a tree. Now, I want to recursively go through all of the files in the directory, one by one, and add them to that original data.frame, before I create the tree. Here's my current code:

#Create the list of log files in the folder
files <- list.files(pattern = "*.log", full.names = TRUE, recursive = FALSE)
#Create a new data.frame to hold the aggregated log data
uridata <- data.frame()
#My function to go through each file, one by one, and add it to the 'uridata' df, above
lapply(files, function(x){
    uriraw <- read.table(x, skip = 3, header = TRUE, stringsAsFactors = FALSE)
    #print(nrow(uriraw))
    uridata <- rbind(uridata, uriraw)
    #print(nrow(uridata))
})

The problem is that, no matter what I try, the value of 'uridata' within the lapply loop seems to not be saved/passed outside of the lapply loop, but is somehow being overwritten each time the loop runs. So instead of getting one big data.frame, I just get the contents of the last 'uriraw' file. (That's why there are those two commented print commands inside the loop; I was testing how many lines there were in the data frames each time the loop ran.)

Can anyone clarify what I'm doing wrong? Again, I'd like one big data.frame at the end that combines the contents of each of the (currently seven) log files in the folder.

Recommended answer

do.call() is your friend.

big.list.of.data.frames <- lapply(files, function(x){
    read.table(x, skip = 3, header = TRUE, stringsAsFactors = FALSE)
})

or more concisely (but less-tinkerable):

big.list.of.data.frames <- lapply(files, read.table,
                                  skip = 3, header = TRUE,
                                  stringsAsFactors = FALSE)

Then:

big.data.frame <- do.call(rbind, big.list.of.data.frames)
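
To try the pattern end to end without real server logs, here is a minimal sketch. The temporary folder, the file names (day1.log and so on) and the two-column layout (uri, hits) are made up for illustration; only the three skipped lines and the read.table() arguments mirror the question.

```r
# Hypothetical demo data: three small log-like files in a temporary folder.
log_dir <- tempfile("logs")
dir.create(log_dir)
for (i in 1:3) {
    lines <- c("junk line 1", "junk line 2", "junk line 3",  # skipped by skip = 3
               "uri hits",                                   # header row
               paste0("/media/file", i, "a ", i),
               paste0("/media/file", i, "b ", i * 10))
    writeLines(lines, file.path(log_dir, paste0("day", i, ".log")))
}

# list.files() takes a regular expression, so "\\.log$" matches names ending in .log
files <- list.files(log_dir, pattern = "\\.log$", full.names = TRUE)

# Read every file into its own data frame; lapply() returns them as a list
frames <- lapply(files, read.table, skip = 3, header = TRUE,
                 stringsAsFactors = FALSE)

# Bind all data frames in the list together in a single call
combined <- do.call(rbind, frames)
nrow(combined)  # 6 rows: two from each of the three files
```

Because do.call() passes every element of the list to rbind() as a separate argument, this is equivalent to writing rbind(frames[[1]], frames[[2]], frames[[3]]) by hand, but it works for any number of files.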

This is the recommended way to do things because "growing" a data frame dynamically in R is painful: it is slow and memory-expensive, because a new data frame gets built at each iteration.
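
As for why the loop in the question never fills 'uridata': an assignment with <- inside a function creates or changes a variable local to that function call, so the global data frame is never touched. The following is a small sketch of that behaviour, using a made-up variable 'total' and a toy one-column data frame rather than the real logs.

```r
total <- data.frame()  # global, empty

res <- lapply(1:3, function(i) {
    # 'total' on the right-hand side is found in the global environment (empty),
    # but the assignment creates a new LOCAL 'total' that is discarded afterwards.
    total <- rbind(total, data.frame(id = i))
    nrow(total)
})

nrow(total)  # still 0: the global data frame was never modified
unlist(res)  # 1 1 1: every call started again from the empty global 'total'
```

Returning each data frame from the function and combining the resulting list afterwards avoids the scoping issue altogether, which is what the do.call(rbind, ...) approach does.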
