Read a list of files, apply a function, and rewrite them with the same name


Question



I have a set of CSV files containing duplicate rows. I need to remove the duplicates and write each file back out under the same name and in the same format.

Here is what I have done so far:

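# pattern is a regular expression: match only names ending in ".csv"
filenames <- list.files(pattern = "\\.csv$")
# Read every file; the files have no header row
datalist <- lapply(filenames, function(x) read.csv(file = x, header = FALSE))
# Drop duplicated rows from each data frame
unique.list <- lapply(datalist, unique)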

I'm stuck on separating the data frames in the list and writing each one back to a file with its original name. There is a similar question, and I spent hours on it, but I couldn't follow the procedure.
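Because `lapply()` returns its results in the same order as its input, `unique.list[[i]]` is the de-duplicated version of `filenames[i]`. The data frames do not need to be "separated" out of the list, only indexed. Below is a small sketch with two hypothetical in-memory data frames standing in for the files. The file names `a.csv` and `b.csv` are made up for illustration. It also names the list so that each element can be fetched by file name.

```r
# Hypothetical file names and data frames standing in for read.csv() results
filenames <- c("a.csv", "b.csv")
datalist <- list(
  data.frame(V1 = c(1, 1, 2), V2 = c("x", "x", "y")),
  data.frame(V1 = c(3, 4, 4), V2 = c("z", "w", "w"))
)

# unique() on a data frame keeps the first occurrence of each distinct row
unique.list <- lapply(datalist, unique)

# lapply preserves order, so element i belongs to filenames[i];
# naming the list makes that pairing explicit
names(unique.list) <- filenames

# Each data frame can now be extracted by its file name
a <- unique.list[["a.csv"]]
```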

Solution

I'd definitely use a for loop. Shhhhhh, don't tell anyone I said that. Why? Three reasons...

  1. You want to call write.csv for its side effect, not its return value, i.e. you want a file to be written to disk. Use *apply when you want a return value from your function.
  2. The main bottleneck will be disk I/O, so I would not expect a for loop to be measurably slower than an *apply loop.
  3. *apply functions will swallow memory on each iteration of the loop and are not guaranteed to free it up until all iterations have completed. In a for loop, if you are overwriting objects inside the loop, the memory used by the previous iteration can be freed at the start of the next one. If you are working with big csv files this could be an advantage. I will try to find a link to an answer where a for loop solved a problem that lapply could not, due to memory issues.
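To illustrate the first point: `write.csv()` is called only for its side effect and returns `NULL` invisibly. An `lapply()` version therefore also builds a list of `NULL`s that nothing uses. The sketch below writes two small hypothetical data frames to R's temporary directory, once in each style. The file names `one.csv` and `two.csv` are assumptions for the demo.

```r
# Two small hypothetical data frames and target paths in a temp directory
dfs <- list(data.frame(V1 = 1:2), data.frame(V1 = 3:4))
paths <- file.path(tempdir(), c("one.csv", "two.csv"))

# *apply style: the files get written, but lapply also collects
# write.csv's (NULL) return values into a list we never use
res <- lapply(seq_along(dfs), function(i) {
  write.csv(dfs[[i]], paths[[i]], row.names = FALSE)
})

# for-loop style: only the side effect (writing the files) happens
for (i in seq_along(dfs)) {
  write.csv(dfs[[i]], paths[[i]], row.names = FALSE)
}
```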

So, given your de-duplicated data list, all you need for my solution is...

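for (i in seq_along(filenames)) {
  # Overwrite each file with its de-duplicated data frame.
  # row.names = FALSE avoids adding an extra index column; note that
  # write.csv still writes a header row ("V1", "V2", ...).
  write.csv(unique.list[[i]], filenames[[i]], row.names = FALSE)
}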
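Putting the pieces together, here is a self-contained sketch. It creates two hypothetical headerless CSV files in a temporary directory, removes duplicate rows, and overwrites each file in place. It reads one file per iteration, in the spirit of point 3, rather than loading everything first. It also swaps `write.csv()` for `write.table(sep = ",", col.names = FALSE)`. This rests on an assumption about what "same format" means: the files were read with `header = FALSE`, so this avoids adding the `V1`, `V2`, ... header row. The directory and file names are made up for the demo.

```r
# Work in a scratch directory so no real files are touched
dir <- file.path(tempdir(), "dedup-demo")
dir.create(dir, showWarnings = FALSE)

# Hypothetical headerless input files containing duplicate rows
writeLines(c("1,a", "1,a", "2,b"), file.path(dir, "first.csv"))
writeLines(c("3,c", "4,d", "4,d", "3,c"), file.path(dir, "second.csv"))

# full.names = TRUE gives usable paths without changing the working directory
filenames <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)

for (i in seq_along(filenames)) {
  # Read one file; dat is overwritten on the next iteration
  dat <- read.csv(filenames[[i]], header = FALSE)
  # Drop duplicated rows
  dat <- unique(dat)
  # Write back under the same name with no row names and no header row;
  # character fields are quoted by default, which keeps embedded commas safe
  write.table(dat, filenames[[i]], sep = ",",
              row.names = FALSE, col.names = FALSE)
}
```

If the original files contain no quotes or embedded commas, adding `quote = FALSE` would reproduce them byte for byte. That choice trades robustness for an exact match.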

Here is an answer where a for loop was required because the lapply equivalent ran into memory allocation errors.
