Read a list of files, apply function and rewrite with same name
Problem description
I have a set of csv files with duplicate entries, which I need to remove, and then rewrite the files with the same name and format.
Here is what I have done so far:
filenames <- list.files(pattern = ".csv")
datalist <- lapply(filenames, function(x){read.csv(file = x, header = F)})
unique.list <- lapply(datalist, unique)
And I'm stuck on separating the data frames in the list and rewriting them under the same names. There is a similar question, which I spent hours on, but I couldn't follow the procedure.
Solution
I'd definitely use a for loop. Shhhhhh, don't tell anyone I said that. Why? Three reasons...
- You want to call write.csv for its side effect, not its return value, i.e. you want a file to be written to disk. Use *apply when you want a return value from your function.
- The main bottleneck will be disk I/O, so I expect no performance overhead from using a for loop compared to using an *apply loop.
- *apply functions will swallow memory on each iteration of the loop and are not guaranteed to free it up until all iterations have completed. In a for loop the memory is freed up at the start of the next iteration if you are overwriting objects inside the loop. If you are working with big csv files this could be an advantage. I will try and find a link to an answer where a for loop solved a problem that lapply could not due to memory issues.
So all you need for my solution, given your de-duplicated data list, is...
for( i in 1:length( filenames ) ){
  write.csv( unique.list[[i]] , filenames[[i]] )
}
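One caveat, offered as a sketch and not as part of the original answer: the files were read with header = F, and write.csv by default adds a header row and a row-names column, so the rewritten files would not have quite the same format as the originals. Assuming the files are plain comma-separated values with no header and no fields that need quoting, write.table with row.names = FALSE and col.names = FALSE should keep the layout. The demo directory, file names and contents below are made up for illustration.

```r
# Demo in a temporary directory; the file names and contents are made up.
demo_dir <- file.path(tempdir(), "dedup_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines(c("1,a", "2,b", "1,a"), file.path(demo_dir, "one.csv"))
writeLines(c("3,c", "3,c", "4,d"), file.path(demo_dir, "two.csv"))

# pattern is a regular expression; "\\.csv$" matches only names ending in .csv.
filenames <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)
datalist <- lapply(filenames, function(x) read.csv(file = x, header = FALSE))
unique.list <- lapply(datalist, unique)

# seq_along() also behaves when no files are found (1:length(x) would give 1, 0).
for (i in seq_along(filenames)) {
  # Assumption: no header, no row names, and no fields that need quoting.
  write.table(unique.list[[i]], filenames[[i]], sep = ",",
              row.names = FALSE, col.names = FALSE, quote = FALSE)
}
```

If the real files do have a header row, reading with header = TRUE and writing with write.csv(..., row.names = FALSE) would be the closer match.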
Here is an answer where a for loop was required because the lapply equivalent ran into memory allocation errors.
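If memory is the concern, the same idea can be pushed one step further by reading, de-duplicating and writing each file inside the loop, so that only one data frame is held at a time. This is a minimal sketch, not the original answer's code: dedup_files is a hypothetical helper name, and it assumes headerless comma-separated files with no fields that need quoting.

```r
# Hypothetical helper: de-duplicate each csv file in place, one file at a time.
dedup_files <- function(filenames) {
  for (f in filenames) {
    # dat is overwritten on every iteration, so the previous file's data
    # becomes eligible for garbage collection.
    dat <- read.csv(file = f, header = FALSE)
    dat <- unique(dat)
    # Assumption: no header, no row names, and no fields that need quoting.
    write.table(dat, f, sep = ",",
                row.names = FALSE, col.names = FALSE, quote = FALSE)
  }
  invisible(filenames)
}
```

It could then be called as dedup_files(list.files(pattern = "\\.csv$")) from the directory that holds the files.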