Filtering multiple csv files while importing into data frame


Problem Description


I have a large number of csv files that I want to read into R. All the column headings in the csvs are the same. But I want to import only those rows from each file into the data frame for which a variable is within a given range (above min threshold & below max threshold), e.g.

   v1   v2   v3
1  x    q    2
2  c    w    4
3  v    e    5
4  b    r    7

Filtering for v3 (v3>2 & v3<7) should result in:

   v1   v2   v3
1  c    w    4
2  v    e    5
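
On a single data frame the row filter itself is just a logical subset; a minimal sketch with the example values above:

#Example data from the table above
df <- data.frame(v1 = c("x", "c", "v", "b"),
                 v2 = c("q", "w", "e", "r"),
                 v3 = c(2, 4, 5, 7))
#Keep rows where v3 lies strictly between the thresholds
subset(df, v3 > 2 & v3 < 7)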

So far I import all the data from all csvs into one data frame and then do the filtering:

#Read the data files
fileNames <- list.files(path = workDir)
mergedFiles <- do.call("rbind", sapply(fileNames, read.csv, simplify = FALSE))
#Row names of the merged frame carry the source file names; strip the ".csv..." suffix
fileID <- row.names(mergedFiles)
fileID <- gsub(".csv.*", "", fileID)
#Combining data with file IDs
combFiles <- cbind(fileID, mergedFiles)
#Filtering the data according to criteria (min and max hold the chosen thresholds)
resultFile <- combFiles[combFiles$v3 > min & combFiles$v3 < max, ]

I would rather apply the filter while importing each single csv file into the data frame. I assume a for loop would be the best way of doing it, but I am not sure how. I would appreciate any suggestion.

Edit

After testing the suggestion from mnel, which worked, I ended up with a different solution:

fileNames = list.files(path = workDir)
mzList = list()
for (i in 1:length(fileNames)) {
  tempData = read.csv(fileNames[i])
  #Keep only rows whose first column lies between the thresholds
  mz.idx = which(tempData[, 1] > minMZ & tempData[, 1] < maxMZ)
  mz1 = tempData[mz.idx, ]
  mzList[[i]] = data.frame(mz1, filename = rep(fileNames[i], length(mz.idx)))
}
resultFile = do.call("rbind", mzList)
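
The same filter-on-import idea can also be written without an explicit loop; the following is only a sketch, assuming workDir, minMZ and maxMZ are defined as above:

#Sketch only: read each file, filter on the first column, tag rows with the file name
fileNames <- list.files(path = workDir, full.names = TRUE)
readFiltered <- function(f) {
  tempData <- read.csv(f)
  keep <- tempData[, 1] > minMZ & tempData[, 1] < maxMZ
  data.frame(tempData[keep, ], filename = rep(basename(f), sum(keep)))
}
resultFile <- do.call("rbind", lapply(fileNames, readFiltered))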

Thanks for all the suggestions!

Solution

Here is an approach using data.table, which lets you use fread (which is faster than read.csv) and rbindlist, a superfast implementation of do.call(rbind, list(..)) that is perfect for this situation. It also has a between function for the range filter:

library(data.table)
fileNames <- list.files(path = workDir)
alldata <- rbindlist(lapply(fileNames, function(x, min, max) {
  #fread is a faster drop-in for read.csv
  xx <- fread(x, sep = ',')
  #Tag each row with the source file name, ".csv" suffix stripped
  xx[, fileID := gsub(".csv.*", "", x)]
  #Keep rows with min < v3 < max (exclusive bounds)
  xx[between(v3, lower = min, upper = max, incbounds = FALSE)]
  }, min = 2, max = 3))

If the individual files are large and v3 always takes integer values, it might be worth setting v3 as a key and then using a binary search; it may also be quicker to import everything and then run the filtering.
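
A sketch of that alternative, assuming v3 is integer-valued and that alldata was imported without the between() filter (the 2 and 7 bounds are the example thresholds from the question):

#Sketch only: import everything, then filter via a keyed binary-search join
setkey(alldata, v3)
#For the exclusive bounds v3 > 2 & v3 < 7, the matching integer keys are 3:6;
#nomatch = 0L drops key values that do not occur in the data
resultFile <- alldata[J(3:6), nomatch = 0L]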
