Reading in multiple CSVs with different numbers of lines to skip at start of file

Problem description

I have to read in about 300 individual CSVs. I have managed to automate the process using a loop and structured CSV names. However, each CSV has 14-17 lines of rubbish at the start, and the number varies randomly, so hard-coding a skip parameter in the read.table call won't work. The column names and number of columns are the same for each CSV.

Here is an example of one of my files:

QUICK STATISTICS:

      Directory: Data,,,,
           File: Final_Comp_Zn_1
      Selection: SEL{Ox*1000+Doma=1201}
         Weight: None,,,
     ,,Variable: AG,,,

Total Number of Samples: 450212  Number of Selected Samples: 277


Statistics

VARIABLE,Min slice Y(m),Max slice Y(m),Count,Minimum,Maximum,Mean,Std.Dev.,Variance,Total Samples in Domain,Active Samples in Domain AG,  
6780.00,   6840.00,         7,    3.0000,   52.5000,   23.4143,   16.8507,  283.9469,        10,        10 AG,   
6840.00,   6900.00,         4,    4.0000,    5.5000,    4.9500,    0.5766,    0.3325,        13,        13 AG,   
6900.00,   6960.00,        16,    1.0000,   37.0000,    8.7625,    9.0047,   81.0848,        29,        29 AG,   
6960.00,   7020.00,        58,    3.0000,   73.5000,   10.6931,   11.9087,  141.8172,       132,       132 AG,   
7020.00,   7080.00,        23,    3.0000,  104.5000,   15.3435,   23.2233,  539.3207,        23,        23 AG,   
7080.00,   7140.00,        33,    1.0000,   15.4000,    3.8152,    2.8441,    8.0892,        35,        35 AG,

Basically, I want to start reading at the line VARIABLE,Min slice Y(m),Max slice Y(m),... and ignore everything above it. I can think of a few solutions, but I don't know how I would go about programming them. Is there any way I can:


  1. Read the CSV first, somehow work out how many lines of rubbish there are, and then re-read it with the correct number of lines to skip? Or
  2. Tell read.table to start reading when it finds the column names (since these are the same for each CSV) and ignore everything prior to that?
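Option 1 needs only base R: read the raw lines with readLines(), find the header row with grep(), then re-read the file with read.csv() and skip set to the number of lines above that row. A minimal sketch, assuming the header row always starts with "VARIABLE," (the helper read_from_header and the demo files are hypothetical names chosen for illustration):

```r
# Hypothetical helper: locate the header row, then re-read from there (option 1)
read_from_header <- function(path, header_pattern = "^VARIABLE,") {
  # warn = FALSE silences the "incomplete final line" warning
  lines <- readLines(path, warn = FALSE)
  # Index of the first line matching the header pattern (NA if none matches)
  header_line <- grep(header_pattern, lines)[1]
  if (is.na(header_line)) stop("header row not found in ", path)
  # Skip every line above the header, blank lines included
  read.csv(path, skip = header_line - 1, stringsAsFactors = FALSE)
}

# Hypothetical demo files with different amounts of junk before the header
demo_dir <- file.path(tempdir(), "skip_demo_base")
dir.create(demo_dir, showWarnings = FALSE)
writeLines(c("junk", "", "junk", "VARIABLE,X1,X2", "A,1,2"),
           file.path(demo_dir, "myfile1.csv"))
writeLines(c("junk", "VARIABLE,X1,X2", "B,3,4"),
           file.path(demo_dir, "myfile2.csv"))

files <- list.files(demo_dir, pattern = "^myfile.*\\.csv$", full.names = TRUE)
# Read every file and stack the results into one data frame
all_data <- do.call(rbind, lapply(files, read_from_header))
print(all_data)
```

Because the pattern is anchored with ^ and grep() is case-sensitive by default, a junk line such as ",,Variable: AG,,," in the sample file does not match.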

I think solution (2) would be the most appropriate, but I am open to any suggestions!

Recommended answer

The function fread from the data.table package automatically detects the number of rows to skip. At the time of writing, the function was still in development.

Here is some example code:

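library(data.table)

# Three demo files, each with a different number of junk lines before the header row
cat("blah\nblah\nblah\nVARIABLE,X1,X2\nA,1,2\n", file="myfile1.csv")
cat("blah\nVARIABLE,A1,A2\nA,1,2\n", file="myfile2.csv")
cat("blah\nblah\nVARIABLE,Z1,Z2\nA,1,2\n", file="myfile3.csv")

# fread works out how many lines to skip in each file; the dot in ".csv" is escaped
lapply(list.files(pattern = "^myfile.*\\.csv$"), fread)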
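Later versions of data.table also let you name the starting point directly, which matches option 2 from the question: fread()'s skip argument accepts a string, and reading starts at the first line that contains it. This can be more predictable than automatic detection when the junk lines themselves contain commas, as in the sample file above. A minimal sketch using hypothetical demo files in a temporary directory:

```r
library(data.table)

# Hypothetical demo files with a varying number of junk lines before the header
demo_dir <- file.path(tempdir(), "skip_demo_fread")
dir.create(demo_dir, showWarnings = FALSE)
cat("junk\njunk\njunk\nVARIABLE,X1,X2\nA,1,2\n", file = file.path(demo_dir, "myfile1.csv"))
cat("junk\nVARIABLE,X1,X2\nB,3,4\n", file = file.path(demo_dir, "myfile2.csv"))

files <- list.files(demo_dir, pattern = "^myfile.*\\.csv$", full.names = TRUE)

# skip = "VARIABLE": start reading at the first line containing this string
tables <- lapply(files, fread, skip = "VARIABLE")
names(tables) <- basename(files)

# Stack all tables; idcol adds a column holding each row's source file name
all_data <- rbindlist(tables, idcol = "file")
print(all_data)
```

With the real files, a longer piece of the header row, such as "VARIABLE,Min slice Y(m)", makes it less likely that the search string also occurs somewhere in the junk lines.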
