How to skip extra lines before the header of a tab-delimited file in R
Question
The software I am using produces log files with a variable number of lines of summary information followed by lots of tab delimited data. I am trying to write a function that will read the data from these log files into a data frame ignoring the summary information. The summary information never contains a tab, so the following function works:
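read.parameters <- function(file.name, ...) {
  # Read every line once just to locate the first line containing a tab
  lines <- scan(file.name, what = "character", sep = "\n")
  first.line <- min(grep("\\t", lines))
  # Read the file a second time, skipping the summary lines
  return(read.delim(file.name, skip = first.line - 1, ...))
}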
However, these logfiles are quite big, and so reading the file twice is very slow. Surely there is a better way?
Edited to add:
Marek suggested using a textConnection object. The way he suggested in the answer fails on a big file, but the following works:
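read.parameters <- function(file.name, ...) {
  conn <- file(file.name, "r")
  on.exit(close(conn))
  repeat {
    line <- readLines(conn, 1)
    # Stop at end of file instead of looping forever when no line has a tab
    if (length(line) == 0) stop("no tab-delimited line found in ", file.name)
    if (length(grep("\\t", line))) {
      # Put the header line back so read.delim sees it as the first line
      pushBack(line, conn)
      break
    }
  }
  df <- read.delim(conn, ...)
  return(df)
}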
Edited again: Thanks Marek for further improvement to the above function.
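To try the connection-based approach on something small, a sketch like the following may help. The sample file contents and the column names id and value are made up for illustration. The function follows the same idea as the one above and is written with a guard so that a file with no tab-delimited line raises an error rather than looping.

```r
# Hypothetical sample log: two summary lines, then tab-delimited data.
log.file <- tempfile(fileext = ".log")
writeLines(c("Summary line one",
             "Summary line two",
             "id\tvalue",
             "1\t0.5",
             "2\t1.5"), log.file)

read.parameters <- function(file.name, ...) {
  conn <- file(file.name, "r")
  on.exit(close(conn))
  repeat {
    line <- readLines(conn, 1)
    # End of file reached without finding a tab: give up with an error
    if (length(line) == 0) stop("no tab-delimited line found in ", file.name)
    if (grepl("\t", line, fixed = TRUE)) {
      pushBack(line, conn)  # return the header line to the connection
      break
    }
  }
  # read.delim continues from the current position of the open connection
  read.delim(conn, ...)
}

params <- read.parameters(log.file)
print(params)
```

Because only the summary lines are read one at a time and the rest goes straight to read.delim, the file is traversed once.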
Recommended answer
You don't need to read twice. Use textConnection on the first result.
read.parameters <- function(file.name, ...) {
  # The question had "tmp.log" here; I suppose it should be file.name
  lines <- scan(file.name, what = "character", sep = "\n")
  first.line <- min(grep("\\t", lines))
  # Parse the lines already in memory instead of reading the file again
  return(read.delim(textConnection(lines), skip = first.line - 1, ...))
}
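A possible variant of the same idea, offered only as a sketch: read.delim passes its extra arguments to read.table, which accepts a text argument, so the lines already in memory can be handed over directly. The function name read.parameters.text and the sample file are hypothetical. Like the textConnection version, this keeps the whole file in memory, so it may not suit very large logs.

```r
# Hypothetical sample log: two summary lines, then tab-delimited data.
log.file <- tempfile(fileext = ".log")
writeLines(c("Summary line one",
             "Summary line two",
             "id\tvalue",
             "1\t0.5",
             "2\t1.5"), log.file)

# Hypothetical name; not part of the original answer.
read.parameters.text <- function(file.name, ...) {
  lines <- readLines(file.name)
  # Index of the first line containing a tab (NA if there is none)
  first.line <- grep("\t", lines, fixed = TRUE)[1]
  if (is.na(first.line)) stop("no tab-delimited line found in ", file.name)
  # Drop the summary lines and parse the rest from memory
  read.delim(text = lines[first.line:length(lines)], ...)
}

params.text <- read.parameters.text(log.file)
print(params.text)
```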