How to skip extra lines before the header of a tab-delimited file in R

Posted: 2017/2/24 20:23:44 · Tags: r, csv, delimiter


Question


The software I am using produces log files with a variable number of lines of summary information followed by a large block of tab-delimited data. I am trying to write a function that reads the data from these log files into a data frame, ignoring the summary information. The summary information never contains a tab, so the following function works:

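read.parameters <- function(file.name, ...){
  # First pass: read every line to find where the data starts
  lines <- scan(file.name, what="character", sep="\n")
  # The header is the first line that contains a tab
  first.line <- min(grep("\\t", lines))
  # Second pass: read the file again, skipping the summary lines
  return(read.delim(file.name, skip=first.line-1, ...))
}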
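A minimal, self-contained run of the function above, using a small hypothetical log file written to a temporary path (the contents are invented for illustration). One caveat worth knowing: scan() drops blank lines by default (blank.lines.skip = TRUE), while read.delim(skip = ...) counts physical lines, so if the summary block can contain blank lines, the skip count would be too small by the number of blank summary lines. The sample below therefore has none.

```r
read.parameters <- function(file.name, ...){
  # First pass: read every line to find where the data starts
  lines <- scan(file.name, what = "character", sep = "\n")
  # The header is the first line that contains a tab
  first.line <- min(grep("\\t", lines))
  # Second pass: read the file again, skipping the summary lines
  return(read.delim(file.name, skip = first.line - 1, ...))
}

# Hypothetical log: three summary lines (no tabs), then tab-delimited data
log.file <- tempfile(fileext = ".log")
writeLines(c("Example software v1.0",
             "Run started: 2017-02-24",
             "Summary: 3 records",
             "time\tvalue",
             "1\t0.5",
             "2\t0.7",
             "3\t0.9"), log.file)

df <- read.parameters(log.file)
print(df)
```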


However, these log files are quite big, so reading each file twice is very slow. Surely there is a better way?

Edited to add:


Marek suggested using a textConnection object. The way he suggested in the answer fails on a big file, but the following works:

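read.parameters <- function(file.name, ...){
  conn <- file(file.name, "r")
  on.exit(close(conn))
  # Read one line at a time until the first line containing a tab (the header)
  repeat {
    line <- readLines(conn, n = 1)
    # End of file reached without finding any tab-delimited data
    if (length(line) == 0)
      stop("no tab-delimited header found in ", file.name)
    if (grepl("\t", line, fixed = TRUE)) {
      # Push the header back so read.delim() reads it as the first line
      pushBack(line, conn)
      break
    }
  }
  # read.delim() continues from the current position of the open connection
  read.delim(conn, ...)
}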


Edited again: Thanks to Marek for further improving the function above.

Recommended answer


You don't need to read the file twice. Use textConnection on the result of the first read.

read.parameters <- function(file.name, ...){
  # Read the file from disk only once
  lines <- scan(file.name, what="character", sep="\n")
  first.line <- min(grep("\\t", lines))
  # Parse the lines already in memory instead of reading the file again
  con <- textConnection(lines)
  on.exit(close(con))
  return(read.delim(con, skip=first.line-1, ...))
}
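A small self-contained check of the in-memory approach, using a hypothetical log whose summary block contains a blank line. Since both the header index and the text connection are built from the same lines vector, the skip count refers to the same sequence of lines that read.delim() actually sees, so a blank summary line does not shift the header.

```r
read.parameters <- function(file.name, ...){
  # Read the file from disk once
  lines <- scan(file.name, what = "character", sep = "\n", quiet = TRUE)
  # Index of the header within the in-memory vector
  first.line <- min(grep("\\t", lines))
  # Parse the same vector, so skip counts the same lines as the index above
  con <- textConnection(lines)
  on.exit(close(con))
  read.delim(con, skip = first.line - 1, ...)
}

# Hypothetical log with a blank line inside the summary block
log.file <- tempfile(fileext = ".log")
writeLines(c("Example software v1.0",
             "",
             "Run started: 2017-02-24",
             "time\tvalue",
             "1\t0.5",
             "2\t0.7"), log.file)
df <- read.parameters(log.file)
print(df)
```

The trade-off is memory: the whole file is held as a character vector, which may explain why the question's author found this approach unsuitable for very large files.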
