R - Reading STDIN line by line


Question

I want to stream a big data table into R line by line and, if the current line meets a specific condition (let's say the first column is > 15), add that line to a data frame in memory. I have written the following code:

count<-1;
Mydata<-NULL;
fin <- FALSE;
while (!fin){
    if (count==1){
        Myrow=read.delim(pipe('cat /dev/stdin'), header=F,sep="\t",nrows=1);
        Mydata<-rbind(Mydata,Myrow);
        count<-count+1;
    }
    else {
        count<-count+1;
        Myrow=read.delim(pipe('cat /dev/stdin'), header=F,sep="\t",nrows=1);
        if (Myrow!=""){
        if (MyCONDITION){
            Mydata<-rbind(Mydata,Myrow);
        }
        }
        else
        {fin<-TRUE}
    }
}
print(Mydata);

But I get the error "data not available". Please note that my data is big, so I don't want to read it all in at once and then apply my condition (which would be easy in this case).

Recommended answer

I think it would be wiser to use an R function like readLines. readLines can read just a specified number of lines, e.g. 1. Open a file connection first and then call readLines repeatedly: each call reads the next n lines from the connection, and once the input is exhausted it returns a zero-length character vector. In R code:

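done <- FALSE   # avoid the name `stop`, which masks base::stop()
f <- file("/tmp/test.txt", "r")
while (!done) {
  next_line <- readLines(f, n = 1)
  if (length(next_line) == 0) {
    # readLines returns character(0) at end of input
    done <- TRUE
    close(f)
  } else {
    ## Insert some if statement logic here
  }
}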
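
To connect this back to the question, here is a minimal sketch of the same loop with the "first column > 15" condition filled in. The names filter_lines and threshold are hypothetical, and the sketch assumes tab-separated input whose first field is numeric. It is demonstrated on an in-memory textConnection; in a script run as `Rscript script.R < data.tsv` one would pass a connection to standard input instead (R's documentation for stdin() notes that file("stdin") refers to the C-level standard input of the process).

```r
# Read a connection one line at a time and keep the lines whose first
# tab-separated field is numerically greater than `threshold`.
# `filter_lines` and `threshold` are hypothetical names for this sketch.
filter_lines <- function(con, threshold = 15) {
  kept <- character(0)
  repeat {
    line <- readLines(con, n = 1)
    if (length(line) == 0) break  # end of input
    fields <- strsplit(line, "\t", fixed = TRUE)[[1]]
    first <- suppressWarnings(as.numeric(fields[1]))
    # Skip blank lines and lines whose first field is not a number
    if (!is.na(first) && first > threshold) {
      kept <- c(kept, line)
    }
  }
  kept
}

# Demonstration on an in-memory connection (assumed sample data).
con <- textConnection(c("10\ta", "20\tb", "16\tc", "3\td"))
kept <- filter_lines(con)
close(con)

# Parse the kept lines into a data frame once, at the end.
result <- read.delim(text = kept, header = FALSE, sep = "\t")
print(result)
```

The kept lines are stored as raw strings and parsed with read.delim only once, so rbind is never called per row.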

Other comments:

  • R has a built-in way of treating stdin as a file: stdin(). I suggest you use this instead of pipe('cat /dev/stdin'). It is probably more robust, and definitely more cross-platform.
  • You initialize Mydata at the beginning and keep growing it with rbind. As the number of rows you rbind grows, this gets really slow: every time the object grows, a new block of memory has to be allocated and the existing rows copied into it, which ends up taking a lot of time. It is better to pre-allocate Mydata, or to use apply-style loops.
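
One way to avoid the repeated rbind, sketched under some assumptions: read the input in chunks, filter each chunk, collect the surviving pieces in a list, and combine them with a single rbind at the end. The names filter_chunks, chunk_size and threshold are hypothetical, and the sketch assumes tab-separated input with a numeric first column and no missing values in it. This is a variation on the advice above, not the answerer's own code.

```r
# Read `con` in chunks of `chunk_size` lines, keep the rows whose first
# column exceeds `threshold`, and combine the pieces with one rbind.
# `filter_chunks`, `chunk_size` and `threshold` are hypothetical names.
filter_chunks <- function(con, chunk_size = 1000, threshold = 15) {
  pieces <- list()
  repeat {
    lines <- readLines(con, n = chunk_size)
    if (length(lines) == 0) break  # end of input
    chunk <- read.delim(text = lines, header = FALSE, sep = "\t")
    keep <- chunk[[1]] > threshold
    pieces[[length(pieces) + 1]] <- chunk[keep, , drop = FALSE]
  }
  out <- do.call(rbind, pieces)  # NULL if the input was empty
  if (!is.null(out)) rownames(out) <- NULL
  out
}

# Demonstration on an in-memory connection (assumed sample data).
con <- textConnection(c("10\ta", "20\tb", "16\tc", "3\td", "42\te"))
filtered <- filter_chunks(con, chunk_size = 2)
close(con)
print(filtered)
```

Only the filtered rows are held in memory, and the number of rbind calls no longer depends on the number of input lines.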
