R - Reading STDIN line by line
Question
I want to stream a big data table into R LINE BY LINE and, if the current line meets a specific condition (let's say the first column is > 15), add the line to a data frame in memory. I have written the following code:
count<-1;
Mydata<-NULL;
fin <- FALSE;
while (!fin){
  if (count==1){
    Myrow=read.delim(pipe('cat /dev/stdin'), header=F,sep="\t",nrows=1);
    Mydata<-rbind(Mydata,Myrow);
    count<-count+1;
  }
  else {
    count<-count+1;
    Myrow=read.delim(pipe('cat /dev/stdin'), header=F,sep="\t",nrows=1);
    if (Myrow!=""){
      if (MyCONDITION){
        Mydata<-rbind(Mydata,Myrow);
      }
    }
    else
    {fin<-TRUE}
  }
}
print(Mydata);
But I get the error "data not available". Please note that my data is big, and I don't want to read it all in at once and then apply my condition (which would be easy in this case).
Recommended answer
I think it would be wiser to use an R function like readLines. readLines can read just a specified number of lines, e.g. 1. Combine that with opening a file connection first and then calling readLines repeatedly, and you get what you want: because the connection stays open, each call reads the next n lines from where the previous call stopped. (If the connection were not already open, readLines would open it, read from the beginning, and close it again on every call.) In R code:
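done <- FALSE  # not named 'stop', which is also the name of a base R function
f <- file("/tmp/test.txt", "r")
while (!done) {
  next_line <- readLines(f, n = 1)
  if (length(next_line) == 0) {
    # readLines() returns character(0) once the end of the file is reached
    done <- TRUE
    close(f)
  } else {
    ## Insert some if statement logic here, applied to next_line
  }
}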
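Applied to the original question, the pieces above might fit together as follows. This is a minimal sketch, assuming tab-separated input without a header and a numeric first column; filter_lines and its threshold argument are hypothetical names, not part of any package. It accepts any already-open connection, so it can be tried out with textConnection() and fed from standard input in a script. Instead of calling rbind once per line, it keeps the matching rows in a list and binds them once at the end.

```r
# Minimal sketch: assumes tab-separated lines, no header, numeric first column.
# filter_lines() and threshold are hypothetical names used for illustration.
filter_lines <- function(con, threshold = 15) {
  kept <- list()  # fields of each matching line, one character vector per row
  repeat {
    line <- readLines(con, n = 1)  # con must already be open to resume reading
    if (length(line) == 0) break   # character(0) signals end of input
    fields <- strsplit(line, "\t", fixed = TRUE)[[1]]
    first <- suppressWarnings(as.numeric(fields[1]))  # NA if not numeric
    if (!is.na(first) && first > threshold) {
      kept[[length(kept) + 1]] <- fields
    }
  }
  if (length(kept) == 0) return(data.frame())
  # Bind all kept rows once (assumes every line has the same number of fields);
  # the resulting columns are named V1, V2, ...
  out <- as.data.frame(do.call(rbind, kept), stringsAsFactors = FALSE)
  # Convert each character column to integer/numeric where possible
  out[] <- lapply(out, type.convert, as.is = TRUE)
  out
}
```

For piped input, the R documentation for stdin() notes that it refers to the console rather than the process's C-level standard input and points to file("stdin") for the latter, so under Rscript a call could look like Mydata <- filter_lines(file("stdin", open = "r")), run with something like cat big.tsv | Rscript filter.R (both file names are placeholders). Reading a larger n per readLines call and looping over the returned lines would cut per-call overhead on very large inputs.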
Additional comments:
- R has an internal way of treating stdin as a file: stdin(). I suggest you use this instead of pipe('cat /dev/stdin'). This probably makes it more robust, and definitely more cross-platform.
- You initialize Mydata at the beginning and keep growing it using rbind. As the number of rows you rbind grows, this gets really slow: every rbind call allocates a new, larger object and copies all existing rows into it, so the total work grows roughly quadratically with the number of rows. Better is to pre-allocate Mydata, or use apply style loops.
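To illustrate the second comment, here is a small sketch that builds the same data frame in three ways; n, grown, prealloc and applied are illustrative names only. Growing with rbind copies every accumulated row on each iteration, pre-allocation fills fixed-length vectors by index and creates the data frame once, and the apply-style version builds each row separately and binds once. When the final row count is unknown, as with streamed input, collecting pieces in a list and binding once is the practical substitute for exact pre-allocation.

```r
n <- 500

# 1. Growing with rbind: each call copies every row accumulated so far
grown <- NULL
for (i in seq_len(n)) {
  grown <- rbind(grown, data.frame(id = i, sq = i^2))
}

# 2. Pre-allocating: create full-length column vectors once, fill by index,
#    then build the data frame in a single step
ids <- integer(n)
squares <- numeric(n)
for (i in seq_len(n)) {
  ids[i] <- i
  squares[i] <- i^2
}
prealloc <- data.frame(id = ids, sq = squares)

# 3. Apply style: build each one-row data frame independently, bind once
applied <- do.call(rbind, lapply(seq_len(n), function(i) data.frame(id = i, sq = i^2)))
```

Wrapping each part in system.time() is a simple way to compare the approaches on a given machine.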