Read Large File line by line in R without header


Question

I have a very large data file (several gigabytes). If I try to open it in R, I get an out-of-memory error.

I need to read the file line by line and do some analysis. I found a previous question on this issue where the file was read n lines at a time, jumping to a given chunk ("clump") of lines. I used the answer by "Nick Sabbe" and made some modifications to fit my needs.

Consider the following test.csv, which is a sample of the file:

A    B    C
200 19  0.1
400 18  0.1
300 29  0.1
800 88  0.1
600 80  0.1
150 50  0.1
190 33  0.1
270 42  0.1
900 73  0.1
730 95  0.1

I want to read the content of the file line by line and perform my analysis. So I created the following loop, based on the code posted by "Nick Sabbe". I have two problems: 1) The header is printed every time I print a new line. 2) The index column "X" added by R is also printed, although I am deleting this column.

Here is the code I'm using:

test<-function(){
 prev<-0

for(i in 1:100){
  j<-i-prev
  test1<-read.clump("file.csv",j,i)
  print(test1)
  prev<-i

}
}
####################
# Code by Nick Sabbe
###################
read.clump <- function(file, lines, clump, readFunc=read.csv,
                   skip=(lines*(clump-1))+ifelse((header) & (clump>1) & (!inherits(file, "connection")),1,0),
                   nrows=lines,header=TRUE,...){
if(clump > 1){
colnms<-NULL
if(header)
{
  colnms<-unlist(readFunc(file, nrows=1, header=F))
  #print(colnms)
}
p = readFunc(file, skip = skip,
             nrows = nrows, header=FALSE,...)
if(! is.null(colnms))
{
  colnames(p) = colnms
}
} else {
 p = readFunc(file, skip = skip, nrows = nrows, header=header)
}
p$X<-NULL   # Note: Here I'm setting the index to NULL
return(p)
}

The output I get:

       A       B    C
1      200      19   0.1
  NA   1       1     1
1  2   400     18   0.1
  NA   1       1    1
1  3   300     29   0.1
  NA   1       1    1
1  4   800     88   0.1
  NA   1       1    1
1  5   600     80   0.1

For the rest of the reads, I want to get rid of this line:

 NA   1       1     1

Also, is there any way to make the for loop stop at the end of the file, like EOF in other languages?

Recommended answer

Maybe something like this can help you:

inputFile <- "foo.txt"
con  <- file(inputFile, open = "r")
while (length(oneLine <- readLines(con, n = 1)) > 0) {
  myLine <- unlist((strsplit(oneLine, ",")))
  print(myLine)
} 
close(con)
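As a hedged sketch of how the loop above could address the two problems in the question (header repeated, no stop at end of file), the header can be read once before the loop, and the `while` condition ends the loop when `readLines` returns nothing. The temporary file, the whitespace splitting (the sample in the question is not actually comma-separated), and the `row_sums` "analysis" are assumptions made for illustration only.

```r
# Write a few rows of the question's sample data to a temporary file (illustration only).
tmp <- tempfile(fileext = ".csv")
writeLines(c("A    B    C", "200 19  0.1", "400 18  0.1", "300 29  0.1"), tmp)

con <- file(tmp, open = "r")

# Read the header exactly once, before the loop, so it is never seen again.
col_names <- strsplit(trimws(readLines(con, n = 1)), "[[:space:]]+")[[1]]

row_sums <- numeric(0)  # hypothetical per-line analysis: the sum of each row

# readLines() returns a zero-length vector at end of file, which ends the loop.
while (length(one_line <- readLines(con, n = 1)) > 0) {
  values <- as.numeric(strsplit(trimws(one_line), "[[:space:]]+")[[1]])
  names(values) <- col_names
  row_sums <- c(row_sums, sum(values))
}

close(con)
unlink(tmp)
```

Because the connection stays open between calls, each `readLines` continues where the previous one stopped, so no `skip` arithmetic is needed.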

Or use scan to avoid the splitting, as suggested by @MatthewPlourde:

I use scan: I skip the header, and set quiet = TRUE to suppress the message saying how many items have been read.

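con <- file(inputFile, open = "r")
header <- readLines(con, n = 1)  # consume the header once; skip=1 inside the loop would drop a line on every call
while (length(myLine <- scan(con, what = "numeric", nlines = 1, sep = ",", quiet = TRUE)) > 0) {
  ## here I print, but you would process your line here
  print(as.numeric(myLine))
}
close(con)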
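If one line at a time turns out to be too slow for a multi-gigabyte file, the same open-connection idea might be extended to read several lines per iteration, which is close in spirit to the "clump" approach from the question. This is only a sketch under assumptions: the temporary file, `chunk_size`, and the running totals are hypothetical names chosen for the example, and the data is assumed to be whitespace-separated like the sample.

```r
# Write a few rows of the question's sample data to a temporary file (illustration only).
tmp <- tempfile(fileext = ".csv")
writeLines(c("A B C", "200 19 0.1", "400 18 0.1", "300 29 0.1",
             "800 88 0.1", "600 80 0.1"), tmp)

chunk_size <- 2  # illustrative; a real run would likely use a much larger value

con <- file(tmp, open = "r")

# Read the header once and keep the column names for every chunk.
col_names <- strsplit(readLines(con, n = 1), "[[:space:]]+")[[1]]

total_A <- 0  # hypothetical analysis: running sum of column A
n_rows <- 0   # number of data rows seen so far

# Each iteration pulls up to chunk_size lines; a zero-length result means end of file.
while (length(chunk_lines <- readLines(con, n = chunk_size)) > 0) {
  chunk <- read.table(text = chunk_lines, header = FALSE, col.names = col_names)
  total_A <- total_A + sum(chunk$A)
  n_rows <- n_rows + nrow(chunk)
}

close(con)
unlink(tmp)
```

Parsing each chunk with `read.table(text = ...)` gives an ordinary data frame with the original column names and no extra index column, while memory use stays bounded by the chunk size.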
