Working with large csv file in R


Problem description

Any help would be appreciated.

I used the following code to break down my large csv file (4gb) and now I am trying to save the 2nd, 3rd... part into a csv. However, I can only access the first chunk of my data.

Is there anything wrong with my code? How do I save the second chunk of my data into csv?

rgfile <- 'filename.csv'
index <- 0
chunkSize <- 100000

con <- file(description = rgfile, open = "r")

dataChunk <- read.table(con, nrows = chunkSize, header = TRUE, fill = TRUE, sep = ",")
actualColumnNames <- names(dataChunk)

repeat {
  index <- index + 1
  print(paste('Processing rows:', index * chunkSize))

  if (nrow(dataChunk) != chunkSize){
    print('Processed all files!')
    break
  }

  dataChunk <- read.table(
    con, nrows = chunkSize, skip = 0, header = FALSE,
    fill = TRUE, sep = ",", col.names = actualColumnNames
  )

  break
}

Recommended answer

library(tidyverse)
library(nycflights13)

# make the problem reproducible
rgfile <- 'flights.csv'
write_csv(flights, rgfile)

# now, get to work

lines <- as.numeric(R.utils::countLines(rgfile))
chunk_size <- 100000

hdr <- read_csv(rgfile, n_max = 2)
fnum <- 1

for (i in seq(1, lines, chunk_size)) {

  suppressMessages(
    read_csv(
      rgfile, col_names = colnames(hdr), skip = (i-1), n_max = chunk_size
    )
  ) -> x

  if (i > 1) colnames(x) <- colnames(hdr)

  write_csv(x, sprintf("file%03d.csv", fnum))

  fnum <- fnum + 1
}
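For reference, what stops the loop in the question after a single pass is the unconditional break at the bottom of the repeat block, and no write.csv call ever saves the chunks that are read. Below is a minimal base-R sketch of the same connection-based approach with a write per chunk; the output file names (part_001.csv, ...) are illustrative and not part of the original post.

rgfile    <- 'filename.csv'
chunkSize <- 100000
index     <- 0

con <- file(description = rgfile, open = "r")

# the first chunk carries the header; remember the column names for later reads
dataChunk <- read.table(con, nrows = chunkSize, header = TRUE, fill = TRUE, sep = ",")
actualColumnNames <- names(dataChunk)

repeat {
  index <- index + 1

  # save the chunk that was just read before fetching the next one
  write.csv(dataChunk, sprintf("part_%03d.csv", index), row.names = FALSE)

  # a short chunk means the end of the file has been reached
  if (nrow(dataChunk) < chunkSize) break

  # later reads continue from the open connection and reuse the header names
  dataChunk <- tryCatch(
    read.table(con, nrows = chunkSize, header = FALSE,
               fill = TRUE, sep = ",", col.names = actualColumnNames),
    error = function(e) data.frame()  # nothing left to read (row count was an exact multiple of chunkSize)
  )
  if (nrow(dataChunk) == 0) break
}

close(con)

This keeps the question's connection-based read.table approach, whereas the recommended answer above re-opens the file on every pass and windows into it with readr's skip and n_max, which avoids managing a connection at the cost of re-scanning the earlier lines on each iteration.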
