Dealing with large CSV files (20G) in Ruby


Problem description

I am working on a little problem and would like some advice on how to solve it: given a CSV file with an unknown number of columns and rows, output a list of columns with their values and the number of times each value is repeated, without using any library.

If the file is small this shouldn't be a problem, but when it is a few gigs, I get NoMemoryError: failed to allocate memory. Is there a way to create a hash and read from the disk instead of loading the whole file into memory? You can do that in Perl with tied hashes.

Edit: will IO#foreach load the file into memory? How about File.open(filename).each?
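To answer the edit directly: IO#foreach (and File#each_line) reads the file lazily, one line at a time, so memory use stays flat regardless of file size. A minimal sketch, using a temporary file as a stand-in for the real CSV:

```ruby
require "tempfile"

# Create a tiny stand-in CSV file (the real file would be on disk already).
tmp = Tempfile.new("demo.csv")
tmp.write("a,b\n1,2\n3,4\n")
tmp.close

# IO.foreach yields each line in turn without slurping the whole file.
lines = []
IO.foreach(tmp.path) { |line| lines << line.chomp }
# lines => ["a,b", "1,2", "3,4"]

tmp.unlink
```

File.open(filename).each behaves the same way: both iterate over lines as they are read from disk.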

Recommended answer

Read the file one line at a time, discarding each line as you go:

open("big.csv") do |csv|
  csv.each_line do |line|
    values = line.split(",")
    # process the values
  end
end

Using this method, you should never run out of memory.

