ValueError: array is too big - cannot understand how to fix this


Question

I run the following code:

traindata = trainData.read_csv('train.tsv', delimiter = '\t')

which calls this function:

def read_csv(self, filename, delimiter = ',', quotechar = '"'):
    # open the file
    reader = csv.reader(open(filename, 'rb'), delimiter = delimiter, quotechar = quotechar)
    # read first line and extract its data 
    self.column_headings = np.array(next(reader))
    # read subsequent lines
    rows = []
    for row in reader:
        rows.append(row)
    self.data = np.array(rows)
    self.m, self.n = self.data.shape

This should then let me call:

m, n = traindata.data.shape
print m, n, traindata.column_headings

Unfortunately, the call to the read_csv function raises this error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-74-1cc5776f9a9c> in <module>()
     13 print "loading data.."
     14 
---> 15 traindata = trainData.read_csv('test.tsv', delimiter = '\t')
     16 
C:\pc in read_csv(self, filename, delimiter, quotechar)
     17         for row in reader:
     18             rows.append(row)
---> 19         self.data = np.array(rows)
     20         self.m, self.n = self.data.shape
     21 

ValueError: array is too big.

How can I fix this behaviour and allow the code to run?

Edit: The data is a .tsv file, extract here.

Recommended answer

NumPy is building an array of fixed-width strings. Every element is allocated enough room for the longest string anywhere in the array, so a single very long field inflates the size of every cell, and you are probably running out of RAM in the middle of this massive allocation.
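To make the cost concrete, here is a minimal sketch, assuming Python 3 and a recent NumPy (the question itself uses Python 2, where the element type would be a byte string rather than Unicode). The toy `rows` list is made up for illustration and stands in for the parsed .tsv rows.

```python
import numpy as np

# Hypothetical stand-in for the parsed rows: three short fields
# and one long one (1000 characters).
rows = [["a", "bb"], ["ccc", "x" * 1000]]

# Without an explicit dtype, NumPy picks one fixed-width string
# dtype for the whole array, sized for the longest element.
fixed = np.array(rows)

print(fixed.dtype)     # a fixed-width Unicode dtype, 1000 characters wide
print(fixed.itemsize)  # bytes reserved for EVERY element
print(fixed.nbytes)    # itemsize * number of elements
```

Even the one-character field `"a"` occupies the full width here, which is why a few long fields in a large file can push the allocation past what NumPy or the machine can handle.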

Try this instead:

self.data = np.array(rows, dtype=object) 

With dtype=object, NumPy doesn't need to allocate big chunks of new memory for the strings. It keeps the array contents as references to existing Python objects (the strings already exist in your Python list rows), and these pointers take up much less space than fixed-width copies of the strings would.
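The difference can be checked with a small sketch, again assuming Python 3 and using a made-up `rows` list in place of the real data:

```python
import numpy as np

# Hypothetical stand-in for the parsed rows.
rows = [["a", "bb"], ["ccc", "x" * 1000]]

# dtype=object stores one pointer per element instead of a
# fixed-width copy of each string.
as_objects = np.array(rows, dtype=object)

print(as_objects.shape)     # still a 2-D array, so .shape works as before
print(as_objects.itemsize)  # size of one pointer, not of the longest string
print(as_objects.nbytes)    # pointer size * number of elements

# The array element is the very same Python object as in the list,
# so no string data was copied.
same_object = as_objects[1, 1] is rows[1][1]
print(same_object)
```

The trade-off is that an object array holds ordinary Python strings, so NumPy's vectorized string and numeric operations do not apply to it directly; numeric columns would still need converting before any arithmetic.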
