ValueError: array is too big - cannot understand how to fix this
Problem description
I run the following code:
traindata = trainData.read_csv('train.tsv', delimiter = '\t')
which calls this function:
import csv
import numpy as np

def read_csv(self, filename, delimiter = ',', quotechar = '"'):
    # open the file (Python 2: csv files are opened in binary mode)
    reader = csv.reader(open(filename, 'rb'), delimiter = delimiter, quotechar = quotechar)
    # read first line and extract its data
    self.column_headings = np.array(next(reader))
    # read subsequent lines
    rows = []
    for row in reader:
        rows.append(row)
    self.data = np.array(rows)
    self.m, self.n = self.data.shape
    # return the object so that traindata = trainData.read_csv(...) is not None
    return self
This should let me call:
m, n = traindata.data.shape
print m, n, traindata.column_headings
Unfortunately, in my call to the read_csv function I get the error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-74-1cc5776f9a9c> in <module>()
13 print "loading data.."
14
---> 15 traindata = trainData.read_csv('test.tsv', delimiter = '\t')
16
C:\pc in read_csv(self, filename, delimiter, quotechar)
17 for row in reader:
18 rows.append(row)
---> 19 self.data = np.array(rows)
20 self.m, self.n = self.data.shape
21
ValueError: array is too big.
How can I fix this behaviour and allow the code to run?
Edit: The data is a .tsv file, extract here.
Answer
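Numpy is creating an array of huge fixed-width strings: every cell gets the same width, set to the length of the longest string anywhere in rows (not just in that column), and shorter strings are padded out to that width. With many rows and even one very long field, rows × columns × that width adds up to a massive memory allocation, which NumPy either refuses outright with "array is too big" or which runs you out of RAM.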
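To see the padding effect concretely, here is a small sketch under Python 3, where str cells become a U dtype at 4 bytes per character (the question's Python 2 code would get an S dtype at 1 byte per character instead). The toy rows and the estimated_fixed_width_bytes helper are hypothetical, and the estimate assumes every row has the same number of fields.

```python
import numpy as np

# Toy rows standing in for parsed TSV lines: one cell is much longer than the rest.
rows = [
    ["id", "short", "x" * 1000],  # a single 1000-character cell
    ["1", "a", "b"],
    ["2", "c", "d"],
]

# NumPy picks ONE fixed-width string dtype for every cell in the array,
# wide enough for the longest string it sees (here 1000 characters).
fixed = np.array(rows)
print(fixed.dtype)     # a U1000 dtype: every cell is padded to 1000 characters
print(fixed.itemsize)  # bytes per cell: 1000 characters * 4 bytes
print(fixed.nbytes)    # 9 cells * 4000 bytes


def estimated_fixed_width_bytes(rows, bytes_per_char=4):
    """Hypothetical helper: estimate np.array(rows).nbytes without building it.

    Assumes rectangular rows (same number of fields in each row).
    """
    n_cells = sum(len(r) for r in rows)
    longest = max(len(cell) for r in rows for cell in r)
    return n_cells * longest * bytes_per_char


print(estimated_fixed_width_bytes(rows))
```

Running the estimate on the real rows before calling np.array is a cheap way to check whether the fixed-width array is plausible at all.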
Do this instead:
self.data = np.array(rows, dtype=object)
With dtype=object, numpy doesn't need to allocate big chunks of new memory for string objects: it keeps its array contents as references to existing Python objects (the strings already exist in your Python list rows), and these pointers take up much less space than the padded strings would.
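A minimal Python 3 sketch of the question's loader with the fix applied might look like the following. The TsvData class name and the utf-8 encoding are assumptions, read_csv returns self so that traindata = trainData.read_csv(...) gets an object back, and it assumes every row has the same number of fields (otherwise np.array would not produce a 2-D array).

```python
import csv

import numpy as np


class TsvData:
    """Hypothetical container mirroring the question's trainData object (Python 3)."""

    def read_csv(self, filename, delimiter=",", quotechar='"'):
        # Python 3's csv module expects a text-mode file opened with newline=""
        with open(filename, newline="", encoding="utf-8") as f:
            reader = csv.reader(f, delimiter=delimiter, quotechar=quotechar)
            # First line holds the column headings
            self.column_headings = np.array(next(reader))
            # Remaining lines are the data rows (lists of str)
            rows = [row for row in reader]
        # dtype=object stores one pointer per cell instead of padding every
        # cell to the length of the longest string in the file.
        # Assumes all rows have the same number of fields, so the result is 2-D.
        self.data = np.array(rows, dtype=object)
        self.m, self.n = self.data.shape
        return self
```

Usage mirrors the question: traindata = TsvData().read_csv('train.tsv', delimiter='\t'). Each cell then costs one pointer (8 bytes on a 64-bit build) regardless of how long its string is.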