Python (numpy) crashes system with large number of array elements

Question

I'm trying to build a basic character recognition model using several of the classifiers that scikit provides. The dataset is a standard set of handwritten alphanumeric samples (the Chars74K image dataset, taken from this source: EnglishHnd.tgz).

There are 55 samples of each character (62 alphanumeric characters in all), each 900x1200 pixels. I convert each image to grayscale and then flatten the matrix into a 1x1080000 array (each element representing a feature).

for sample in sample_images:  # sample_images is the list of .png files
    img = imread(sample)
    img_gray = rgb2gray(img)
    if n == 0 and m == 0:  # n and m are global variables
        n, m = np.shape(img_gray)
    img_gray = np.reshape(img_gray, n * m)
    img_gray = np.append(img_gray, sample_id)  # sample_id stores the label of the training sample
    if len(samples) == 0:  # samples is the final numpy ndarray
        samples = np.append(samples, img_gray)
        samples = np.reshape(samples, [1, n * m + 1])
    else:
        samples = np.append(samples, [img_gray], axis=0)

So the final data structure should hold 55x62 = 3,410 arrays, each with 1,080,000 elements. Only the final structure is kept (the intermediate matrices are local in scope).
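To put a number on that, here is a rough back-of-the-envelope sketch of the size of the final structure for a few element types. The helper name `footprint_gib` is made up for illustration, and the extra label element appended to each row is ignored because it is negligible at this scale.

```python
import numpy as np

SAMPLES_PER_CHAR = 55
NUM_CHARS = 62
PIXELS_PER_IMAGE = 900 * 1200  # 1,080,000 features per sample


def footprint_gib(dtype):
    """Approximate size in GiB of the full sample matrix for one element type."""
    n_elements = SAMPLES_PER_CHAR * NUM_CHARS * PIXELS_PER_IMAGE
    return n_elements * np.dtype(dtype).itemsize / 2**30


# Compare one byte per pixel against 32-bit and 64-bit floats.
for dtype in (np.uint8, np.float32, np.float64):
    print(f"{np.dtype(dtype).name}: {footprint_gib(dtype):.1f} GiB")
```

At one byte per pixel the matrix is already about 3.4 GiB, and at eight bytes per pixel (64-bit floats) it is about 27 GiB.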

The amount of data being stored to learn the model is pretty large (I guess), because the program stops making progress past a certain point, and it crashed my system to the extent that the BIOS had to be repaired!

Up to this point, the program is only gathering the data to send to the classifier ... the classification hasn't even been introduced into the code yet.

Any suggestions as to what can be done to handle the data more efficiently?

Note: I'm using numpy to store the final structure of flattened matrices. Also, the system has 8 GB of RAM.

Recommended answer

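This looks like a case of running out of memory. If I understand your question, you have 3,682,800,000 array elements. What is the element type? At one byte per element, that is about 3.7 gigabytes of data. If the grayscale values are stored as 64-bit floats (NumPy's default floating-point type), it is about 29 gigabytes, far more than the 8 gigabytes on your machine. Even at one bit per element, you would still be at about 460 MB. Note that NumPy already allocates array data on the heap rather than on the stack (which is usually only about 1 megabyte), so this is not a stack overflow in the literal sense: the data simply does not fit in RAM. Use a smaller element type or smaller images so that the whole structure fits well within 8 gigabytes.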
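One way to act on this, sketched below under stated assumptions, is to allocate the final array once with a one-byte element type and fill it row by row. This avoids growing it with `np.append`, which returns a new copy of the whole array on every call. The function name `build_sample_matrix`, its parameters, and the tiny random "images" are all hypothetical stand-ins. The sketch assumes grayscale pixel values in the range 0 to 1 and keeps labels in a separate array rather than in an extra column.

```python
import numpy as np


def build_sample_matrix(image_iter, n_samples, n_pixels):
    """Fill a preallocated uint8 matrix with one flattened image per row.

    Assumes each image is a 2-D array of floats in [0, 1] (hypothetical input).
    """
    X = np.empty((n_samples, n_pixels), dtype=np.uint8)  # allocated once
    for i, img in enumerate(image_iter):
        flat = np.asarray(img, dtype=np.float64).reshape(-1)
        # Rescale [0, 1] floats to 0..255 and store them as single bytes.
        X[i] = np.round(flat * 255).astype(np.uint8)
    return X


# Tiny stand-in images so the sketch runs without the real dataset.
rng = np.random.default_rng(0)
fake_images = (rng.random((9, 12)) for _ in range(4))
X = build_sample_matrix(fake_images, n_samples=4, n_pixels=9 * 12)
print(X.shape, X.dtype, X.nbytes)  # (4, 108) uint8 432
```

Because the images are consumed from an iterator, only one full-size image needs to be in memory at a time besides the final matrix. At full resolution, 3,410 rows of 1,080,000 bytes still come to about 3.4 GiB, so downscaling the images before flattening may also be worth considering.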
