Memory Error while calling genfromtxt method

Question

code:

import scipy as sp
import matplotlib.pyplot as plt

data=sp.genfromtxt("data/train.tsv", delimiter ="\t", dtype="string", comments=None, skip_header=1)
x = data[:,0]
y = data[:,1]
x = x[~sp.isnan(y)]
y = x[~sp.isnan(y)]


DataOfInterest=x["avglinksize"]
EphemeralOrEvergreen=x["label"]
plt.scatter(DataOfInterest,EphemeralOrEvergreen)
plt.title("Training data")
plt.xlabel("Single feature from training set")
plt.ylabel("Ephemeral or Evergreen")
plt.grid()
plt.show()

Output:

python GenGraphs.py

Traceback (most recent call last):
  File "GenGraphs.py", line 4, in <module>
    data=sp.genfromtxt("data/train.tsv", delimiter ="\t", dtype="string", comments=None, skip_header=1)
  File "/usr/lib/python2.7/dist-packages/numpy/lib/npyio.py", line 1746, in genfromtxt
    output = np.array(data, dtype)
MemoryError

I am trying to graph one column in the tsv file against another.

What have I misunderstood here? How else can I do this?

Recommended answer

You can load it into an np.memmap, which will need about 70MB:

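import numpy as np

# Shape and dtype are hardcoded for this data set: 7395 rows, 27 columns,
# each field stored as a fixed-width byte string of up to 4000 bytes.
mm = np.memmap('test.memmap', shape=(7395, 27), dtype='|S4000', mode='w+')
with open('train.tsv', 'rb') as f:
    next(f)  # skip the header row
    for i, line in enumerate(f):
        # Drop the line ending and the quote characters, then split on tabs.
        mm[i, :] = line.rstrip(b'\r\n').replace(b'"', b'').split(b'\t')
mm.flush()  # write pending changes to disk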
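
As a quick sanity check on the shape and dtype chosen above, the nominal size of the backing file can be computed from them. This sketch only does arithmetic on the values from the answer and does not touch the data file; the variable names are illustrative.

```python
import numpy as np

# Values taken from the np.memmap call above.
rows, cols = 7395, 27
itemsize = np.dtype('|S4000').itemsize  # bytes reserved per field

# Nominal size of the backing file: one fixed-width slot per field.
n_bytes = rows * cols * itemsize
print(n_bytes / 1e6, "MB")
```

If the longest field is much shorter than 4000 bytes, a smaller item size shrinks the file proportionally.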

The changes are written to the file when you delete mm with del mm, when you call mm.flush(), or when you close the Python console. To open the file again after it has been created, you will probably have to change the mode to r+.
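
A minimal sketch of that save-and-reopen cycle, assuming a small throwaway file rather than the real data set. The path, shape, dtype, and values here are hypothetical and chosen only to keep the example short; the shape and dtype passed when reopening must match the ones used when the file was created.

```python
import os
import tempfile

import numpy as np

# Hypothetical small file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'demo.memmap')
shape, dtype = (3, 2), '|S8'

# Create the file and fill it.
mm = np.memmap(path, shape=shape, dtype=dtype, mode='w+')
mm[0, :] = [b'1.5', b'0']
mm[1, :] = [b'2.5', b'1']
mm[2, :] = [b'?', b'1']
mm.flush()  # push pending changes to disk
del mm      # release the mapping

# Reopen the existing file for reading and writing.
reopened = np.memmap(path, shape=shape, dtype=dtype, mode='r+')
print(reopened[1, 0])
```

Use mode='r' instead of 'r+' if the data should only be read.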

You can work with the memmap array as if it were a normal array, which lets you take only the parts you are interested in.
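
For instance, two columns can be sliced out and converted to floats before plotting. This is a sketch under stated assumptions: the file, its contents, the column positions (1 and 2), and the helper to_float are all hypothetical, since the positions of the columns of interest in train.tsv are not given here.

```python
import os
import tempfile

import numpy as np

# Hypothetical small memmap standing in for the real one.
path = os.path.join(tempfile.mkdtemp(), 'demo.memmap')
mm = np.memmap(path, shape=(4, 3), dtype='|S8', mode='w+')
mm[:] = [[b'a', b'1.5', b'0'],
         [b'b', b'?', b'1'],
         [b'c', b'2.0', b'1'],
         [b'd', b'3.25', b'0']]


def to_float(column):
    """Convert a column of byte strings to floats, using NaN for bad fields."""
    out = np.full(len(column), np.nan)
    for i, field in enumerate(column):
        try:
            out[i] = float(field.decode('ascii'))
        except ValueError:
            pass  # leave NaN where the field is not a number
    return out


# Slicing reads only the requested columns from the file.
x = to_float(mm[:, 1])
y = to_float(mm[:, 2])

# Keep only the rows where both values are valid numbers.
keep = ~np.isnan(x) & ~np.isnan(y)
x, y = x[keep], y[keep]
```

The filtered x and y can then be passed to plt.scatter as in the question.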
