Python, memory error, csv file too large
Problem description
I have a problem with a Python module that cannot handle importing a big data file (the file targets.csv weighs nearly 1 GB).
The error occurs when this line is loaded:
targets = [(name, float(X), float(Y), float(Z), float(BG))
           for name, X, Y, Z, BG in csv.reader(open('targets.csv'))]
Traceback:
Traceback (most recent call last):
  File "C:\Users\gary\Documents\EPSON STUDIES\colors_text_D65.py", line 41, in <module>
    for name, X, Y, Z, BG in csv.reader(open('targets.csv'))]
MemoryError
Is there a way to read the file targets.csv line by line? And would that slow down the process?
This module is already pretty slow...
Thanks!
import geometry
import csv
import numpy as np
import random
import cv2
S = 0
img = cv2.imread("MAP.tif", -1)
height, width = img.shape
pixx = height * width
iterr = float(pixx / 1000)
accomplished = 0
temp = 0
ppm = file("epson gamut.ppm", 'w')
ppm.write("P3" + "\n" + str(width) + " " + str(height) + "\n" + "255" + "\n")
# PPM file header
all_colors = [(name, float(X), float(Y), float(Z))
              for name, X, Y, Z in csv.reader(open('XYZcolorlist_D65.csv'))]
# background is marked SUPPORT
support_i = [i for i, color in enumerate(all_colors) if color[0] == '255 255 255']
if len(support_i)>0:
    support = np.array(all_colors[support_i[0]][1:])
    del all_colors[support_i[0]]
else:
    support = None
tg, hull_i = geometry.tetgen_of_hull([(X,Y,Z) for name, X, Y, Z in all_colors])
colors = [all_colors[i] for i in hull_i]
print ("thrown out: "
       + ", ".join(set(zip(*all_colors)[0]).difference(zip(*colors)[0])))
targets = [(name, float(X), float(Y), float(Z), float(BG))
           for name, X, Y, Z, BG in csv.reader(open('targets.csv'))]
for target in targets:
    name, X, Y, Z, BG = target
    target_point = support + (np.array([X,Y,Z]) - support)/(1-BG)
    tet_i, bcoords = geometry.containing_tet(tg, target_point)
    if tet_i == None:
        #print str("out")
        ppm.write(str("255 255 255") + "\n")
        print "out"
        temp += 1
        if temp >= iterr:
            accomplished += temp
            print str(100 * accomplished / (float(pixx))) + str(" %")
            temp = 0
        continue
        # not in gamut
    else:
        A = bcoords[0]
        B = bcoords[1]
        C = bcoords[2]
        D = bcoords[3]
        R = random.uniform(0,1)
        names = [colors[i][0] for i in tg.tets[tet_i]]
        if R <= A:
            S = names[0]
        elif R <= A+B:
            S = names[1]
        elif R <= A+B+C:
            S = names[2]
        else:
            S = names[3]
        ppm.write(str(S) + "\n")
        temp += 1
        if temp >= iterr:
            accomplished += temp
            print str(100 * accomplished / (float(pixx))) + str(" %")
            temp = 0
print "done"
ppm.close()
Recommended answer
csv.reader() already reads the lines one at a time. However, you're collecting all of the lines into a list first, so every parsed row is held in memory at once. You should process the lines one at a time instead. One approach is to switch to a generator, for example:
targets = ((name, float(X), float(Y), float(Z), float(BG))
           for name, X, Y, Z, BG in csv.reader(open('targets.csv')))
(Switching from square brackets to parens should change targets from a list comprehension to a generator expression.)
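As a rough illustration of the same idea, here is a minimal sketch written for Python 3 (the question's script is Python 2, so this is an assumption about the target environment). The names iter_targets and count_rows_and_sum_bg are hypothetical, and the two-row file is made-up data standing in for targets.csv. It reads and converts one row at a time and never builds a list of all rows:

```python
import csv
import os
import tempfile


def iter_targets(path):
    """Yield (name, X, Y, Z, BG) tuples lazily, one CSV row at a time."""
    # newline='' is what the csv module documentation recommends for
    # files handed to csv.reader in Python 3.
    with open(path, newline='') as f:
        for name, X, Y, Z, BG in csv.reader(f):
            yield name, float(X), float(Y), float(Z), float(BG)


def count_rows_and_sum_bg(path):
    """Consume the generator without storing the rows anywhere."""
    count = 0
    bg_total = 0.0
    for name, X, Y, Z, BG in iter_targets(path):
        count += 1
        bg_total += BG
    return count, bg_total


# Tiny made-up file standing in for the real targets.csv (assumption).
fd, demo_path = tempfile.mkstemp(suffix=".csv")
with os.fdopen(fd, "w", newline="") as f:
    csv.writer(f).writerows([
        ["p0", "0.25", "0.5", "0.75", "0.0"],
        ["p1", "0.1", "0.2", "0.3", "0.5"],
    ])

rows_seen, bg_sum = count_rows_and_sum_bg(demo_path)
os.remove(demo_path)
print(rows_seen, bg_sum)
```

In the question's script, the loop `for target in targets:` could iterate over such a generator directly. Keep in mind that a generator can be consumed only once, which is fine here because the rows are visited in a single pass.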