MemoryError in Python but not IPython

Question
Generally, can you think of any reason why this would happen (i.e. a MemoryError in Python but not in IPython (console, not notebook))?
To be more specific, I'm using sklearn's SGDClassifier in the multiclass and multilabel case. It errors given the following code:
# imports added for completeness (not part of the original snippet)
from sklearn.linear_model import SGDClassifier
from sklearn.multiclass import OneVsRestClassifier

model = SGDClassifier(
    loss='hinge',
    penalty='l2',
    n_iter=niter,   # n_iter is the pre-0.19 scikit-learn parameter name
    alpha=alpha,
    fit_intercept=True,
    n_jobs=1)
mc = OneVsRestClassifier(model)
mc.fit(X, y)
On calling mc.fit(X, y), the following error occurs:
  File "train12-3b.py", line 411, in buildmodel
    mc.fit(X, y)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/multiclass.py", line 201, in fit
    n_jobs=self.n_jobs)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/multiclass.py", line 88, in fit_ovr
    Y = lb.fit_transform(y)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/base.py", line 408, in fit_transform
    return self.fit(X, **fit_params).transform(X)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/preprocessing/label.py", line 272, in transform
    neg_label=self.neg_label)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/preprocessing/label.py", line 394, in label_binarize
    Y = np.zeros((len(y), len(classes)), dtype=np.int)
MemoryError
Y is a matrix with 6 million rows and k columns, where the gold labels are 1 and the rest are 0 (in this case, k = 21, but I'd like to go >2000). Y gets converted by sklearn to a dense matrix (hence the Y = np.zeros((len(y), len(classes)), dtype=np.int) MemoryError), even if it is passed in as sparse.
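A minimal sketch (toy labels, not the asker's data) showing the behavior described above: LabelBinarizer returns a dense NumPy array by default, which is what leads to the np.zeros allocation in the traceback.

```python
import numpy as np
from sklearn.preprocessing import LabelBinarizer

lb = LabelBinarizer()
Y = lb.fit_transform([0, 1, 2, 1])  # 4 toy samples, 3 classes

# The result is a dense ndarray, regardless of how compact the
# one-hot structure would be in sparse form.
print(type(Y))   # <class 'numpy.ndarray'>
print(Y.shape)   # (4, 3)
```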
I have 60 GB of RAM, and with 21 columns it shouldn't take more than 8 GB max (6 million * 21 * 64), so I'm confused. I rewrote the Y = np.zeros((len(y), len(classes)), dtype=np.int) to use dtype=bool, but no luck.
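For scale, a quick back-of-envelope sketch of the dense allocation (assuming 8-byte int64 entries, which is what dtype=np.int gives on a 64-bit build): the 21-column case is modest, but the target of 2000 columns is where a single dense matrix becomes prohibitive even on a 60 GB machine.

```python
import numpy as np

rows = 6_000_000
itemsize = np.dtype(np.int64).itemsize  # 8 bytes per entry

gb_21 = rows * 21 * itemsize / 1e9      # dense matrix, 21 classes
gb_2000 = rows * 2000 * itemsize / 1e9  # dense matrix, 2000 classes
print(gb_21)    # ~1.0 GB
print(gb_2000)  # ~96 GB
```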
Any ideas?

Answer
It sounds like you are hitting a limitation of the current implementation of the label binarizer: see issue #2441. There is PR #2458 to fix it.
Please feel free to try that branch and report your results as a comment on that PR.
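A hedged sketch of what this looks like once the fix is available: modern scikit-learn releases (not the 0.x version in the traceback above) let you binarize multilabel targets directly to a sparse matrix via MultiLabelBinarizer(sparse_output=True), avoiding the dense np.zeros allocation entirely.

```python
from scipy import sparse
from sklearn.preprocessing import MultiLabelBinarizer

# Toy multilabel targets: each sample carries a set of label ids.
mlb = MultiLabelBinarizer(sparse_output=True)
Y = mlb.fit_transform([[0, 1], [2], [0, 2]])

# Y is a scipy sparse matrix, so memory scales with the number of
# positive labels rather than rows * classes.
print(sparse.issparse(Y))  # True
```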