Python 2.7 memory leak with scipy.minimize
Problem description
During a fit procedure, my RAM usage slowly but steadily increases (about 2.8 MB every couple of seconds) until I get a memory error or terminate the program. This happens when I try to fit a model to some 80 measurements. The fitting is done by using scipy.optimize.minimize to minimize Chi_squared.
So far I've tried:
- Playing with the garbage collector, collecting every time Chi_squared calls my model; this didn't help.
- Looking at all variables using globals() and then using pympler.asizeof to find the total amount of space my variables take up; this first increases but then stays constant.
- The pympler.tracker.SummaryTracker also didn't show any increase in variable size.
From these tests, it seems that my RAM usage goes up while the total space my variables take up stays constant. I would really like to know where my memory goes.
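For reference, here is a minimal stdlib-only sketch (not part of my original measurements) of how to watch process memory from inside Python, which is how the per-iteration growth above can be quantified. Note that `ru_maxrss` is reported in kilobytes on Linux but in bytes on macOS:

```python
import resource

def peak_rss_mb():
    # Peak resident set size of this process, in MB.
    # Caveat: ru_maxrss is kilobytes on Linux, bytes on macOS.
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024.0

before = peak_rss_mb()
junk = [list(range(1000)) for _ in range(1000)]  # simulate the allocations a fit step makes
after = peak_rss_mb()
print("peak RSS grew by %.1f MB" % (after - before))
```

Logging this value once per minimizer iteration makes the steady growth visible even when Python-level object counts stay flat.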
The code below reproduces the problem for me:
```python
import numpy as np
import scipy
import scipy.optimize as op
import scipy.stats
import scipy.integrate


def fit_model(model_pmt, x_list, y_list, PMT_parra, PMT_bounds=None,
              tolerance=10**-1, PMT_start_gues=None):
    result = op.minimize(chi_squared, PMT_start_gues,
                         args=(x_list, y_list, model_pmt,
                               PMT_parra[0], PMT_parra[1], PMT_parra[2]),
                         bounds=PMT_bounds, method='SLSQP',
                         options={"ftol": tolerance})
    print result


def chi_squared(fit_parm, x, y_val, model, *non_fit_parm):
    parm = np.concatenate((fit_parm, non_fit_parm))
    y_mod = model(x, *parm)
    X2 = sum(pow(y_val - y_mod, 2))
    return X2


def basic_model(cb_list, max_intesity, sigma_e, noise, N, centre1, centre2,
                sigma_eb, min_dist=10**-5):
    """Plateau function consisting of two Gaussian CDF functions."""
    def get_distance(x, r):
        dist = abs(x - r)
        if dist < min_dist:
            dist = min_dist
        return dist

    def amount_of_material(x):
        A = scipy.stats.norm.cdf((x - centre1) / sigma_e)
        B = (1 - scipy.stats.norm.cdf((x - centre2) / sigma_e))
        cube = A * B
        return cube

    def amount_of_field_INTEGRAL(x, cb):
        """Integral that is part of my sum"""
        result = scipy.integrate.quad(
            lambda r: scipy.stats.norm.pdf((r - cb) / sigma_b)
            / pow(get_distance(x, r), N),
            start, end, epsabs=10**-1)[0]
        return result

    # Set some constants, not important
    sigma_b = (sigma_eb**2 - sigma_e**2)**0.5
    start, end = centre1 - 3 * sigma_e, centre2 + 3 * sigma_e
    integration_range = np.linspace(start, end, int(end - start) / 20)
    intensity_list = []

    # Doing a Riemann sum, this is what takes the most time.
    for i, cb_point in enumerate(cb_list):
        intensity = sum([amount_of_material(x) * amount_of_field_INTEGRAL(x, cb_point)
                         for x in integration_range])
        intensity *= (integration_range[1] - integration_range[0])
        intensity_list.append(intensity)

    model_values = np.array(intensity_list) / max(intensity_list) * max_intesity + noise
    return model_values


def get_dummy_data():
    """Can be ignored, produces something resembling my data with noise"""
    # X is just a range
    x_list = np.linspace(0, 300, 300)
    # Y is some sort of step function with noise
    A = scipy.stats.norm.cdf((x_list - 100) / 15.8)
    B = (1 - scipy.stats.norm.cdf((x_list - 200) / 15.8))
    y_list = A * B * .8 + .1 + np.random.normal(0, 0.05, 300)
    return x_list, y_list


if __name__ == "__main__":
    # Set some variables
    start_pmt = [0.7, 8, 0.15, 0.6]
    pmt_bounds = [(.5, 1.3), (4, 15), (0.05, 0.3), (0.5, 3)]
    pmt_par = [110, 160, 15]
    x_list, y_list = get_dummy_data()
    fit_model(basic_model, x_list, y_list, pmt_par, PMT_start_gues=start_pmt,
              PMT_bounds=pmt_bounds, tolerance=0.1)
```
Thanks for your help!
Accepted answer
I narrowed down the problem by successively removing layer after layer of indirection. (@joris267: this is something you really should have done yourself before asking.) The minimal remaining code that reproduces the problem looks like this:
```python
import scipy.integrate

if __name__ == "__main__":
    while True:
        scipy.integrate.quad(lambda r: 0, 1, 100)
```
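This also explains why pympler and globals() showed nothing in the question: the leaked memory is allocated on the C side inside quad, which Python-level tools cannot see. Here is a small illustrative harness (my own sketch, not part of the original answer) that counts gc-tracked objects; a pure-Python leak shows up here, while a C-level leak leaves the count flat even as process RSS grows:

```python
import gc

def live_object_delta(fn, iterations=100):
    # Net number of gc-tracked Python objects kept alive by calling
    # fn repeatedly. A C-level leak (as in scipy 0.19.0's quad)
    # leaves this near zero while process memory still grows.
    gc.collect()
    before = len(gc.get_objects())
    for _ in range(iterations):
        fn()
    gc.collect()
    return len(gc.get_objects()) - before

leak = []  # deliberately leaky: keeps every appended list alive
print(live_object_delta(lambda: leak.append([])))  # clearly positive
print(live_object_delta(lambda: [].sort()))        # roughly zero
```

When this delta stays near zero but memory keeps climbing, suspect the extension-module layer rather than your own Python objects.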
Conclusions:

- Yes, there is a memory leak.
- No, the leak is not in scipy.minimize but in scipy.integrate.quad.
However, this is a known issue with scipy 0.19.0. Upgrading to 0.19.1 should fix the problem, but I don't know for sure because I'm still on 0.19.0 myself :)
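A quick way to confirm which scipy is actually installed before and after the upgrade (my own sketch; the plain tuple comparison assumes simple release strings like "0.19.1", not pre-release tags):

```python
def version_tuple(v):
    # "0.19.1" -> (0, 19, 1); enough for comparing plain release numbers
    return tuple(int(p) for p in v.split(".")[:3])

# With scipy installed you would check, e.g.:
# import scipy
# if version_tuple(scipy.__version__) < (0, 19, 1):
#     print("still on a version with the quad leak -- upgrade")
print(version_tuple("0.19.0") < (0, 19, 1))  # the leaking version compares lower
```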
Update:
After upgrading scipy to 0.19.1 (and numpy to 1.13.3 for compatibility), the leak disappeared on my system.