Python, NumPy and memory efficiency (pass by reference vs. value)

Question

I've recently been using Python more and more in place of C/C++ because it cuts my coding time by a factor of a few. At the same time, when I'm processing large amounts of data, my Python programs run a lot slower than their C equivalents. I'm wondering whether this is because I'm using large objects/arrays inefficiently. Is there a comprehensive guide to how memory is handled by NumPy/Python? When are things passed by reference and when by value, when are things copied and when not, and which types are mutable and which are not?

Recommended answer

Objects in Python (and in most mainstream languages) are passed by reference: a function receives a reference to the caller's object, not a copy of it.
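A minimal sketch of what that means in practice (the function names `append_item`, `rebind` and `increment` are made up for illustration). Mutating an argument is visible to the caller, whereas rebinding the parameter name is not, because only the reference is handed over. Immutable types such as `int`, `str` and `tuple` cannot be changed in place at all, so they behave as if they had been passed by value.

```python
def append_item(items):
    # Mutates the very list the caller passed in; no copy is made.
    items.append(99)


def rebind(items):
    # Rebinds the local name only; the caller's list is untouched.
    items = [0, 0, 0]
    return items


def increment(n):
    # ints are immutable: this builds a new int and rebinds the local name.
    n += 1
    return n


data = [1, 2, 3]
append_item(data)
rebind(data)
print(data)   # [1, 2, 3, 99]

count = 5
increment(count)
print(count)  # 5
```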

In NumPy, for instance, "new" arrays created by slicing an existing array are only views of the original data (indexing with a list of integers or a boolean mask, by contrast, returns a copy). For example:

>>> import numpy as np
>>> vec_1 = np.arange(10)
>>> vec_1
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> vec_2 = vec_1[3:]  # vec_2 is vec_1 from index 3 (the fourth element) to the end
>>> vec_2
array([3, 4, 5, 6, 7, 8, 9])
>>> vec_2[3] = 10000   # write through the view
>>> vec_2
array([    3,     4,     5, 10000,     7,     8,     9])
>>> print(vec_1)       # the original array has changed too
[    0     1     2     3     4     5 10000     7     8     9]
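As a rough illustration of where NumPy does and does not copy, the sketch below compares a basic slice, integer-array ("fancy") indexing and an explicit `.copy()`. The variable names are illustrative; the `.base` attribute is used here only as a quick way to see whether a result is a view onto `arr`.

```python
import numpy as np

arr = np.arange(10)

view = arr[3:]              # basic slicing: a view onto arr's buffer
fancy = arr[[3, 4, 5]]      # integer-array indexing: a new array with its own data
explicit = arr[3:].copy()   # explicit copy of a slice

print(view.base is arr)      # True
print(fancy.base is arr)     # False
print(explicit.base is arr)  # False

# Writing to the original is visible through the view only.
arr[3] = -1
print(view[0], fancy[0], explicit[0])  # -1 3 3
```

For large arrays this matters for memory as well as for correctness: a view costs almost nothing, while each copy allocates a new buffer of the same size as the data it duplicates.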

NumPy has a handy function to help answer this kind of question, np.may_share_memory(obj1, obj2). So:

>>> np.may_share_memory(vec_1, vec_2)
True

Just be careful: the function can return false positives (although I have never seen one).
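One way to provoke such a false positive, sketched here assuming a NumPy version that provides `np.shares_memory` (1.11 or later), is to take two interleaved slices of the same array. Their address ranges overlap, so the cheap bounds check answers "maybe", but they have no element in common.

```python
import numpy as np

arr = np.arange(10)
evens = arr[::2]   # elements at indices 0, 2, 4, 6, 8
odds = arr[1::2]   # elements at indices 1, 3, 5, 7, 9

# Bounds check only: the two address ranges overlap.
print(np.may_share_memory(evens, odds))  # True

# Exact check: no element is actually shared.
print(np.shares_memory(evens, odds))     # False
```

`np.shares_memory` gives the exact answer but can be much slower in awkward cases, which is why the approximate check exists.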

At SciPy 2013 there was a tutorial on NumPy (http://conference.scipy.org/scipy2013/tutorial_detail.php?id=100). Towards the end, the presenter talks a little about how NumPy handles memory. It is worth watching.

As a rule of thumb, objects are almost never passed by value by default, not even the ones encapsulated in another object. Here is another example, in which a list makes a round trip through an object:

class SomeClass:

    def __init__(self, a_list):
        # Stores a reference to the caller's list, not a copy
        self.inside_list = a_list

    def get_list(self):
        # Returns that same reference
        return self.inside_list

>>> original_list = list(range(5))
>>> original_list
[0, 1, 2, 3, 4]
>>> my_object = SomeClass(original_list)
>>> output_list = my_object.get_list()
>>> output_list
[0, 1, 2, 3, 4]
>>> output_list[4] = 10000
>>> output_list
[0, 1, 2, 3, 10000]
>>> my_object.inside_list
[0, 1, 2, 3, 10000]
>>> original_list
[0, 1, 2, 3, 10000]

Creepy, huh? Assigning with "=" or returning an object from a function never copies it; you always get another reference to the same object (or, for a NumPy slice, a view of part of it). Objects are only duplicated when you do so explicitly, with a copy method such as some_dict.copy(), a full slice of a list such as a_list[:], or arr.copy() for a NumPy array (arr[:] on an array is still only a view). For example:

>>> original_list = list(range(5))
>>> original_list
[0, 1, 2, 3, 4]
>>> my_object = SomeClass(original_list[:])  # pass a copy of the list
>>> output_list = my_object.get_list()
>>> output_list
[0, 1, 2, 3, 4]
>>> output_list[4] = 10000
>>> output_list
[0, 1, 2, 3, 10000]
>>> my_object.inside_list
[0, 1, 2, 3, 10000]
>>> original_list
[0, 1, 2, 3, 4]
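One caveat worth sketching: a slice copy such as a_list[:] is shallow. It duplicates the outer list but not the objects inside it, so nested mutable objects are still shared. The example below (variable names are illustrative) contrasts it with `copy.deepcopy` from the standard library.

```python
import copy

nested = [[1, 2], [3, 4]]

shallow = nested[:]            # new outer list, same inner lists
deep = copy.deepcopy(nested)   # new outer list and new inner lists

nested[0][0] = 10000

print(shallow[0][0])  # 10000
print(deep[0][0])     # 1
```

A deep copy duplicates everything reachable from the object, so on large data it is also the most expensive option in both time and memory.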

Got it?
