Python pandas DataFrame: is it pass-by-value or pass-by-reference?

Question

If I pass a dataframe to a function and modify it inside the function, is it pass-by-value or pass-by-reference?

I ran the following code:

import pandas as pd

a = pd.DataFrame({'a':[1,2], 'b':[3,4]})
def letgo(df):
    df = df.drop('b',axis=1)  # drop() returns a new DataFrame
letgo(a)

The value of a does not change after the function call. Does that mean it is pass-by-value?

I also tried the following:

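import numpy as np

xx = np.array([[1,2], [3,4]])
def letgo2(x):
    x[1,1] = 100                   # writes into the caller's array
def letgo3(x):
    x = np.array([[3,3],[3,3]])    # rebinds the local name only

letgo2(xx)
letgo3(xx)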

It turns out letgo2() does change xx and letgo3() does not. Why is it like this?

Answer

The short answer is that Python always passes by value, but every Python variable is actually a pointer (reference) to some object, and it is that pointer that gets copied. This is why it sometimes looks like pass-by-reference.

In Python, every object is either mutable or immutable. For example, lists, dicts, modules and pandas DataFrames are mutable, while ints, strings and tuples are immutable. A mutable object can be changed internally (e.g., by adding an element to a list), but an immutable object cannot.
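The minimal sketch below makes the mutable/immutable distinction concrete with a built-in list and an int instead of a DataFrame. The function names append_item and increment are hypothetical and exist only for this illustration.

```python
def append_item(items):
    # Lists are mutable: append() changes the object the caller also refers to.
    items.append(3)

def increment(n):
    # ints are immutable: += builds a new int and rebinds only the local name n.
    n += 1

nums = [1, 2]
append_item(nums)
print(nums)   # [1, 2, 3]

count = 5
increment(count)
print(count)  # 5
```

The list's change is visible to the caller. The int is unaffected because it cannot change in place.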

As I said at the start, you can think of every Python variable as a pointer to an object. When you pass a variable to a function, the variable (pointer) within the function is always a copy of the variable (pointer) that was passed in. So if you assign something new to the internal variable, all you are doing is changing the local variable to point to a different object. This doesn't alter (mutate) the original object that the variable pointed to, nor does it make the external variable point to the new object. At this point, the external variable still points to the original object, but the internal variable points to a new object.
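The "copied pointer" model can be checked with the built-in id() function. In this rough sketch, rebind is a hypothetical helper. It records the identity of the object its parameter points to on entry, then reassigns the parameter.

```python
def rebind(obj):
    # On entry, obj points at the same object as the caller's variable.
    entry_id = id(obj)
    # Assignment points the local name at a brand-new list; the caller's
    # list is neither mutated nor replaced.
    obj = [0, 0]
    return entry_id, obj

data = [1, 2]
entry_id, new_obj = rebind(data)
print(entry_id == id(data))  # True: the parameter started out pointing at data's object
print(new_obj is data)       # False: after assignment it pointed somewhere else
print(data)                  # [1, 2]
```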

If you want to alter the original object (only possible with mutable data types), you have to do something that alters the object without assigning a completely new value to the local variable. This is why letgo() and letgo3() leave the external item unaltered, but letgo2() alters it.

As @ursan pointed out, if letgo() used something like this instead, then it would alter (mutate) the original object that df points to, which would change the value seen via the global a variable:

import pandas as pd

def letgo(df):
    df.drop('b', axis=1, inplace=True)  # mutates the object df points to

a = pd.DataFrame({'a':[1,2], 'b':[3,4]})
letgo(a)  # will alter a
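inplace=True is not the only way to mutate the caller's DataFrame. Any operation that modifies the object itself has the same effect. The sketch below assumes a reasonably recent pandas, and add_total is a hypothetical name. It uses column assignment and .loc setting, and both of them change the object that a refers to.

```python
import pandas as pd

def add_total(df):
    # Column assignment adds a column to the DataFrame object itself.
    df['total'] = df['a'] + df['b']
    # Label-based setting also modifies that same object in place.
    df.loc[0, 'a'] = 99

a = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
add_total(a)
print(a)  # a now has a 'total' column, and a.loc[0, 'a'] is 99
```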

In some cases, you can completely hollow out the original object and refill it with new data without doing a direct assignment to the variable. For example, the following alters the original object that v points to, which changes the data you see when you use v later:

import numpy as np

def letgo3(x):
    x[:] = np.array([[3,3],[3,3]])  # overwrite the contents of x in place

v = np.empty((2, 2))
letgo3(v)   # will alter v

Notice that I'm not assigning something directly to x; I'm assigning something to the entire internal range of x.
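Plain Python lists support the same slice-assignment trick. In the sketch below, refill_array and refill_list are hypothetical names. Each function writes into the object it was given instead of rebinding its parameter.

```python
import numpy as np

def refill_array(x):
    # Copies new values into the existing array; the caller's array is modified.
    x[:] = np.array([[3, 3], [3, 3]])

def refill_list(items):
    # Slice assignment replaces a list's contents in place (the length may change).
    items[:] = [7, 8, 9]

v = np.empty((2, 2))
refill_array(v)
print(v.tolist())  # [[3.0, 3.0], [3.0, 3.0]] (np.empty defaults to float64)

lst = [1, 2]
refill_list(lst)
print(lst)         # [7, 8, 9]
```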

If you absolutely must create a completely new object and make it visible externally (which is sometimes the case with pandas), you have two options. The 'clean' option would be just to return the new object, e.g.,

import pandas as pd

def letgo(df):
    df = df.drop('b',axis=1)
    return df

a = pd.DataFrame({'a':[1,2], 'b':[3,4]})
a = letgo(a)  # rebind a to the returned DataFrame
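With the return-the-new-object approach, the caller decides whether the change replaces the old data. The sketch below uses a condensed version of letgo; original and trimmed are illustrative names. Binding the result to a new name leaves the input untouched, and only reassigning a makes the caller see the new object.

```python
import pandas as pd

def letgo(df):
    # drop() without inplace returns a new DataFrame; the input is not mutated.
    return df.drop('b', axis=1)

a = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
trimmed = letgo(a)
print(list(a.columns))        # ['a', 'b']: the original is unchanged
print(list(trimmed.columns))  # ['a']

original = a
a = letgo(a)                  # rebinding the caller's name makes the change visible
print(list(a.columns))        # ['a']
print(list(original.columns)) # ['a', 'b']: the old object was never mutated
```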

Another option would be to reach outside your function and directly alter a global variable. This changes a to point to a new object, and any function that refers to a afterward will see that new object:

import pandas as pd

def letgo():
    global a                  # rebind the module-level name a
    a = a.drop('b',axis=1)

a = pd.DataFrame({'a':[1,2], 'b':[3,4]})
letgo()   # will alter a!

Directly altering global variables is usually a bad idea, because anyone who reads your code will have a hard time figuring out how a got changed. (I generally use global variables for shared parameters used by many functions in a script, but I don't let them alter those global variables.)
