Passing pandas DataFrame by reference


Question

My question concerns the apparent immutability of a pandas DataFrame when it is passed to a function by reference. Consider the following code:

import sys

import pandas as pd

def foo(df1, df2):

    df1['B'] = 1    # Add a constant column B.
    df1 = df1.join(df2['C'], how='inner')    # Join column C from df2.

    return()

def main(argv = None):

    # Create DataFrames.
    df1 = pd.DataFrame(range(0,10,2), columns=['A'])
    df2 = pd.DataFrame(range(1,11,2), columns=['C'])

    foo(df1, df2)    # Pass df1 and df2 by reference.

    print(df1)

    return(0)

if __name__ == '__main__':
    status = main()
    sys.exit(status)

The output is

   A  B  
0  0  1
1  2  1
2  4  1
3  6  1
4  8  1

rather than

   A  B  C
0  0  1  1
1  2  1  3
2  4  1  5
3  6  1  7
4  8  1  9

In fact, if foo is defined as

def foo(df1, df2):

    df1 = df1.join(df2['C'], how='inner')    # Join column C from df2 first.
    df1['B'] = 1    # Then add a constant column B.

    return()

(i.e. with the "join" statement placed before the other statement), then the output is simply

   A    
0  0 
1  2 
2  4 
3  6 
4  8

I'm intrigued as to why this is the case. Any insights would be appreciated.

Recommended answer

The issue is caused by this line:

df1 = df1.join(df2['C'], how='inner')

df1.join(df2['C'], how='inner') returns a new DataFrame. After this line, the local name df1 no longer refers to the DataFrame that was passed in as the argument but to the new one, because it has been reassigned to the result. The original DataFrame continues to exist, unmodified by the join. That is why, in the first version of foo, column B (assigned before the reassignment) shows up in the caller's DataFrame, while in the second version neither B nor C does. This isn't really a pandas issue, just the general way Python, and most other languages, work.
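Since this behavior is not specific to pandas, it can be reproduced with a plain Python list. The following is a minimal sketch; the function names mutate and rebind and the sample values are made up for illustration and do not come from the question.

```python
def mutate(items):
    # In-place change: the list object the caller passed in is modified.
    items.append(99)


def rebind(items):
    # `items + [0]` builds a new list. The assignment only makes the local
    # name `items` point at that new list; the caller's list is untouched.
    items = items + [0]


original = [1, 2, 3]

mutate(original)
after_mutate = list(original)   # snapshot after the in-place change

rebind(original)
after_rebind = list(original)   # snapshot after the rebinding call

print(after_mutate)
print(after_rebind)
```

Here mutate plays the role of df1['B'] = 1 and rebind plays the role of df1 = df1.join(...): only the first is visible to the caller.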

Some pandas functions have an inplace argument, which would do what you want; however, the join operation doesn't. If you need the joined result, you'll have to return the new DataFrame from the function and reassign it outside the function.
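One way the suggestion above might look in practice is sketched below. It assumes the same df1 and df2 as in the question; the intermediate name result and the call to copy() are illustrative choices (the copy simply keeps foo from also modifying the caller's DataFrame as a side effect) and are not part of the original answer.

```python
import pandas as pd


def foo(df1, df2):
    # Work on a copy so the caller's DataFrame is not changed as a side effect.
    result = df1.copy()
    result['B'] = 1
    # join returns a new DataFrame, so return it instead of rebinding df1.
    return result.join(df2['C'], how='inner')


df1 = pd.DataFrame(list(range(0, 10, 2)), columns=['A'])
df2 = pd.DataFrame(list(range(1, 11, 2)), columns=['C'])

# Reassign outside the function to pick up the joined result.
df1 = foo(df1, df2)

print(df1)
```

After the reassignment, df1 holds columns A, B, and C, which matches the output the question expected.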
