Passing pandas DataFrame by reference
Question
My question is about the immutability of a pandas DataFrame when it is passed by reference. Consider the following code:
import sys

import pandas as pd

def foo(df1, df2):
    df1['B'] = 1
    df1 = df1.join(df2['C'], how='inner')
    return()

def main(argv=None):
    # Create DataFrames.
    df1 = pd.DataFrame(range(0, 10, 2), columns=['A'])
    df2 = pd.DataFrame(range(1, 11, 2), columns=['C'])
    foo(df1, df2)  # Pass df1 and df2 by reference.
    print(df1)
    return 0

if __name__ == '__main__':
    status = main()
    sys.exit(status)
The output is
   A  B
0  0  1
1  2  1
2  4  1
3  6  1
4  8  1
rather than
   A  B  C
0  0  1  1
1  2  1  3
2  4  1  5
3  6  1  7
4  8  1  9
In fact, if foo is defined as
def foo(df1, df2):
    df1 = df1.join(df2['C'], how='inner')
    df1['B'] = 1
    return()
(i.e. with the "join" statement before the other statement), then the output is simply
   A
0  0
1  2
2  4
3  6
4  8
I'm intrigued as to why this is the case. Any insights would be appreciated.
Recommended answer
The issue is caused by this line:
    df1 = df1.join(df2['C'], how='inner')
df1.join(df2['C'], how='inner') returns a new DataFrame. After this line, df1 no longer refers to the same DataFrame as the argument but to a new one, because the local name has been reassigned to the result. The original DataFrame continues to exist, and the join does not modify it. That also explains the two outputs: when df1['B'] = 1 runs before the join, it changes the caller's object in place, so column B is visible afterwards; when it runs after the join, it changes only the new, local DataFrame. This isn't really a pandas issue, just the way assignment to a parameter works in Python and in most other languages.
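The difference between mutating the argument and rebinding the local name can be checked directly with an identity test. The following is only a minimal sketch; the function name mutate_then_rebind is made up for illustration and is not part of the original question.

```python
import pandas as pd


def mutate_then_rebind(df, other):
    # Hypothetical helper, named for illustration only.
    df['B'] = 1  # in-place: modifies the object the caller passed in
    df = df.join(other['C'], how='inner')  # rebinds the local name to a new object
    return df


df1 = pd.DataFrame({'A': list(range(0, 10, 2))})
df2 = pd.DataFrame({'C': list(range(1, 11, 2))})

result = mutate_then_rebind(df1, df2)

print(result is df1)         # False: join produced a separate DataFrame
print(list(df1.columns))     # ['A', 'B']
print(list(result.columns))  # ['A', 'B', 'C']
```

The caller's df1 picks up column B because that assignment happened on the shared object, while column C exists only on the object returned by join.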
Some pandas functions have an inplace argument, which would do what you want; the join operation, however, does not. If you need the joined result, you'll have to return the new DataFrame from the function and reassign it outside the function.
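The return-and-reassign approach described above might look like the sketch below. The function name add_columns is an assumption chosen for illustration; the data mirrors the question.

```python
import pandas as pd


def add_columns(df1, df2):
    # Hypothetical helper: build the result on a new DataFrame and return it.
    out = df1.join(df2['C'], how='inner')
    out['B'] = 1
    return out


df1 = pd.DataFrame({'A': list(range(0, 10, 2))})
df2 = pd.DataFrame({'C': list(range(1, 11, 2))})

original = df1                # keep a handle on the input to show it is untouched
df1 = add_columns(df1, df2)   # reassign outside the function

print(df1)
print(list(original.columns))  # ['A']
```

Because the function only works on the object returned by join, the DataFrame that was passed in is left as it was, and the caller decides whether to rebind its own name to the result.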