Append DataFrame inside Function


Question

I have a function test that takes a DataFrame and appends data to it. I want the global variable passed into the function to be changed. I have the script below:

import pandas as pd
global dff

def test(df):
    df = df.append({'asdf':1, 'sdf':2}, ignore_index=True)
    return(df)

dff = pd.DataFrame()
test(dff)

After this, dff remains empty; it was not edited. However, if you do this:

import pandas as pd

def test(df):
    df['asdf'] = [1,2,3]
    return(df)

dff = pd.DataFrame()
test(dff)

dff will have [1,2,3] under the column 'asdf'. Notice that I didn't even have to declare the variable as global.

Why is this?

I'd actually like to know, because the second I think I understand variable workspaces, I'm proven wrong, and I'm getting sick and tired of constantly running into this BS*

I know the solution to the problem is:

import pandas as pd

def test(df):
    df = df.append({'asdf':1, 'sdf':2}, ignore_index=True)
    return(df)

dff = pd.DataFrame()
dff = test(dff)

but I'm really just trying to figure out why the initial method isn't working, especially in light of the second script I've shown.

*Obviously it's not complete BS, but I can't understand it after 3 years of casual programming.

Recommended answer

Update:

I found a very nice talk from PyCon 2015 that explains what I'm attempting to explain below, but with diagrams that make it significantly clearer. I'll leave the explanation below to show how the original 3 scripts work, but I'd suggest watching the video:

Ned Batchelder - Facts and Myths about Python names and values - PyCon 2015

So, I think I've figured out what is happening in the two scripts above. I'll try to break it down. Feel free to correct me if need be.

Some rules:

  1. Variables are names of links/pointers to an underlying object that actually holds the data. Think of street addresses. A street address is not a house; it simply points to a house. So the address (101 Streetway Rd.) is the pointer. In a GPS, you might have it labeled as "Home". The word "Home" would be the variable itself.

  2. Functions work on objects, not variables or pointers. When you pass a variable to a function, you are actually passing the object, not the variable or pointer. Continuing the house example, if you want to add a deck to a house, you want the decking contractors to work on the house, not the metaphysical address.

  3. The return command in a function returns a pointer to an object. So this would be the address of the house, not the house or the name you might call your house.

  4. = works like a function meaning 'point to this object' (strictly, in Python it is a statement rather than a function). The name to the left of the = is the output; whatever is to the right is the input. This would be the act of naming a house. So Home = 101 Streetway Rd. makes the variable Home point to the house at 101 Streetway Rd. Let's say you moved into your neighbor's house, which is 102 Streetway Rd. This could be done by Home = Neighbor's House. Home is now the name of the pointer 102 Streetway Rd.
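The rules above can be sketched with a plain Python list standing in for the house. This is only an illustration; the names home and same_house are made up for the example and do not appear in the original scripts.

```python
# One list object with two names, like a house with two labels in the GPS.
home = ["kitchen", "bedroom"]
same_house = home                # a second name for the same object

same_house.append("deck")        # the work is done on the object itself
shared = home is same_house      # True: still one object with two names

home = ["studio"]                # '=' points the name 'home' at another object
rebound = home is same_house     # False: the two names have now diverged

print(shared, rebound, same_house)
```

Adding the deck through either name changes the one shared object, while the later = only changes which object the name home refers to.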

From here on out, I'll use ---> to mean "points to".

Before we get to the Scripts let's start with what we want. We want the object objdff pointed to by a varia

Script 1

(without the global dff, as that doesn't do anything relevant)

import pandas as pd

def test(df):
    df = df.append({'asdf':1, 'sdf':2}, ignore_index=True)
    return(df)

dff = pd.DataFrame()
test(dff)


So let's walk through the script. Nothing interesting happens until we get to:

dff = pd.DataFrame()

Here, the variable dff is assigned to the object created by pd.DataFrame(), which is an empty DataFrame. We'll call this object objdff. So at the end of this line, we have dff ---> objdff.

Next line: test(dff)

Functions work on objects, so we're saying that we're going to run the function test on the object that dff points to, which is objdff. This brings us to the function itself.

def test(df):

Here, we have what is essentially an = function. The object passed to the test function, objdff, is pointed to by the function variable df. So now df ---> objdff and dff ---> objdff.
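A minimal sketch of that binding step, using a plain dict as a stand-in for the DataFrame (an assumption made to keep the example dependency-free; the helper name show_binding is hypothetical):

```python
def show_binding(df):
    # 'df' is a new local name for whatever object the caller passed in.
    return id(df)

dff = {"asdf": [1, 2, 3]}                  # stand-in object for the DataFrame
same_object = show_binding(dff) == id(dff)

print(same_object)
```

The identity seen inside the function matches the caller's, so at this point df and dff are two names for one object.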

Moving on to the next line: df = df.append(...)

Let's start with df.append(...). The .append(...) call is sent to objdff, which makes the object objdff run its method called 'append'. As pointed out by @Jai, the .append(...) method uses a return command to output an entirely new DataFrame that has the data appended to it. We'll call the new object objdff_apnd.

Now we can move on to the df = ... part. What we have now is essentially df = objdff_apnd. This is pretty simple: the variable df now points to the object objdff_apnd.

At the end of this line we have df ---> objdff_apnd and dff ---> objdff. This is where the problem lies. The object we want (objdff_apnd) is not being pointed to by dff.

So at the end, the variable dff is still pointing to objdff, not to objdff_apnd. This brings us to Script 3 (see below).

Script 2

import pandas as pd

def test(df):
    df['asdf'] = [1,2,3]
    return(df)

dff = pd.DataFrame()
test(dff)


Just like Script 1, dff ---> objdff. During test(dff), the function variable df ---> objdff. This is where things are different.

The operation df['asdf'] = [1,2,3] (an item assignment) is, again, sent to the underlying object objdff. Last time, this resulted in a new object. This time, however, the ['asdf'] operation directly edits the object objdff. So the object objdff now has the extra 'asdf' column in it.

Therefore, at the end we have df ---> objdff and dff ---> objdff. They point to the same object, which means the variable dff points to the edited object.

Once we are back outside the function, the variable dff still points to objdff, which has the new data in it. This gives us the desired result.

Script 3

import pandas as pd

def test(df):
    df = df.append({'asdf':1, 'sdf':2}, ignore_index=True)
    return(df)

dff = pd.DataFrame()
dff = test(dff)


This script is identical to Script 1, except for the line dff = test(dff). We'll get to that in a second.

Continuing from the end of Script 1, we left off right as the function test(dff) was ending, with dff ---> objdff and df ---> objdff_apnd.

The function test has the return command, and so returns the object objdff_apnd. This turns the line dff = test(dff) into dff = objdff_apnd.

Therefore, at the end we have dff ---> objdff_apnd, which is exactly the result we want.
