Pandas 1.1.0 apply function is altering the row in place

Problem description

I have a small DataFrame (2 rows x 4 columns) and a function that, once applied, adds an extra column based on some logic. With Pandas 0.24.2 I have been doing this as df.apply(func, axis=1) and getting my extra column. So far, so good.

Now with Pandas 1.1.0 something weird happens: when I call apply, the first row is processed twice and the second row is not considered at all.

I will show the original DF, the function, and the output. I added a print(row) so you can see how the first row of the DF is repeated in the process.

In [82]: df_attr_list
Out[82]: 
      name attrName string_value dict_value
0  FW12611  HW type         None       ALU1
1  FW12612  HW type         None       ALU1
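
For anyone who wants to reproduce this, the frame shown above could be built roughly as follows. This is only a sketch: it assumes the None entries in string_value are real None objects rather than the string 'None', which the printed output alone does not settle.

```python
import pandas as pd

# Assumption: the None values displayed above are real None objects.
df_attr_list = pd.DataFrame(
    {
        "name": ["FW12611", "FW12612"],
        "attrName": ["HW type", "HW type"],
        "string_value": [None, None],
        "dict_value": ["ALU1", "ALU1"],
    }
)

print(df_attr_list)
```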

Now, the function and its output:

def setFinalValue(row):
    rtrName      = row['name']
    attrName     = row['attrName'].replace(" ","")
    dict_value   = row['dict_value']
    string_value = row['string_value']
    finalValue   = 'N/A'

    if attrName in ['Val1','Val2','Val3']:
        finalValue = dict_value
    elif attrName in ['Val4','Val5',]:
        finalValue = string_value
    else:
        finalValue = "N/A"
    row['finalValue'] = finalValue

    print(row)
    
    return row

Now, the output after the apply:

In [83]: df_attr_list.apply(setFinalValue, axis=1)
name                       FW12611
attrName                   HW type
string_value                  None
dict_value                    ALU1
finalValue                    ALU1
Name: 0, dtype: object
name                       FW12611
attrName                   HW type
string_value                  None
dict_value                    ALU1
finalValue                    ALU1
Name: 1, dtype: object
Out[83]: 
      name attrName string_value dict_value finalValue
0  FW12611  HW type         None       ALU1       ALU1
1  FW12611  HW type         None       ALU1       ALU1

As you can see, the extra column is added, but the first row of the original DF is processed twice, as if the second row didn't exist.

Why is this happening?

I am running this with pandas 1.1.0:

In [86]: print(pd.__version__)
1.1.0

Thanks!

Recommended answer

  • As per the Pandas 1.1.0 What's New doc ("apply and applymap on DataFrame evaluates first row/column only once"), .apply does not evaluate the first row twice.
  • The issue is that the dataframe is replaced when the function alters row in place and returns it.
    • This seems to be a result of BUG: DataFrame.apply with func altering row in-place #35633
      • Also see Backport PR #35633 on branch 1.1.x (BUG: DataFrame.apply with func altering row in-place) #35666
Returning only the computed value and assigning it to a new column avoids altering row in place:

import pandas as pd

data = {'name': ['FW12611', 'FW12612', 'FW12613'],
        'attrName': ['HW type', 'HW type', 'HW type'],
        'string_value': ['None', 'None', 'None'],
        'dict_value': ['ALU1', 'ALU1', 'ALU1']}

df = pd.DataFrame(data)


def setFinalValue(row):
    print(row)
    rtrName      = row['name']
    attrName     = row['attrName'].replace(" ","")
    dict_value   = row['dict_value']
    string_value = row['string_value']
    finalValue   = 'N/A'

    if attrName in ['Val1','Val2','Val3']:
        finalValue = dict_value
    elif attrName in ['Val4','Val5',]:
        finalValue = string_value
    else:
        finalValue = "N/A"

    print('\n')
    # return only the new value; row itself is left untouched
    return finalValue


# apply the function and store the result as a new column
df['finalValue'] = df.apply(setFinalValue, axis=1)

[out]:
name            FW12611
attrName        HW type
string_value       None
dict_value         ALU1
Name: 0, dtype: object


name            FW12612
attrName        HW type
string_value       None
dict_value         ALU1
Name: 1, dtype: object


name            FW12613
attrName        HW type
string_value       None
dict_value         ALU1
Name: 2, dtype: object

# display(df)
      name attrName string_value dict_value finalValue
0  FW12611  HW type         None       ALU1        N/A
1  FW12612  HW type         None       ALU1        N/A
2  FW12613  HW type         None       ALU1        N/A
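
If the function really needs to return the whole row with the extra field, one possible variation is to copy the row first so that the object handed over by apply is never modified. The sketch below is an assumption rather than part of the linked bug report: the helper name set_final_value_on_copy and the sample values ('s1', 'ALU2', and so on) are made up for illustration, and the attrName values are chosen only to exercise each branch.

```python
import pandas as pd

# Illustrative data (hypothetical values), one row per branch of the logic.
df = pd.DataFrame(
    {
        "name": ["FW12611", "FW12612", "FW12613"],
        "attrName": ["Val1", "Val4", "HW type"],
        "string_value": ["s1", "s2", "s3"],
        "dict_value": ["ALU1", "ALU2", "ALU3"],
    }
)


def set_final_value_on_copy(row):
    # Work on a private copy so the Series passed in by apply is not altered.
    row = row.copy()
    attr_name = row["attrName"].replace(" ", "")

    if attr_name in ("Val1", "Val2", "Val3"):
        row["finalValue"] = row["dict_value"]
    elif attr_name in ("Val4", "Val5"):
        row["finalValue"] = row["string_value"]
    else:
        row["finalValue"] = "N/A"
    return row


result = df.apply(set_final_value_on_copy, axis=1)
print(result)
```

The original df keeps its four columns; only result carries finalValue.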
              
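
Since the logic only picks one of two existing columns based on attrName, it can probably be written without apply at all, which sidesteps the row-handling question entirely. The following is a rough sketch using boolean masks; the sample values are again hypothetical and the mask names use_dict and use_string are not from the original post.

```python
import pandas as pd

# Illustrative data (hypothetical values), one row per branch of the logic.
df = pd.DataFrame(
    {
        "name": ["FW12611", "FW12612", "FW12613"],
        "attrName": ["Val1", "Val4", "HW type"],
        "string_value": ["s1", "s2", "s3"],
        "dict_value": ["ALU1", "ALU2", "ALU3"],
    }
)

# Same normalisation as in the row-wise function: strip spaces from attrName.
attr = df["attrName"].str.replace(" ", "", regex=False)

use_dict = attr.isin(["Val1", "Val2", "Val3"])
use_string = attr.isin(["Val4", "Val5"])

# Start from the default, then overwrite the rows that match each rule.
df["finalValue"] = "N/A"
df.loc[use_dict, "finalValue"] = df.loc[use_dict, "dict_value"]
df.loc[use_string, "finalValue"] = df.loc[use_string, "string_value"]

print(df)
```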
