Create multiple new columns for a pandas DataFrame with apply + function


Question

I have a pandas DataFrame df with the following shape: (763, 65)

I use the following code to create 4 new columns:

df[['col1', 'col2', 'col3','col4']] = df.apply(myFunc, axis=1)

def myFunc(row):
    #code to get some result from another dataframe
    return result1, result2, result3, result4

The data returned by myFunc has shape (1, 4). The code fails with the following error:

ValueError: Shape of passed values is (763, 4), indices imply (763, 65)

I know that df has 65 columns and that the data returned from myFunc has only 4 columns. However, I only want to create the 4 new columns (that is, col1, col2, etc.), so in my opinion the code is correct when myFunc returns only 4 values. What am I doing wrong?

Recommended answer

Demo:

In [40]: df = pd.DataFrame({'a':[1,2,3]})

In [41]: df
Out[41]:
   a
0  1
1  2
2  3

In [42]: def myFunc(row):
    ...:     #code to get some result from another dataframe
    ...:     # NOTE: trick is to return pd.Series()
    ...:     return pd.Series([1,2,3,4]) * row['a']
    ...:

In [44]: df[['col1', 'col2', 'col3','col4']] = df.apply(myFunc, axis=1)

In [45]: df
Out[45]:
   a  col1  col2  col3  col4
0  1     1     2     3     4
1  2     2     4     6     8
2  3     3     6     9    12
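
As an alternative to returning pd.Series, recent pandas versions (0.23 and later, to my knowledge) accept result_type="expand" in DataFrame.apply, which spreads a returned tuple or list across columns. The sketch below assumes that version; my_func and its arithmetic are hypothetical stand-ins for the real lookup in the question.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})

def my_func(row):
    # Hypothetical stand-in for the real lookup: returns a plain tuple of four values.
    a = row["a"]
    return a, 2 * a, 3 * a, 4 * a

new_cols = ["col1", "col2", "col3", "col4"]

# result_type="expand" turns each returned tuple into one row of a new DataFrame.
expanded = df.apply(my_func, axis=1, result_type="expand")
expanded.columns = new_cols

# Join on the shared index so the original columns are kept untouched.
df = df.join(expanded)
print(df)
```

This keeps the function returning a plain tuple, as in the question, and avoids building a Series object for every row.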

Disclaimer: try to avoid using .apply(..., axis=1) where possible. It is a for loop under the hood, i.e. it is not vectorized, and it will run much slower than vectorized Pandas/NumPy ufuncs.
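
For the toy demo above, where each new column is just a multiple of column a, a vectorized version might look like the following sketch. It uses NumPy broadcasting instead of apply. The real myFunc from the question looks values up in another dataframe, so this only illustrates the idea and is not a drop-in replacement.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})
new_cols = ["col1", "col2", "col3", "col4"]

# Broadcasting: a (3, 1) column of 'a' times a (4,) vector of factors gives a (3, 4) array.
values = df["a"].to_numpy()[:, None] * np.array([1, 2, 3, 4])

# Wrap the array in a DataFrame with the same index, then join it to the original.
df = df.join(pd.DataFrame(values, index=df.index, columns=new_cols))
print(df)
```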

PS: if you provide details of what you are trying to calculate in the myFunc function, we could try to find a vectorized solution...
