Create multiple new columns for pandas dataframe with apply + function

Problem description

I have a pandas dataframe df of the following shape: (763, 65)

I use the following code to create 4 new columns:

df[['col1', 'col2', 'col3','col4']] = df.apply(myFunc, axis=1)

def myFunc(row):
    #code to get some result from another dataframe
    return result1, result2, result3, result4

The shape of the dataframe which is returned in myFunc is (1, 4). The code runs into the following error:

ValueError: Shape of passed values is (763, 4), indices imply (763, 65)

I know that df has 65 columns and that the returned data from myFunc only has 4 columns. However, I only want to create the 4 new columns (that is, col1, col2, etc.), so in my opinion the code is correct when it only returns 4 columns in myFunc. What am I doing wrong?

Recommended answer

Demo:

In [40]: df = pd.DataFrame({'a':[1,2,3]})

In [41]: df
Out[41]:
   a
0  1
1  2
2  3

In [42]: def myFunc(row):
    ...:     #code to get some result from another dataframe
    ...:     # NOTE: trick is to return pd.Series()
    ...:     return pd.Series([1,2,3,4]) * row['a']
    ...:

In [44]: df[['col1', 'col2', 'col3','col4']] = df.apply(myFunc, axis=1)

In [45]: df
Out[45]:
   a  col1  col2  col3  col4
0  1     1     2     3     4
1  2     2     4     6     8
2  3     3     6     9    12
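
As a possible alternative to wrapping the result in pd.Series, the following sketch keeps the function returning a plain tuple (as in the question) and asks apply to expand it into columns with result_type='expand'. The name my_func and its body are hypothetical stand-ins for the real lookup, and the sketch assumes a pandas version that supports result_type (0.23 or later):

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3]})

def my_func(row):
    # Hypothetical stand-in for the real lookup; returns a plain tuple.
    a = row['a']
    return a, 2 * a, 3 * a, 4 * a

# result_type='expand' turns each returned tuple into one row of a
# 4-column DataFrame, which is then assigned to the four new columns.
df[['col1', 'col2', 'col3', 'col4']] = df.apply(
    my_func, axis=1, result_type='expand'
)
print(df)
```

This is still a row-by-row loop, so it changes only the return type, not the performance.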

Disclaimer: try to avoid using .apply(..., axis=1). It is a for loop under the hood, i.e. it is not vectorized, and it will run much slower than vectorized Pandas/NumPy ufuncs.
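
To illustrate what "vectorized" means here, the toy computation from the demo (multiplying column a by 1, 2, 3 and 4) can be written with whole-column arithmetic instead of apply. This is only a sketch for the demo data; it says nothing about how the real myFunc from the question, whose contents are unknown, could be vectorized:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3]})

# Each assignment multiplies the entire column at once, so no Python
# function is called per row.
for factor, name in enumerate(['col1', 'col2', 'col3', 'col4'], start=1):
    df[name] = df['a'] * factor

print(df)
```

The loop runs once per new column (four times), not once per row, so the cost of the Python-level iteration does not grow with the number of rows.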

PS: if you provide details of what you are trying to calculate in the myFunc function, we could try to find a vectorized solution...
