Add Multiple Columns to Pandas Dataframe from Function


Question

I have a pandas data frame mydf with two columns, both of datetime dtype: mydate and mytime. I want to add three more columns: hour, weekday, and weeknum.

def getH(t):  # gives the hour
    return t.hour

def getW(d):  # gives the ISO week number
    return d.isocalendar()[1]

def getD(d):  # gives the weekday
    return d.weekday()  # 0 for Monday, 6 for Sunday

mydf["hour"] = mydf.apply(lambda row: getH(row["mytime"]), axis=1)
mydf["weekday"] = mydf.apply(lambda row: getD(row["mydate"]), axis=1)
mydf["weeknum"] = mydf.apply(lambda row: getW(row["mydate"]), axis=1)

The snippet works, but it is not computationally efficient because it loops through the data frame at least three times. Is there a faster and/or more idiomatic way to do this, for example using zip or merge? If I write a single function that returns three elements, how should I apply it? To illustrate, the function would be:

def getHWd(d,t):
    return t.hour, d.isocalendar()[1], d.weekday()

Recommended answer

Here is one approach that uses a single apply.

Say df looks like this:

In [64]: df
Out[64]:
       mydate     mytime
0  2011-01-01 2011-11-14
1  2011-01-02 2011-11-15
2  2011-01-03 2011-11-16
3  2011-01-04 2011-11-17
4  2011-01-05 2011-11-18
5  2011-01-06 2011-11-19
6  2011-01-07 2011-11-20
7  2011-01-08 2011-11-21
8  2011-01-09 2011-11-22
9  2011-01-10 2011-11-23
10 2011-01-11 2011-11-24
11 2011-01-12 2011-11-25
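
To experiment with the steps that follow, the sample frame can be rebuilt locally. This is a minimal sketch that assumes both columns are runs of consecutive daily dates; the original answer does not show how df was created, so the use of pd.date_range here is an assumption.

```python
import pandas as pd

# Assumption: the sample data is two runs of 12 consecutive days,
# starting on the first dates shown in the printed frame.
df = pd.DataFrame({
    "mydate": pd.date_range("2011-01-01", periods=12, freq="D"),
    "mytime": pd.date_range("2011-11-14", periods=12, freq="D"),
})

print(df)
```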

For readability, we'll define the lambda function on its own line:

In [65]: lambdafunc = lambda x: pd.Series([x['mytime'].hour,
                                           x['mydate'].isocalendar()[1],
                                           x['mydate'].weekday()])

Then apply it and store the result in df[['hour', 'weeknum', 'weekday']]. The lambda returns its values in the order hour, week number, weekday, so the target columns must be listed in that same order:

In [66]: df[['hour', 'weeknum', 'weekday']] = df.apply(lambdafunc, axis=1)

The output then looks like this (hour is 0 throughout because the sample mytime values carry no time of day):

In [67]: df
Out[67]:
       mydate     mytime  hour  weeknum  weekday
0  2011-01-01 2011-11-14     0       52        5
1  2011-01-02 2011-11-15     0       52        6
2  2011-01-03 2011-11-16     0        1        0
3  2011-01-04 2011-11-17     0        1        1
4  2011-01-05 2011-11-18     0        1        2
5  2011-01-06 2011-11-19     0        1        3
6  2011-01-07 2011-11-20     0        1        4
7  2011-01-08 2011-11-21     0        1        5
8  2011-01-09 2011-11-22     0        1        6
9  2011-01-10 2011-11-23     0        2        0
10 2011-01-11 2011-11-24     0        2        1
11 2011-01-12 2011-11-25     0        2        2
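
The question also asks about zip and about faster alternatives. The sketch below is not part of the original answer. It shows two options under stated assumptions: the sample frame is rebuilt with pd.date_range (an assumption), option 1 reuses the asker's getHWd with zip, and option 2 uses the .dt accessors, which avoid calling a Python function per row and so tend to scale better on large frames. Series.dt.isocalendar() requires pandas 1.1 or later.

```python
import pandas as pd

# Assumption: sample frame rebuilt from the dates shown above.
df = pd.DataFrame({
    "mydate": pd.date_range("2011-01-01", periods=12, freq="D"),
    "mytime": pd.date_range("2011-11-14", periods=12, freq="D"),
})

def getHWd(d, t):
    # Same function as in the question: (hour, ISO week number, weekday).
    return t.hour, d.isocalendar()[1], d.weekday()

# Option 1: a single Python-level pass over the rows; zip(*...) transposes
# the per-row tuples into three column-length tuples.
zipped = df.copy()
zipped["hour"], zipped["weeknum"], zipped["weekday"] = zip(
    *(getHWd(d, t) for d, t in zip(df["mydate"], df["mytime"]))
)

# Option 2: vectorized .dt accessors, no per-row function calls.
vectorized = df.copy()
vectorized["hour"] = df["mytime"].dt.hour
# isocalendar() returns a frame with year/week/day columns (UInt32 dtype).
vectorized["weeknum"] = df["mydate"].dt.isocalendar().week.astype("int64")
vectorized["weekday"] = df["mydate"].dt.weekday  # 0 = Monday, 6 = Sunday

print(vectorized)
```

Both options fill the columns by name, so the ordering mismatch that can occur when assigning a multi-column apply result does not arise in option 2, and in option 1 the left-hand side only has to follow the order of getHWd's return tuple.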
