How to use pandas to add a new column using an if statement?

Question

Could you help me express the following logic in Python pandas? I have the following data:

id=["Train A","Train A","Train A","Train B","Train B","Train B"]
start = ["A","B","C","D","E","F"]
end = ["G","H","I","J","K","L"]
arrival_time = ["0","2016-05-19 13:50:00","2016-05-19 21:25:00","0","2016-05-24 18:30:00","2016-05-26 12:15:00"]
departure_time = ["2016-05-19 08:25:00","2016-05-19 16:00:00","2016-05-20 07:25:00","2016-05-24 12:50:00","2016-05-25 20:00:00","2016-05-26 19:45:00"]
capacity = ["2","2","3","3","2","3"]

These lists give the following data:

id         arrival_time         departure_time         start  end  capacity

Train A          0                  2016-05-19 08:25:00   A     G    2
Train A   2016-05-19 13:50:00       2016-05-19 16:00:00   B     H    2
Train A   2016-05-19 21:25:00       2016-05-20 07:25:00   C     I    3
Train B          0                  2016-05-24 12:50:00   D     J    3
Train B   2016-05-24 18:30:00       2016-05-25 20:00:00   E     K    2
Train B   2016-05-26 12:15:00       2016-05-26 19:45:00   F     L    3
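For readers who want to reproduce the table, here is a minimal sketch of one way to build it as a DataFrame. It rests on a few assumptions that are not stated in the question: the string "0" is treated as "no arrival recorded" and turned into NaT, the departure time of row 4 is taken from the table (2016-05-25 20:00:00), and the variable names `train_id` and `fmt` are purely illustrative.

```python
import pandas as pd

# Lists from the question; "train_id" is a hypothetical name used to avoid
# shadowing the built-in id().
train_id = ["Train A", "Train A", "Train A", "Train B", "Train B", "Train B"]
start = ["A", "B", "C", "D", "E", "F"]
end = ["G", "H", "I", "J", "K", "L"]
arrival_time = ["0", "2016-05-19 13:50:00", "2016-05-19 21:25:00",
                "0", "2016-05-24 18:30:00", "2016-05-26 12:15:00"]
departure_time = ["2016-05-19 08:25:00", "2016-05-19 16:00:00",
                  "2016-05-20 07:25:00", "2016-05-24 12:50:00",
                  "2016-05-25 20:00:00", "2016-05-26 19:45:00"]
capacity = ["2", "2", "3", "3", "2", "3"]

df = pd.DataFrame({
    "id": train_id,
    "arrival_time": arrival_time,
    "departure_time": departure_time,
    "start": start,
    "end": end,
    "capacity": capacity,
})

# Assumption: "0" means a missing arrival, so blank it out before parsing.
# Missing values become NaT once the column is converted to datetime.
fmt = "%Y-%m-%d %H:%M:%S"
df["arrival_time"] = pd.to_datetime(
    df["arrival_time"].where(df["arrival_time"] != "0"), format=fmt
)
df["departure_time"] = pd.to_datetime(df["departure_time"], format=fmt)
df["capacity"] = df["capacity"].astype(int)

print(df)
```

Converting both time columns to datetime is what makes a subtraction such as `df.departure_time - df.arrival_time` yield timedeltas rather than raise on strings.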

I would like to add two columns, source and sink. If the time difference between arrival and departure is less than 3 hours, the source is the start of the trip, and the sink is set only where the trip breaks (i.e. when the time difference is more than 3 hours):

time difference   source     sink
     -              A         H
     02:10:00       A         H
     10:00:00       C         I
     -              D         K
     01:30:00       D         K
     19:30:00       F         L

Recommended answer

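import numpy as np

# Assumes df holds the data above, with arrival_time and departure_time as datetime columns ("0" parsed as NaT).
df = df.assign(timediff=df.departure_time - df.arrival_time)

# Source: where timediff is under 3 hours, take the previous row's start; otherwise keep this row's start.
# Note: .dt.seconds is the seconds component only; whole days are not included.
df = df.assign(source=np.where(df.timediff.dt.seconds / 3600 < 3, df.shift(1).start, df.start))

# Sink: where the previous row's timediff is over 3 hours, take the next row's end; otherwise keep this row's end.
df = df.assign(sink=np.where(df.timediff.dt.seconds.shift(1) / 3600 > 3, df.shift(-1).end, df.end))

print(df)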

Output:

        id        arrival_time      departure_time start end  capacity sink  \
0  Train A                 NaT 2016-05-19 08:25:00     A   G         2    G   
1  Train A 2016-05-19 13:50:00 2016-05-19 16:00:00     B   H         2    H   
2  Train A 2016-05-19 21:25:00 2016-05-20 07:25:00     C   I         3    I   
3  Train B                 NaT 2016-05-24 12:50:00     D   J         3    K   
4  Train B 2016-05-24 18:30:00 2016-05-25 20:00:00     E   K         2    K   
5  Train B 2016-05-26 12:15:00 2016-05-26 19:45:00     F   L         3    L   

         timediff source  
0             NaT      A  
1 0 days 02:10:00      A  
2 0 days 10:00:00      C  
3             NaT      D  
4 1 days 01:30:00      D  
5 0 days 07:30:00      F
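
The source and sink columns in the output above can be reproduced with a small self-contained sketch. It builds only the columns the logic needs and assumes missing arrivals are already NaT. The names `hours_component` and `hours_total` are illustrative and do not appear in the answer. The sketch also shows a detail worth knowing: `.dt.seconds` returns only the seconds part of each timedelta, so "1 days 01:30:00" counts as 1.5 hours, whereas `.dt.total_seconds()` would give 25.5 hours.

```python
import numpy as np
import pandas as pd

# Minimal frame with the columns used by the answer's logic.
df = pd.DataFrame({
    "start": list("ABCDEF"),
    "end": list("GHIJKL"),
    "arrival_time": pd.to_datetime([
        None, "2016-05-19 13:50:00", "2016-05-19 21:25:00",
        None, "2016-05-24 18:30:00", "2016-05-26 12:15:00",
    ]),
    "departure_time": pd.to_datetime([
        "2016-05-19 08:25:00", "2016-05-19 16:00:00", "2016-05-20 07:25:00",
        "2016-05-24 12:50:00", "2016-05-25 20:00:00", "2016-05-26 19:45:00",
    ]),
})

df["timediff"] = df["departure_time"] - df["arrival_time"]

# Seconds component only (days dropped) versus the full duration, in hours.
hours_component = df["timediff"].dt.seconds / 3600
hours_total = df["timediff"].dt.total_seconds() / 3600

# Same conditions as the answer; comparisons against NaN evaluate to False,
# so rows without an arrival time keep their own start and end.
df["source"] = np.where(hours_component < 3, df["start"].shift(1), df["start"])
df["sink"] = np.where(hours_component.shift(1) > 3, df["end"].shift(-1), df["end"])

print(df[["timediff", "source", "sink"]])
print(hours_component[4], hours_total[4])
```

With this data the seconds-only comparison is what keeps row 4 under the 3-hour threshold, which matches the 01:30:00 shown in the question's expected table. Note also that the sink for row 0 comes out as G, while the question's expected table lists H, so the condition may need adjusting if the expected table is the target.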
