Pandas find duration between dates where a condition is met?
Question
I have a pandas DataFrame that looks like this:
╔═══╦════════════╦═════════════╗
║ ║ VENDOR ID ║ DATE ║
╠═══╬════════════╬═════════════╣
║ 1 ║ 33 ║ 01/12/2018 ║
║ 2 ║ 33 ║ 03/12/2018 ║
║ 3 ║ 12 ║ 01/08/2018 ║
║ 4 ║ 12 ║ 01/15/2018 ║
║ 5 ║ 12 ║ 01/23/2018 ║
║ 6 ║ 33 ║ 05/12/2018 ║
║ 7 ║ 89 ║ 01/12/2018 ║
╚═══╩════════════╩═════════════╝
And I'm hoping to get a table that gives me the number of days since the same VENDOR ID last occurred, like so:
╔═══╦════════════╦═════════════╗
║ ║ VENDOR ID ║ GAP ║
╠═══╬════════════╬═════════════╣
║ 1 ║ 33 ║ ---------- ║
║ 2 ║ 33 ║ 60 ║
║ 3 ║ 12 ║ ---------- ║
║ 4 ║ 12 ║ 7 ║
║ 5 ║ 12 ║ 8 ║
║ 6 ║ 33 ║ 60 ║
║ 7 ║ 89 ║ ---------- ║
╚═══╩════════════╩═════════════╝
I've been trying to find a way to achieve this using groupbys and other tricks but can't seem to get anything to work.
I did come up with something I think might work using two nested for loops or iterrows in pandas, but because of the size of my dataset, nested loops won't really work.
Does anyone have any ideas?
Recommended answer
I get slightly different output, because the gaps are counted in exact calendar days rather than as 60-day "two months":
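import pandas as pd
# Parse the month/day/year strings into datetimes
df['DATE'] = pd.to_datetime(df['DATE'], format='%m/%d/%Y')
# Difference from the previous row of the same VENDOR ID, as whole days
df['GAP'] = df.groupby('VENDOR ID')['DATE'].diff().dt.days
print(df)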
VENDOR ID DATE GAP
1 33 2018-01-12 NaN
2 33 2018-03-12 59.0
3 12 2018-01-08 NaN
4 12 2018-01-15 7.0
5 12 2018-01-23 8.0
6 33 2018-05-12 61.0
7 89 2018-01-12 NaN
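The 59 and 61 for VENDOR ID 33 (instead of the 60 in the expected table) come from the different lengths of January/February and March/April. A quick check with the standard library datetime module, using the question's dates read as month/day/year:

```python
from datetime import date

# 12 Jan -> 12 Mar 2018: 31 days (January) + 28 days (February, 2018 is not a leap year)
jan_to_mar = (date(2018, 3, 12) - date(2018, 1, 12)).days

# 12 Mar -> 12 May 2018: 31 days (March) + 30 days (April)
mar_to_may = (date(2018, 5, 12) - date(2018, 3, 12)).days

print(jan_to_mar, mar_to_may)  # 59 61
```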
Explanation:
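- Convert the column with to_datetime
- Then use groupby with diff to get the difference from the previous date of the same VENDOR ID
- Finally, convert the resulting timedeltas to days with .dt.days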
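Below is a self-contained sketch of the same approach, rebuilt from the question's sample data. Two additions are assumptions, not part of the original answer. First, because diff compares each row with the previous row of its group in the current row order, the frame is sorted by VENDOR ID and DATE beforehand, in case the real data is not chronological. Second, the gaps are cast to pandas' nullable Int64 dtype so they stay whole numbers instead of floats with NaN.

```python
import pandas as pd

# Sample data copied from the question (DATE strings are month/day/year)
df = pd.DataFrame(
    {
        "VENDOR ID": [33, 33, 12, 12, 12, 33, 89],
        "DATE": ["01/12/2018", "03/12/2018", "01/08/2018", "01/15/2018",
                 "01/23/2018", "05/12/2018", "01/12/2018"],
    },
    index=range(1, 8),
)
df["DATE"] = pd.to_datetime(df["DATE"], format="%m/%d/%Y")

# Assumption: the real data may not be in date order, so sort within each vendor first
df = df.sort_values(["VENDOR ID", "DATE"])

# Days since the previous occurrence of the same vendor; the first occurrence has no
# previous date, so it becomes <NA> in the nullable Int64 column
df["GAP"] = df.groupby("VENDOR ID")["DATE"].diff().dt.days.astype("Int64")

# Restore the original row order for display
df = df.sort_index()
print(df)
```

Sorting costs O(n log n), and the groupby/diff pass is vectorised, so this scales far better than nested loops over the rows.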