Count different actions within one hour in Python
Problem description
I am starting to work with time series. I have data for one user making bank transfers to different countries. The most frequent destination is country X, but there are also transfers to countries Y and Z. For example:
date id country
2020-01-01T00:00:00.000Z id_01 X
2020-01-01T00:20:00.000Z id_02 X
2020-01-01T00:25:00.000Z id_03 Y
2020-01-01T00:35:00.000Z id_04 X
2020-01-01T00:45:00.000Z id_05 Z
2020-01-01T01:00:00.000Z id_06 X
2020-01-01T10:20:00.000Z id_07 X
2020-01-01T10:25:00.000Z id_08 X
2020-01-01T13:00:00.000Z id_09 X
2020-01-01T18:45:00.000Z id_10 Z
2020-01-01T18:55:00.000Z id_11 X
Since the most frequent country is X, I would like to count iteratively, over the whole list of events, how many transactions were made within one hour to countries other than X.
The expected output for this particular case would be:
date id country
2020-01-01T00:25:00.000Z id_03 Y
2020-01-01T00:45:00.000Z id_05 Z
Starting from 2020-01-01T00:00:00.000Z, within one hour there are two transactions to countries other than X (Y and Z). Starting from 2020-01-01T00:20:00.000Z, within one hour, there are the same transactions, and so on. Starting from 2020-01-01T10:20:00.000Z, within one hour, all transactions are to X. Starting from 2020-01-01T18:45:00.000Z, within one hour, there is only one transaction to Z.
I am trying a double for loop with .value_counts(), but I'm not sure what I am doing.
Recommended answer
IIUC, you can select only the rows that are not X, then use diff once forward and once backward (to cover 1 hour before and after), and keep the rows where either of the two gaps is at most a Timedelta of 1h.
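import pandas as pd

# convert the date column to timezone-aware datetimes
df['date'] = pd.to_datetime(df['date'])
# mask the rows whose country is not X and keep only those rows
mX = df['country'].ne('X')
df_ = df[mX].copy()
# mask rows with another non-X row within one hour before or after;
# diff(-1) is negative for ascending dates, so compare its absolute value
m1H = (df_['date'].diff().le(pd.Timedelta(hours=1)) |
       df_['date'].diff(-1).abs().le(pd.Timedelta(hours=1)))
# select only the rows meeting both criteria (not X and within 1 hour)
df_ = df_[m1H]
print(df_)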
                       date     id country
2 2020-01-01 00:25:00+00:00  id_03       Y
4 2020-01-01 00:45:00+00:00  id_05       Z
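For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch built on the sample data from the question. The helper names `non_x_within_one_hour` and `count_non_x_ahead` are hypothetical, introduced only for illustration. The second helper is one possible reading of "count iteratively": for each event it counts the non-X transfers in the hour starting at that event, with both window ends inclusive, which is an assumption rather than something the question states.

```python
import pandas as pd

# Sample data copied from the question.
rows = [
    ("2020-01-01T00:00:00.000Z", "id_01", "X"),
    ("2020-01-01T00:20:00.000Z", "id_02", "X"),
    ("2020-01-01T00:25:00.000Z", "id_03", "Y"),
    ("2020-01-01T00:35:00.000Z", "id_04", "X"),
    ("2020-01-01T00:45:00.000Z", "id_05", "Z"),
    ("2020-01-01T01:00:00.000Z", "id_06", "X"),
    ("2020-01-01T10:20:00.000Z", "id_07", "X"),
    ("2020-01-01T10:25:00.000Z", "id_08", "X"),
    ("2020-01-01T13:00:00.000Z", "id_09", "X"),
    ("2020-01-01T18:45:00.000Z", "id_10", "Z"),
    ("2020-01-01T18:55:00.000Z", "id_11", "X"),
]
df = pd.DataFrame(rows, columns=["date", "id", "country"])
df["date"] = pd.to_datetime(df["date"])

ONE_HOUR = pd.Timedelta(hours=1)


def non_x_within_one_hour(df, frequent="X", window=ONE_HOUR):
    """Rows not going to `frequent` that have another such row within `window`."""
    other = df[df["country"].ne(frequent)].sort_values("date")
    gap_prev = other["date"].diff()          # time since the previous non-X row
    gap_next = other["date"].diff(-1).abs()  # time until the next non-X row
    # Comparisons against NaT (first/last row) evaluate to False.
    near = gap_prev.le(window) | gap_next.le(window)
    return other[near]


def count_non_x_ahead(df, frequent="X", window=ONE_HOUR):
    """For each event, count non-`frequent` rows in [event, event + window]."""
    is_other = df["country"].ne(frequent)
    counts = [
        int((is_other & df["date"].between(start, start + window)).sum())
        for start in df["date"]
    ]
    return pd.Series(counts, index=df.index, name="non_x_next_hour")


result = non_x_within_one_hour(df)
counts = count_non_x_ahead(df)
print(result)
print(df.assign(non_x_next_hour=counts))
```

The per-event loop is quadratic in the number of rows, which is fine for a short list like this one; for large data a sorted-search or rolling approach would be worth considering instead.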