Count different actions within one hour in Python


Problem description

I am starting to work with time series. I have a time series of one user's bank transfers to different countries. The country he/she transfers to most often is X, but there are also transfers to countries Y and Z. Let's say:

date                           id       country
2020-01-01T00:00:00.000Z       id_01     X
2020-01-01T00:20:00.000Z       id_02     X
2020-01-01T00:25:00.000Z       id_03     Y
2020-01-01T00:35:00.000Z       id_04     X
2020-01-01T00:45:00.000Z       id_05     Z
2020-01-01T01:00:00.000Z       id_06     X
2020-01-01T10:20:00.000Z       id_07     X
2020-01-01T10:25:00.000Z       id_08     X
2020-01-01T13:00:00.000Z       id_09     X
2020-01-01T18:45:00.000Z       id_10     Z
2020-01-01T18:55:00.000Z       id_11     X
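To experiment with this sample, the table can be loaded into a pandas DataFrame as below. This is a sketch that assumes the data has exactly the three columns shown; the variable name `rows` is only illustrative.

```python
import pandas as pd

# Sample transactions from the table above (ISO 8601 timestamps in UTC)
rows = [
    ("2020-01-01T00:00:00.000Z", "id_01", "X"),
    ("2020-01-01T00:20:00.000Z", "id_02", "X"),
    ("2020-01-01T00:25:00.000Z", "id_03", "Y"),
    ("2020-01-01T00:35:00.000Z", "id_04", "X"),
    ("2020-01-01T00:45:00.000Z", "id_05", "Z"),
    ("2020-01-01T01:00:00.000Z", "id_06", "X"),
    ("2020-01-01T10:20:00.000Z", "id_07", "X"),
    ("2020-01-01T10:25:00.000Z", "id_08", "X"),
    ("2020-01-01T13:00:00.000Z", "id_09", "X"),
    ("2020-01-01T18:45:00.000Z", "id_10", "Z"),
    ("2020-01-01T18:55:00.000Z", "id_11", "X"),
]
df = pd.DataFrame(rows, columns=["date", "id", "country"])

# Parse the strings; the trailing "Z" yields timezone-aware UTC timestamps
df["date"] = pd.to_datetime(df["date"])

print(df.dtypes)
```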

Since the most frequent country is X, I would like to go iteratively through the whole list of events and count how many transactions to countries other than X were made within one hour.
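One possible reading of "within one hour" is a trailing window: for every transaction, count the non-X transfers made in the hour ending at that transaction. The sketch below does this with a time-based rolling sum. It assumes the rows can be sorted by date and that the "usual" country is simply the most frequent one (found with `value_counts`); the column name `non_x_last_hour` is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "date": pd.to_datetime([
        "2020-01-01T00:00:00Z", "2020-01-01T00:20:00Z", "2020-01-01T00:25:00Z",
        "2020-01-01T00:35:00Z", "2020-01-01T00:45:00Z", "2020-01-01T01:00:00Z",
        "2020-01-01T10:20:00Z", "2020-01-01T10:25:00Z", "2020-01-01T13:00:00Z",
        "2020-01-01T18:45:00Z", "2020-01-01T18:55:00Z",
    ]),
    "id": [f"id_{i:02d}" for i in range(1, 12)],
    "country": list("XXYXZXXXXZX"),
})

# The "usual" country is taken to be the most frequent one (X here)
usual = df["country"].value_counts().idxmax()

# Time-based rolling windows need a sorted DatetimeIndex
df = df.sort_values("date")
is_other = df.set_index("date")["country"].ne(usual).astype(int)  # 1 = not X

# Sum of non-X flags over the window (t - 1h, t] for each row
rolled = is_other.rolling(pd.Timedelta(hours=1)).sum()
df["non_x_last_hour"] = rolled.to_numpy().astype(int)
print(df)
```

Time-based windows in pandas are right-closed by default, so each one covers (t - 1h, t]; passing `closed="both"` to `rolling` would also count a transfer made exactly one hour earlier.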

The format of the expected output for this particular case would be:

date                           id        country
2020-01-01T00:25:00.000Z       id_03     Y
2020-01-01T00:45:00.000Z       id_05     Z

2020-01-01T00:00:00.000Z 开始,在一小时内有两次Y,Z交易.然后从 2020-01-01T00:20:00.000Z 开始,在一小时内,有相同的交易,依此类推.然后,从 2020-01-01T10:20:00.000Z 开始,在一小时内,所有内容都是X.从 2020-01-01T18:45:00.000Z 开始,一小时之内只有一个Z.

Starting from 2020-01-01T00:00:00.000Z, there are two non-X transactions (Y and Z) within one hour. Starting from 2020-01-01T00:20:00.000Z, the same two transactions fall within the hour, and so on. Starting from 2020-01-01T10:20:00.000Z, every transaction within the hour is to X. Starting from 2020-01-01T18:45:00.000Z, there is only one non-X transaction (Z) within the hour.
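The walk-through above looks forward from each event. A minimal sketch of that reading, assuming the window is [t, t + 1 hour] inclusive, uses `DatetimeIndex.searchsorted` on the sorted non-X timestamps to count them without an explicit loop; the name `forward_counts` is illustrative.

```python
import pandas as pd

# Transfer times (already sorted) and destination countries from the sample
dates = pd.to_datetime([
    "2020-01-01T00:00:00Z", "2020-01-01T00:20:00Z", "2020-01-01T00:25:00Z",
    "2020-01-01T00:35:00Z", "2020-01-01T00:45:00Z", "2020-01-01T01:00:00Z",
    "2020-01-01T10:20:00Z", "2020-01-01T10:25:00Z", "2020-01-01T13:00:00Z",
    "2020-01-01T18:45:00Z", "2020-01-01T18:55:00Z",
])
countries = pd.Index(list("XXYXZXXXXZX"))

# Sorted timestamps of the transfers to countries other than X
other = dates[countries != "X"]

one_hour = pd.Timedelta(hours=1)
# For each start time t, count non-X transfers with t <= time <= t + 1h
start = other.searchsorted(dates, side="left")
end = other.searchsorted(dates + one_hour, side="right")
forward_counts = pd.Series(end - start, index=dates, name="non_x_next_hour")
print(forward_counts)
```

For the sample this gives 2 for the windows starting at 00:00 and 00:20, 0 for the one starting at 10:20 and 1 for the one starting at 18:45, in line with the description.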

I tried a double for loop with .value_counts(), but I'm not sure what I am doing.
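For comparison, a double loop that reproduces the expected output could look like the following. It is a hedged sketch: it assumes "within one hour" means two non-X transfers at most one hour apart, and because it is O(n²) it is only practical for small frames.

```python
import pandas as pd

df = pd.DataFrame({
    "date": pd.to_datetime([
        "2020-01-01T00:00:00Z", "2020-01-01T00:20:00Z", "2020-01-01T00:25:00Z",
        "2020-01-01T00:35:00Z", "2020-01-01T00:45:00Z", "2020-01-01T01:00:00Z",
        "2020-01-01T10:20:00Z", "2020-01-01T10:25:00Z", "2020-01-01T13:00:00Z",
        "2020-01-01T18:45:00Z", "2020-01-01T18:55:00Z",
    ]),
    "id": [f"id_{i:02d}" for i in range(1, 12)],
    "country": list("XXYXZXXXXZX"),
})

one_hour = pd.Timedelta(hours=1)
others = df[df["country"] != "X"]

keep = []
# Compare every non-X transfer with every other non-X transfer
for i, t_i in others["date"].items():
    for j, t_j in others["date"].items():
        if i != j and abs(t_i - t_j) <= one_hour:
            keep.append(i)
            break  # one close non-X neighbour is enough

result = others.loc[keep]
print(result)
```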

Recommended answer

IIUC (if I understand correctly), you can select only the rows whose country is not X, then use diff once forward and once backward to measure the gap to the previous and to the next non-X transaction, and keep the rows where either gap is at most a Timedelta of 1 hour.
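The two `diff` calls are easy to mix up: `diff()` gives the current timestamp minus the previous one, while `diff(-1)` gives the current timestamp minus the next one, which is negative for ascending dates. The sketch below (the helper `near_neighbour` is hypothetical) shows that comparing the raw `diff(-1)` against 1 hour accepts any negative gap, however large, so its absolute value is what should be compared.

```python
import pandas as pd

ONE_HOUR = pd.Timedelta(hours=1)


def near_neighbour(dates: pd.Series, use_abs: bool = True) -> pd.Series:
    """True where the previous or next timestamp is at most one hour away."""
    gap_prev = dates.diff()    # current - previous (>= 0 for sorted dates)
    gap_next = dates.diff(-1)  # current - next (<= 0 for sorted dates)
    if use_abs:
        gap_next = gap_next.abs()
    # NaT gaps (first/last row) compare as False
    return gap_prev.le(ONE_HOUR) | gap_next.le(ONE_HOUR)


# Non-X timestamps from the sample data (Y, Z, Z)
sample = pd.Series(pd.to_datetime([
    "2020-01-01T00:25:00Z", "2020-01-01T00:45:00Z", "2020-01-01T18:45:00Z",
]))
# Hypothetical case: two non-X transfers more than 18 hours apart
far_apart = pd.Series(pd.to_datetime([
    "2020-01-01T00:25:00Z", "2020-01-01T18:45:00Z",
]))

print(near_neighbour(sample).tolist())                    # [True, True, False]
print(near_neighbour(far_apart).tolist())                 # [False, False]
print(near_neighbour(far_apart, use_abs=False).tolist())  # [True, False]
```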

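import pandas as pd

# convert to datetime and make sure the rows are in chronological order
df['date'] = pd.to_datetime(df['date'])
df = df.sort_values('date')

# mask not X and select only these rows
mX = df['country'].ne('X')
df_ = df[mX].copy()

# mask within an hour before and after; diff(-1) is current minus next,
# which is negative for sorted dates, so compare its absolute value
m1H = (df_['date'].diff().le(pd.Timedelta(hours=1)) |
       df_['date'].diff(-1).abs().le(pd.Timedelta(hours=1)))

# select only the rows meeting both criteria (not X and within 1H)
df_ = df_[m1H]
print(df_)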
                       date     id country
2 2020-01-01 00:25:00+00:00  id_03       Y
4 2020-01-01 00:45:00+00:00  id_05       Z
