pandas - how to filter "most frequent" Datetime objects


Problem description

I'm working with a DataFrame like the following:

User_ID    Datetime
01    2014-01-01 08:00:00
01    2014-01-02 09:00:00
02    2014-01-02 10:00:00
02    2014-01-03 11:00:00
03    2014-01-04 12:00:00
04    2014-01-04 13:00:00
05    2014-01-02 14:00:00

I would like to filter Users under certain conditions based on the Datetime column, e.g. keep only Users with one occurrence per month, or only Users whose occurrences all fall in summer, etc.

So far I've grouped the df with:

g = df.groupby(['User_ID','Datetime']).size()

obtaining the "traces" in time of each User:

User_ID    Datetime
01    2014-01-01 08:00:00
      2014-01-02 09:00:00
02    2014-01-02 10:00:00
      2014-01-03 11:00:00
03    2014-01-04 12:00:00
04    2014-01-04 13:00:00
05    2014-01-02 14:00:00

Then I applied a mask to filter, for instance, the Users with more than one trace:

mask = df.groupby('User_ID')['Datetime'].apply(lambda g: len(g)>1)
df = df[df['User_ID'].isin(mask[mask].index)]

So this is fine. Instead of the lambda g: len(g)>1, I'm looking for a function that can filter Users under different conditions, as described above, in particular Users with one occurrence per month.
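To make the request concrete, a helper of the kind described might look like the sketch below. The name `filter_users` and the three predicate functions are hypothetical, "summer" is assumed to mean June through August, and the sample frame is rebuilt by hand from the table above with integer User_IDs.

```python
import pandas as pd

# Sample data rebuilt from the table in the question.
df = pd.DataFrame({
    "User_ID": [1, 1, 2, 2, 3, 4, 5],
    "Datetime": pd.to_datetime([
        "2014-01-01 08:00:00", "2014-01-02 09:00:00",
        "2014-01-02 10:00:00", "2014-01-03 11:00:00",
        "2014-01-04 12:00:00", "2014-01-04 13:00:00",
        "2014-01-02 14:00:00",
    ]),
})


def filter_users(frame, predicate):
    """Keep all rows of the users whose Datetime values satisfy predicate.

    Hypothetical helper: predicate receives one user's Datetime Series
    and must return a single bool.
    """
    keep = frame.groupby("User_ID")["Datetime"].apply(predicate)
    return frame[frame["User_ID"].isin(keep[keep].index)]


def more_than_one_trace(times):
    # Same condition as the lambda in the question.
    return len(times) > 1


def at_most_once_per_month(times):
    # Count rows per calendar month (year and month together).
    return bool((times.dt.to_period("M").value_counts() <= 1).all())


def only_in_summer(times):
    # Assumption: "summer" means June, July and August.
    return bool(times.dt.month.isin([6, 7, 8]).all())


multi = filter_users(df, more_than_one_trace)
single = filter_users(df, at_most_once_per_month)
summer = filter_users(df, only_in_summer)
```

With the sample data, `multi` keeps Users 1 and 2, `single` keeps Users 3, 4 and 5, and `summer` is empty because every row is in January.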

Solution

As long as the 'Datetime' column already has a datetime dtype and you are running pandas 0.15.0 or later, you can group by the month in addition to the user ID and then filter the result by checking the length of each group:

In [29]: df.groupby(['User_ID',df['Datetime'].dt.month]).filter(lambda x: len(x) > 1)
Out[29]:
   User_ID            Datetime
0        1 2014-01-01 08:00:00
1        1 2014-01-02 09:00:00
2        2 2014-01-02 10:00:00
3        2 2014-01-03 11:00:00
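For anyone who wants to run this end to end, here is a minimal sketch under a few assumptions: the sample frame is rebuilt by hand with integer User_IDs, and the strings are converted with `pd.to_datetime` first. Note that `dt.month` alone puts January 2014 and January 2015 in the same group, so the sketch also shows a year-aware key built with `dt.to_period("M")`. That variant is an addition, not part of the original answer. Changing the condition to `len(x) == 1` keeps the rows where a user appears exactly once in a month.

```python
import pandas as pd

# Sample data rebuilt from the question; Datetime starts out as strings.
df = pd.DataFrame({
    "User_ID": [1, 1, 2, 2, 3, 4, 5],
    "Datetime": [
        "2014-01-01 08:00:00", "2014-01-02 09:00:00",
        "2014-01-02 10:00:00", "2014-01-03 11:00:00",
        "2014-01-04 12:00:00", "2014-01-04 13:00:00",
        "2014-01-02 14:00:00",
    ],
})
# The .dt accessor needs a datetime dtype, so convert first.
df["Datetime"] = pd.to_datetime(df["Datetime"])

# As in the answer: group by user and month number, keep groups with > 1 row.
frequent = df.groupby(["User_ID", df["Datetime"].dt.month]).filter(
    lambda x: len(x) > 1
)

# Year-aware variant (an assumption beyond the answer): a monthly Period
# keeps the same month of different years in separate groups.
month_key = df["Datetime"].dt.to_period("M").rename("Month")
by_year_month = df.groupby(["User_ID", month_key])
frequent_ym = by_year_month.filter(lambda x: len(x) > 1)

# Rows of users that appear exactly once in a given month.
once = by_year_month.filter(lambda x: len(x) == 1)
```

Because all sample rows fall in January 2014, `frequent` and `frequent_ym` contain the same four rows here; they only differ once the data spans more than one year.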
