Counting records per hour and per day, and creating a MultiIndex DataFrame as output


Problem description

Sample DataFrame:

process_id | app_path | start_time

The desired output DataFrame should have a MultiIndex built from the date and time values in the start_time column: unique dates form the first index level, one-hour ranges form the second, and each time slot holds the count of records that fall in it.

def activity(self):
    # find unique dates from db file
    columns = self.df['start_time'].map(lambda x: x.date()).unique()

    result = pandas.DataFrame(np.zeros((1,len(columns))), columns = columns)
    for i in range(len(self.df)):
        col = self.df.iloc[i]['start_time'].date()
        # .loc replaces get_value (removed in pandas 1.0) and avoids chained assignment
        result.loc[0, col] += 1

    return result

I tried the code above, which gives the following output:

15-07-2014 | 16-7-2014 | 17-07-2014 | 18-07-2014
3217 | 2114 | 1027 | 3016

I want to count records on a per-hour basis as well.

Recommended answer

It would help to start the question with some sample data. Since none was given, I assumed the following is representative of your data (app_path does not appear to be used):

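import pandas as pd
from numpy.random import randint

# 10,000 one-minute timestamps, each with a random process ID
rng = pd.date_range('1/1/2011', periods=10000, freq='1min')
df = pd.DataFrame({'process_id': randint(low=100, high=500, size=len(rng))}, index=rng)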

It looks like you could benefit from exploring the groupby method of Pandas data frames. Using groupby, your example above becomes a simple one-liner:

df.groupby( [df.index.year, df.index.month, df.index.day] ).count()
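The three keys above produce a three-level index (year, month, day). If a single date level is preferred, as the question describes, one option is to group by `df.index.date` instead. This is only a minimal sketch, assuming a DatetimeIndex as in the sample data; the tiny frame and the name `daily` are made up for illustration:

```python
import pandas as pd

# Hypothetical data: 3 records on 2011-01-01, 2 on 2011-01-02
idx = pd.to_datetime([
    "2011-01-01 00:05", "2011-01-01 00:40", "2011-01-01 13:10",
    "2011-01-02 09:00", "2011-01-02 09:30",
])
df = pd.DataFrame({"process_id": [101, 102, 103, 104, 105]}, index=idx)

# One key per calendar date instead of separate year/month/day keys
daily = df.groupby(df.index.date)["process_id"].count()
print(daily)
```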

Grouping by hour simply means adding the hour to the group keys:

df.groupby( [df.index.year, df.index.month, df.index.day, df.index.hour] ).count()
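For data shaped like the question's, where the timestamp lives in a `start_time` column rather than in the index, the same idea might be sketched as follows. The sample rows and the names `date_key`, `hour_key`, and `counts` are assumptions for illustration, not part of the original question or answer. `size()` counts the rows in each group, and the result carries a two-level MultiIndex of date and hour:

```python
import pandas as pd

# Hypothetical rows mimicking process_id | app_path | start_time
df = pd.DataFrame({
    "process_id": [1, 2, 3, 4, 5],
    "app_path": ["a.exe", "b.exe", "a.exe", "c.exe", "b.exe"],
    "start_time": pd.to_datetime([
        "2014-07-15 09:05", "2014-07-15 09:40", "2014-07-15 13:10",
        "2014-07-16 09:00", "2014-07-16 09:30",
    ]),
})

# Derive the two grouping keys from the timestamp column
date_key = df["start_time"].dt.date.rename("date")
hour_key = df["start_time"].dt.hour.rename("hour")

# size() counts rows per (date, hour) group; to_frame() turns it into a DataFrame
counts = df.groupby([date_key, hour_key]).size().to_frame("count")
print(counts)
```

Hours with no records do not appear in the result; showing them as zero would require reindexing against the full set of date and hour combinations.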

Don't reinvent the wheel in Pandas; use the provided methods for code that is both more readable and faster.
