Summing the number of occurrences per day in pandas

Problem description

I have a data set like this in a pandas DataFrame:

                                  score
timestamp                                 
2013-06-29 00:52:28+00:00        -0.420070
2013-06-29 00:51:53+00:00        -0.445720
2013-06-28 16:40:43+00:00         0.508161
2013-06-28 15:10:30+00:00         0.921474
2013-06-28 15:10:17+00:00         0.876710

I need the number of measurements that occur on each day, so I am looking for something like this:

                                    count
   timestamp
   2013-06-29                       2
   2013-06-28                       3

I do not care about the sentiment column; I just want the count of occurrences per day.

Solution

If your timestamp index is a DatetimeIndex:

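import io
import pandas as pd

# Sample data: the two columns are separated by two or more spaces
content = '''\
timestamp  score
2013-06-29 00:52:28+00:00        -0.420070
2013-06-29 00:51:53+00:00        -0.445720
2013-06-28 16:40:43+00:00         0.508161
2013-06-28 15:10:30+00:00         0.921474
2013-06-28 15:10:17+00:00         0.876710
'''

# StringIO wraps the str (BytesIO would need bytes on Python 3);
# a regex separator requires the python parsing engine
df = pd.read_table(io.StringIO(content), sep=r'\s{2,}', engine='python',
                   parse_dates=[0], index_col=0)

print(df)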

so df looks like this (recent pandas versions may also display the +00:00 offset in the index):

                        score
timestamp                    
2013-06-29 00:52:28 -0.420070
2013-06-29 00:51:53 -0.445720
2013-06-28 16:40:43  0.508161
2013-06-28 15:10:30  0.921474
2013-06-28 15:10:17  0.876710

print(type(df.index))
# <class 'pandas.core.indexes.datetimes.DatetimeIndex'>
# (older pandas versions report pandas.tseries.index.DatetimeIndex)

You can use:

print(df.groupby(df.index.date).count())

which yields

            score
2013-06-28      3
2013-06-29      2
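
The result above has a column named score and is sorted oldest first, while the question asked for a count column with the newest day on top. A minimal sketch of one way to get that shape, assuming the index is already a DatetimeIndex; the variable names and the direct DataFrame construction are illustrative and not part of the original answer:

```python
import pandas as pd

# Rebuild the sample frame directly (assumption: timestamps are UTC).
index = pd.to_datetime(
    [
        "2013-06-29 00:52:28+00:00",
        "2013-06-29 00:51:53+00:00",
        "2013-06-28 16:40:43+00:00",
        "2013-06-28 15:10:30+00:00",
        "2013-06-28 15:10:17+00:00",
    ],
    utc=True,
)
df = pd.DataFrame(
    {"score": [-0.420070, -0.445720, 0.508161, 0.921474, 0.876710]},
    index=index,
)
df.index.name = "timestamp"

# size() counts rows per group regardless of NaN values in any column,
# whereas count() counts non-null values column by column.
counts = (
    df.groupby(df.index.date)
    .size()
    .rename("count")
    .rename_axis("timestamp")
    .sort_index(ascending=False)  # newest day first, as in the question
    .to_frame()
)

print(counts)
```

Using size() instead of count() is a design choice here: it ignores the score values entirely, which matches the stated goal of counting occurrences rather than non-missing scores.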


Note the importance of the parse_dates parameter. Without it, the index would be a plain Index of strings (pandas.core.index.Index in older pandas versions), in which case you could not use df.index.date.
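
If the frame has already been loaded and its index holds plain strings, the index can be converted after the fact. The following is a small sketch under that assumption, using pd.to_datetime; the sample frame is constructed by hand for illustration:

```python
import pandas as pd

# A frame whose index holds timestamp strings, as it would
# if the dates had not been parsed while reading the data.
df = pd.DataFrame(
    {"score": [-0.420070, -0.445720, 0.508161, 0.921474, 0.876710]},
    index=pd.Index(
        [
            "2013-06-29 00:52:28+00:00",
            "2013-06-29 00:51:53+00:00",
            "2013-06-28 16:40:43+00:00",
            "2013-06-28 15:10:30+00:00",
            "2013-06-28 15:10:17+00:00",
        ],
        name="timestamp",
    ),
)
print(type(df.index))  # a generic Index, so .date is not available

# Convert the strings to a timezone-aware DatetimeIndex.
df.index = pd.to_datetime(df.index, utc=True)
print(type(df.index))

# The per-day grouping from the answer now works.
daily = df.groupby(df.index.date).size()
print(daily)
```

Passing utc=True keeps the days aligned to UTC, which matches the +00:00 offsets in the sample data; a different time zone could shift some rows to another calendar day.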

So the answer depends on the type(df.index), which you have not shown...
