Calculate daily sums using python pandas


Question

I'm trying to calculate daily sums of values using pandas. Here's the test file - http://pastebin.com/uSDfVkTS

This is the code I have come up with so far:

import numpy as np
import datetime as dt
import pandas as pd

f = np.genfromtxt('test', dtype=[('datetime', '|S16'), ('data', '<i4')], delimiter=',')
dates = [dt.datetime.strptime(i, '%Y-%m-%d %H:%M') for i in f['datetime']]
s = pd.Series(f['data'], index = dates)
d = s.resample('D', how='sum')

Using the given test file this produces:

2012-01-02    1128
Freq: D

The first problem is that the calculated sum is assigned to the next day. I was able to solve that by using the parameter loffset='-1d'.

The actual problem is that the data may not start at 00:30 but at any time of day. The data also has gaps filled with 'nan' values.

That said, is it possible to set a minimum number of values required to calculate a daily sum? (For example, if there are fewer than 40 values in a single day, put NaN instead of a sum.)

I believe it is possible to define a custom function to do that and pass it to the 'how' parameter, but I have no clue how to code the function itself.

Recommended answer

You can do it directly in Pandas:

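# Read the file, parsing the first column as a datetime index
s = pd.read_csv('test', header=None, index_col=0, parse_dates=True)
# Group by calendar date; a day with fewer than 40 rows gets NaN instead of its sum
d = s.groupby(lambda x: x.date()).aggregate(lambda x: sum(x) if len(x) >= 40 else np.nan)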

             X.2
2012-01-01  1128
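
In more recent pandas versions, where the 'how' argument of resample is no longer available, the same threshold can probably be expressed with the min_count argument of the resampler's sum. The following is a minimal sketch under assumptions: the helper name daily_sums and the synthetic half-hourly series are made up for illustration and do not come from the original test file. Note that min_count counts only non-NaN values, whereas len(x) in the answer above counts every row of the day.

```python
import numpy as np
import pandas as pd


def daily_sums(series, min_values=40):
    """Sum per calendar day; days with fewer than `min_values`
    non-NaN values get NaN instead of a sum (hypothetical helper)."""
    return series.resample("D").sum(min_count=min_values)


# Synthetic data (an assumption, not the pastebin file):
# 48 half-hourly values on 2012-01-01 and only 10 on 2012-01-02.
index = pd.date_range("2012-01-01 00:00", periods=58, freq="30min")
values = pd.Series(np.ones(58), index=index)
values.iloc[5] = np.nan  # simulate a gap filled with NaN

result = daily_sums(values)
print(result)
# 2012-01-01 has 47 valid values, so its sum is 47.0;
# 2012-01-02 has only 10 valid values, so it becomes NaN.
```

If the threshold should count rows rather than valid values, a custom function passed to the resampler's agg, along the lines of the lambda in the answer, would be closer to the original behaviour.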
