How to resample until a specific date criteria is met

Problem description

With some help from the community (see my previous question on building the function) I have managed to get to the function below. I am now trying to work out how to get the resampled dates to run to the latest date that appears anywhere in either of the input data sets, for any code. Below I have included the current output I am getting and my desired output.
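
In other words, I want the output to span every calendar day from the earliest to the latest date found in either input. A minimal sketch of that target range (assuming both date columns are already datetime):

import pandas as pd

# Overall daily range covering both inputs (here 2019-01-10 .. 2019-01-16)
start = min(df1['date'].min(), df2['date'].min())
end = max(df1['date'].max(), df2['date'].max())
full_range = pd.date_range(start, end, freq='D')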

Input data:

Input 1 df1 - In

    date       code qty
0   2019-01-10  A   20
1   2019-01-10  B   12
2   2019-01-10  C   10
3   2019-01-11  A   2
4   2019-01-11  B   30
5   2019-01-11  C   2
7   2019-01-12  A   4
8   2019-01-12  B   6
11  2019-01-13  A   10
12  2019-01-13  B   12
13  2019-01-13  C   1

Input 2 df2 - Outbound

    date       code qty
0   2019-01-11  A   5
1   2019-01-11  B   1
2   2019-01-11  C   3
3   2019-01-12  A   100
6   2019-01-13  B   1
7   2019-01-13  C   1
8   2019-01-15  A   1
9   2019-01-16  B   1
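
For reference, the two inputs can be reconstructed roughly like this (a sketch using a plain RangeIndex; the date column is parsed to datetime so the daily resampling below works):

import pandas as pd

df1 = pd.DataFrame({  # In
    'date': pd.to_datetime(['2019-01-10'] * 3 + ['2019-01-11'] * 3
                           + ['2019-01-12'] * 2 + ['2019-01-13'] * 3),
    'code': ['A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'A', 'B', 'C'],
    'qty':  [20, 12, 10, 2, 30, 2, 4, 6, 10, 12, 1],
})

df2 = pd.DataFrame({  # Outbound
    'date': pd.to_datetime(['2019-01-11'] * 3 + ['2019-01-12']
                           + ['2019-01-13'] * 2 + ['2019-01-15', '2019-01-16']),
    'code': ['A', 'B', 'C', 'A', 'B', 'C', 'A', 'B'],
    'qty':  [5, 1, 3, 100, 1, 1, 1, 1],
})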

Existing Code:

import numpy as np
from numba import njit

@njit
def poscumsum(x):
    # Cumulative sum that is reset to zero whenever the running total
    # would become negative.
    total = 0
    result = np.empty(x.shape)
    for i, y in enumerate(x):
        total += y
        if total < 0:
            total = 0
        result[i] = total
    return result

a = df1.set_index(['code', 'date'])
b = df2.set_index(['code', 'date'])
idx = a.index.union(b.index).sort_values()

# Net daily movement (In minus Out) for every (code, date) seen in either frame
df3 = (a.reindex(idx, fill_value=0) - b.reindex(idx, fill_value=0))
df3 = df3.groupby('code').resample('D', level='date').sum()
df3['qty'] = df3.groupby('code')['qty'].transform(
    lambda g: poscumsum(g.values))
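
For intuition, poscumsum is a cumulative sum that is clipped at zero; applied to code A's daily net quantities from the data above, for example:

# Daily net qty for code A: +20, -3, -96, +10, 0, -1, 0
poscumsum(np.array([20., -3., -96., 10., 0., -1., 0.]))
# -> array([20., 17.,  0., 10., 10.,  9.,  9.])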

Current Output:

Each code is only represented for the dates on which it appears in the In or Out dfs.

      code  date        qty
    0   A   2019-01-10  20
    1   A   2019-01-11  17
    2   A   2019-01-12  0
    3   A   2019-01-13  10
    4   A   2019-01-14  10
    5   A   2019-01-15  9
    6   B   2019-01-10  12
    7   B   2019-01-11  41
    8   B   2019-01-12  47
    9   B   2019-01-13  58
    10  B   2019-01-14  58
    11  B   2019-01-15  58
    12  B   2019-01-16  57
    13  C   2019-01-10  10
    14  C   2019-01-11  9
    15  C   2019-01-12  9
    16  C   2019-01-13  9

Desired Output:

Each code is represented for every date between 2019-01-10 and 2019-01-16.

       code date        qty
    0   A   2019-01-10  20
    1   A   2019-01-11  17
    2   A   2019-01-12  0
    3   A   2019-01-13  10
    4   A   2019-01-14  10
    5   A   2019-01-15  9
    6   A   2019-01-16  9
    7   B   2019-01-10  12
    8   B   2019-01-11  41
    9   B   2019-01-12  47
    10  B   2019-01-13  58
    11  B   2019-01-14  58
    12  B   2019-01-15  58
    13  B   2019-01-16  57
    14  C   2019-01-10  10
    15  C   2019-01-11  9
    16  C   2019-01-12  9
    17  C   2019-01-13  9
    18  C   2019-01-14  9
    19  C   2019-01-15  9
    20  C   2019-01-16  9   

Solution

OK, here is a 2D version of poscumsum, generalized to cap the running sum at a minimum and/or a maximum:

@njit
def cumsum_capped_2d(x, xmin=None, xmax=None):
    # Cumulative sum down the rows of a 2-D array (i.e. per column); after
    # each step the totals are clipped below at xmin and/or above at xmax
    # (when given).
    n, m = x.shape
    result = np.empty_like(x)
    if n == 0:
        return result
    total = np.zeros_like(x[0])
    for i in range(n):
        total += x[i]
        if xmin is not None:
            total[total < xmin] = xmin
        if xmax is not None:
            total[total > xmax] = xmax
        result[i] = total
    return result
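
A quick sanity check on a tiny 2-D array (one column per code, here the daily nets of codes A and C from the data above; this assumes numpy and numba's njit are imported as earlier):

x = np.array([[ 20., 10.],
              [ -3., -1.],
              [-96.,  0.],
              [ 10.,  0.]])
cumsum_capped_2d(x, xmin=0)
# -> array([[20., 10.],
#           [17.,  9.],
#           [ 0.,  9.],
#           [10.,  9.]])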

And here is how to use it (now that you want all dates to span the same period); the good news is that there is no more groupby, so it is faster than ever:

a = df1.pivot('date', 'code', 'qty')
b = df2.pivot('date', 'code', 'qty')
idx = a.index.union(b.index).sort_values()
df3 = (a.reindex(idx, fill_value=0) - b.reindex(idx, fill_value=0)).resample('D').sum()
df3.values[:, :] = cumsum_capped_2d(df3.values, xmin=0)
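
Note: if your pandas version complains about positional arguments to pivot, the keyword form does the same thing:

a = df1.pivot(index='date', columns='code', values='qty')
b = df2.pivot(index='date', columns='code', values='qty')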

Or, in two (convoluted) lines:

df3 = (df1.set_index(['date', 'code']).subtract(df2.set_index(['date', 'code']), fill_value=0)
       .unstack('code', fill_value=0).resample('D').sum())
df3.values[:, :] = cumsum_capped_2d(df3.values, xmin=0)

On your data:

>>> df3
code           A     B     C
date                        
2019-01-10  20.0  12.0  10.0
2019-01-11  17.0  41.0   9.0
2019-01-12   0.0  47.0   9.0
2019-01-13  10.0  58.0   9.0
2019-01-14  10.0  58.0   9.0
2019-01-15   9.0  58.0   9.0
2019-01-16   9.0  57.0   9.0

Of course, you are free to stack back into a skinny df, re-order, drop the index, etc. For example, to match your desired output:

>>> df3.stack().swaplevel(0,1).sort_index().reset_index()
   code       date   qty
0     A 2019-01-10  20.0
1     A 2019-01-11  17.0
2     A 2019-01-12   0.0
3     A 2019-01-13  10.0
4     A 2019-01-14  10.0
5     A 2019-01-15   9.0
6     A 2019-01-16   9.0
7     B 2019-01-10  12.0
8     B 2019-01-11  41.0
9     B 2019-01-12  47.0
10    B 2019-01-13  58.0
11    B 2019-01-14  58.0
12    B 2019-01-15  58.0
13    B 2019-01-16  57.0
14    C 2019-01-10  10.0
15    C 2019-01-11   9.0
16    C 2019-01-12   9.0
17    C 2019-01-13   9.0
18    C 2019-01-14   9.0
19    C 2019-01-15   9.0
20    C 2019-01-16   9.0
