Pandas deposits and withdrawals over a time period with n-number of people

Question

I'm trying to dynamically build a format in which I display the number of deposits compared with withdrawals in a timeline chart. Whenever a deposit is made the graph goes up, and whenever a withdrawal is made it goes down.

This is how far I've gotten:

df.head()

name    Deposits    Withdrawals

Peter   2019-03-07  2019-03-11
Peter   2019-03-08  2019-03-19
Peter   2019-03-12  2019-05-22
Peter   2019-03-12  2019-10-31
Peter   2019-03-14  2019-04-05

Here is the data manipulation that shows the net movements for one person, Peter:

import pandas as pd

x = pd.Series(df.groupby('Deposits').size())
y = pd.Series(df.groupby('Withdrawals').size())
balance = pd.DataFrame({'net_mov': x.sub(y, fill_value=0)})
balance = balance.assign(Peter=balance.net_mov.cumsum())

print(balance)

            net_mov  Peter
2019-03-07        1      1
2019-03-08        1      2
2019-03-11       -1      1
2019-03-12        2      3
2019-03-14        1      4

This works perfectly fine, and this is the format I want. Now I want to extend this beyond Peter's deposits and withdrawals to an arbitrary number of people. Let's assume my dataframe looks like this:


df2.head()

name    Deposits    Withdrawals

Peter   2019-03-07  2019-03-11
Anna    2019-03-08  2019-03-19
Anna    2019-03-12  2019-05-22
Peter   2019-03-12  2019-10-31
Simon   2019-03-14  2019-04-05

This is the format I'm aiming for. I don't know how to group everything, and I don't know beforehand which names or how many columns there will be, so I can't hardcode names or the number of columns. It has to be generated dynamically.

            net_mov1  Peter   net_mov2   Anna    net_mov3  Simon   
2019-03-07        1      1           1      1           2      2
2019-03-08        1      2           2      3          -1      1
2019-03-11       -1      1           0      3           2      3
2019-03-12        2      3          -2      1           4      7
2019-03-14        1      4           3      4          -1      6
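One possible way to extend the single-person code to any number of people without hardcoding names is to count events per (date, name) pair with `pd.crosstab`. It creates one column per distinct name found in the data. The sketch below assumes the five sample rows from `df2.head()`. The `'<name> netmov'` / `'<name> balance'` column naming is an illustrative choice, not the exact `net_mov1`/`Peter` layout shown above.

```python
import pandas as pd

# The five sample rows from df2.head() above
df2 = pd.DataFrame({
    'name': ['Peter', 'Anna', 'Anna', 'Peter', 'Simon'],
    'Deposits': ['2019-03-07', '2019-03-08', '2019-03-12', '2019-03-12', '2019-03-14'],
    'Withdrawals': ['2019-03-11', '2019-03-19', '2019-05-22', '2019-10-31', '2019-04-05'],
})

# Rows: dates, columns: one per person (taken from the data), values: event counts
deposits = pd.crosstab(df2['Deposits'], df2['name'])
withdrawals = pd.crosstab(df2['Withdrawals'], df2['name'])

# Align on both dates and names; a cell missing on one side counts as 0 events
net = deposits.sub(withdrawals, fill_value=0).fillna(0)
balance = net.cumsum()

# Interleave per-person columns: "<name> balance", "<name> netmov"
result = pd.concat({'netmov': net, 'balance': balance}, axis=1)
result = result.swaplevel(axis=1).sort_index(axis=1)
result.columns = [f'{person} {kind}' for person, kind in result.columns]
print(result)
```

Nothing in this sketch depends on the number of people. A new name in the data simply produces two more columns.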

UPDATE:

First off, thanks for the help. I'm getting closer to my goal. This is the progress:

x = pd.Series(df.groupby(['Created', 'name']).size())
y = pd.Series(df.groupby(['Finished', 'name']).size())
balance = pd.DataFrame({'net_mov': x.sub(y, fill_value=0)})
balance = balance.assign(balance=balance.groupby('name').net_mov.cumsum())

balance_byname = balance.groupby('name')
balance_byname.get_group("Peter")

Output:

                                                       net_mov  balance
name                       Created    Finished                    
Peter                      2017-07-03 2017-07-06        1        1
                                      2017-07-10        1        2
                                      2017-07-13        0        2
                                      2017-07-14        1        3
...                                                   ...      ...
                           2020-07-29 2020-07-15        0     4581
                                      2020-07-17        0     4581
                                      2020-07-20        0     4581
                                      2020-07-21       -1     4580

[399750 rows x 2 columns]

This is of course far too many rows; the dataset I'm working with has only around 2500 rows.

I've tried to unstack it, but that creates problems of its own.
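The row explosion in the update appears to come from index alignment. `x` is indexed by `(Created, name)` and `y` by `(Finished, name)`, so the level names differ. Pandas therefore joins the two Series on the shared `name` level only, pairing every creation date with every finish date of the same person. That matches the three-level `name / Created / Finished` index in the output.

A minimal sketch of one fix is below. It assumes `Created`/`Finished` in the real data play the role of `Deposits`/`Withdrawals` in the sample rows. The fix gives both date levels the same (hypothetical) name `date` before subtracting, then unstacks the names into columns.

```python
import pandas as pd

# Sample rows from df2.head(); Created/Finished in the real data are assumed
# to correspond to Deposits/Withdrawals here
df = pd.DataFrame({
    'name': ['Peter', 'Anna', 'Anna', 'Peter', 'Simon'],
    'Deposits': ['2019-03-07', '2019-03-08', '2019-03-12', '2019-03-12', '2019-03-14'],
    'Withdrawals': ['2019-03-11', '2019-03-19', '2019-05-22', '2019-10-31', '2019-04-05'],
})

# Count events per (date, name), then give both date levels the same name
x = df.groupby(['Deposits', 'name']).size().rename_axis(['date', 'name'])
y = df.groupby(['Withdrawals', 'name']).size().rename_axis(['date', 'name'])

# Identical level names, so alignment is on (date, name) pairs (no cross join)
net = x.sub(y, fill_value=0)

# One column per person; dates on which a person has no event get 0
net_wide = net.unstack('name', fill_value=0)
balance = net_wide.cumsum()
print(balance)
```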

Solution

Given df:

name    Deposits    Withdrawals
Peter   2019-03-07  2019-03-11
Anna    2019-03-08  2019-03-19
Anna    2019-03-12  2019-05-22
Peter   2019-03-12  2019-10-31
Simon   2019-03-14  2019-04-05

You can melt the dataframe, encode deposits as 1 and withdrawals as -1, and then pivot:

import pandas as pd

df = pd.DataFrame(
{'name': {0: 'Peter', 1: 'Anna', 2: 'Anna', 3: 'Peter', 4: 'Simon'},
 'Deposits': {0: '2019-03-07',
  1: '2019-03-08',
  2: '2019-03-12',
  3: '2019-03-12',
  4: '2019-03-14'},
 'Withdrawals': {0: '2019-03-11',
  1: '2019-03-19',
  2: '2019-05-22',
  3: '2019-10-31',
  4: '2019-04-05'}})

# wrap the chain in parentheses: a backslash continuation cannot be followed by a comment line
df2 = (df.melt('name')
        .assign(variable = lambda x: x.variable.map({'Deposits':1,'Withdrawals':-1}))
        #.pivot(index='value', columns='name', values='variable').fillna(0)
        #use pivot_table with sum aggregate, because there may be duplicates in data
        .pivot_table(values='variable', index='value', columns='name', aggfunc='sum').fillna(0)
        .rename(columns=lambda c: f'{c} netmov'))

The code above gives the net change in balance for each person on each date:

name        Anna netmov  Peter netmov  Simon netmov
value                                              
2019-03-07          0.0           1.0           0.0
2019-03-08          1.0           0.0           0.0
2019-03-11          0.0          -1.0           0.0
2019-03-12          1.0           1.0           0.0
2019-03-14          0.0           0.0           1.0
2019-03-19         -1.0           0.0           0.0
2019-04-05          0.0           0.0          -1.0
2019-05-22         -1.0           0.0           0.0
2019-10-31          0.0          -1.0           0.0
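The pivot above is built from an intermediate long-format frame. Stopping the chain after the `melt` and `assign` steps shows what `pivot_table` actually receives. Each deposit or withdrawal becomes one row, `variable` holds +1 or -1, and `value` holds the date. The sketch reuses the same five sample rows.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Peter', 'Anna', 'Anna', 'Peter', 'Simon'],
    'Deposits': ['2019-03-07', '2019-03-08', '2019-03-12', '2019-03-12', '2019-03-14'],
    'Withdrawals': ['2019-03-11', '2019-03-19', '2019-05-22', '2019-10-31', '2019-04-05'],
})

# One row per event: 'name', 'variable' (original column name), 'value' (the date)
long = df.melt('name')
# Encode the direction of each event: deposit +1, withdrawal -1
long = long.assign(variable=long['variable'].map({'Deposits': 1, 'Withdrawals': -1}))
print(long)
```

Summing `variable` per (`value`, `name`) is exactly what `pivot_table(..., aggfunc='sum')` does. This is why duplicate dates for the same person are handled correctly, where plain `pivot` would raise an error.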

Finally, calculate the running balance with a cumulative sum and concatenate it with the net changes computed above:

df2 = pd.concat([df2,df2.cumsum().rename(columns = lambda c: c.split()[0] + ' balance')], axis = 1)\
        .sort_index(axis=1)

result:

name        Anna balance  Anna netmov  ...  Simon balance  Simon netmov
value                                  ...                             
2019-03-07           0.0          0.0  ...            0.0           0.0
2019-03-08           1.0          1.0  ...            0.0           0.0
2019-03-11           1.0          0.0  ...            0.0           0.0
2019-03-12           2.0          1.0  ...            0.0           0.0
2019-03-14           2.0          0.0  ...            1.0           1.0
2019-03-19           1.0         -1.0  ...            1.0           0.0
2019-04-05           1.0          0.0  ...            0.0          -1.0
2019-05-22           0.0         -1.0  ...            0.0           0.0
2019-10-31           0.0          0.0  ...            0.0           0.0

[9 rows x 6 columns]
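Since the goal is a timeline chart, one possible follow-up is to turn the string dates in the index into real timestamps. This is an assumption about how the result will be plotted, not part of the answer. With timestamps, a plotting library spaces points by elapsed time rather than by row position. The sketch rebuilds the net changes from the sample rows so it runs on its own, and keeps only the per-person balances.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Peter', 'Anna', 'Anna', 'Peter', 'Simon'],
    'Deposits': ['2019-03-07', '2019-03-08', '2019-03-12', '2019-03-12', '2019-03-14'],
    'Withdrawals': ['2019-03-11', '2019-03-19', '2019-05-22', '2019-10-31', '2019-04-05'],
})

# Same melt / encode / pivot steps as the answer, without the column renaming
netmov = (
    df.melt('name')
      .assign(variable=lambda d: d['variable'].map({'Deposits': 1, 'Withdrawals': -1}))
      .pivot_table(values='variable', index='value', columns='name', aggfunc='sum')
      .fillna(0)
)
balance = netmov.cumsum()

# String dates -> DatetimeIndex, so the gaps between dates survive on a time axis
balance.index = pd.to_datetime(balance.index)
balance.index.name = 'date'

# With matplotlib installed, something like balance.plot(drawstyle='steps-post')
# would draw one step line per person (not run here)
print(balance)
```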
