Pandas deposits and withdrawals over a time period with n-number of people
Problem description
I'm trying to dynamically build a format for displaying the number of deposits compared to withdrawals in a timeline chart. Whenever a deposit is made, the graph goes up, and when a withdrawal is made, the graph goes down.
This is how far I've gotten:
df.head()
name Deposits Withdrawals
Peter 2019-03-07 2019-03-11
Peter 2019-03-08 2019-03-19
Peter 2019-03-12 2019-05-22
Peter 2019-03-12 2019-10-31
Peter 2019-03-14 2019-04-05
Here is the data manipulation that shows the net movements for one person, Peter:
x = pd.Series(df.groupby('Deposits').size())
y = pd.Series(df.groupby('Withdrawals').size())
balance = pd.DataFrame({'net_mov': x.sub(y, fill_value=0)})
balance = balance.assign(Peter=balance.net_mov.cumsum())
print(balance)
net_mov Peter
2019-03-07 1 1
2019-03-08 1 2
2019-03-11 -1 1
2019-03-12 2 3
2019-03-14 1 4
This works perfectly fine, and this is the format that I want to have. Now let's say I want to extend this and list not just Peter's deposits and withdrawals, but those of n people. Let's assume that my dataframe looks like this:
df2.head()
name Deposits Withdrawals
Peter 2019-03-07 2019-03-11
Anna 2019-03-08 2019-03-19
Anna 2019-03-12 2019-05-22
Peter 2019-03-12 2019-10-31
Simon 2019-03-14 2019-04-05
The format I'm aiming for is shown below. I don't know how to group everything, and I don't know beforehand which names or how many columns there will be, so I can't hardcode the names or the number of columns. It has to be generated dynamically.
net_mov1 Peter net_mov2 Anna net_mov3 Simon
2019-03-07 1 1 1 1 2 2
2019-03-08 1 2 2 3 -1 1
2019-03-11 -1 1 0 3 2 3
2019-03-12 2 3 -2 1 4 7
2019-03-14 1 4 3 4 -1 6
UPDATE:
First off, thanks for the help. I'm getting closer to my goal. This is the progress:
x = pd.Series(df.groupby(['Created', 'name']).size())
y = pd.Series(df.groupby(['Finished', 'name']).size())
balance = pd.DataFrame({'net_mov': x.sub(y, fill_value=0)})
balance = balance.assign(balance=balance.groupby('name').net_mov.cumsum())
balance_byname = balance.groupby('name')
balance_byname.get_group("Peter")
Output:
net_mov balance
name Created Finished
Peter 2017-07-03 2017-07-06 1 1
2017-07-10 1 2
2017-07-13 0 2
2017-07-14 1 3
... ... ...
2020-07-29 2020-07-15 0 4581
2020-07-17 0 4581
2020-07-20 0 4581
2020-07-21 -1 4580
[399750 rows x 2 columns]
This is of course far too many rows; the dataset I'm working with has only around 2500 rows.
I've tried to unstack it, but that creates problems of its own.
Given df:
name Deposits Withdrawals
Peter 2019-03-07 2019-03-11
Anna 2019-03-08 2019-03-19
Anna 2019-03-12 2019-05-22
Peter 2019-03-12 2019-10-31
Simon 2019-03-14 2019-04-05
You can melt the dataframe, mark deposits as 1 and withdrawals as -1, and then pivot:
import pandas as pd

df = pd.DataFrame({
    'name': ['Peter', 'Anna', 'Anna', 'Peter', 'Simon'],
    'Deposits': ['2019-03-07', '2019-03-08', '2019-03-12', '2019-03-12', '2019-03-14'],
    'Withdrawals': ['2019-03-11', '2019-03-19', '2019-05-22', '2019-10-31', '2019-04-05'],
})

# Melt to long format, map deposits to +1 and withdrawals to -1, then pivot.
# pivot_table with a sum aggregate is used instead of pivot because the data
# may contain duplicate (date, name) pairs.
# The chain is wrapped in parentheses: a comment inside a backslash-continued
# chain would end the statement.
df2 = (
    df.melt('name')
      .assign(variable=lambda x: x.variable.map({'Deposits': 1, 'Withdrawals': -1}))
      .pivot_table(values='variable', index='value', columns='name', aggfunc='sum')
      .fillna(0)
      .rename(columns=lambda c: f'{c} netmov')
)
This gives the net change in balance per person and date:
name Anna netmov Peter netmov Simon netmov
value
2019-03-07 0.0 1.0 0.0
2019-03-08 1.0 0.0 0.0
2019-03-11 0.0 -1.0 0.0
2019-03-12 1.0 1.0 0.0
2019-03-14 0.0 0.0 1.0
2019-03-19 -1.0 0.0 0.0
2019-04-05 0.0 0.0 -1.0
2019-05-22 -1.0 0.0 0.0
2019-10-31 0.0 -1.0 0.0
Finally, calculate the balance with a cumulative sum and concatenate it with the net changes calculated above:
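# Running balance per person, renamed from '<name> netmov' to '<name> balance'
balance = df2.cumsum().rename(columns=lambda c: c.split()[0] + ' balance')
# Put net changes and balances side by side, with each person's columns adjacent
df2 = pd.concat([df2, balance], axis=1).sort_index(axis=1)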
Result:
name Anna balance Anna netmov ... Simon balance Simon netmov
value ...
2019-03-07 0.0 0.0 ... 0.0 0.0
2019-03-08 1.0 1.0 ... 0.0 0.0
2019-03-11 1.0 0.0 ... 0.0 0.0
2019-03-12 2.0 1.0 ... 0.0 0.0
2019-03-14 2.0 0.0 ... 1.0 1.0
2019-03-19 1.0 -1.0 ... 1.0 0.0
2019-04-05 1.0 0.0 ... 0.0 -1.0
2019-05-22 0.0 -1.0 ... 0.0 0.0
2019-10-31 0.0 0.0 ... 0.0 0.0
[9 rows x 6 columns]
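For comparison, here is a minimal sketch that stays closer to the groupby approach from the question's update. The 3-level index in that output (name, Created, Finished) suggests the two grouped Series were aligned only on name, because their date levels had different names, so every creation date was paired with every finish date per person. The sketch assumes that giving both date levels a common name (the label 'date' is made up for illustration) lets them align row by row. The variables dep, wd, net, balance and combined are illustrative names, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Peter', 'Anna', 'Anna', 'Peter', 'Simon'],
    'Deposits': ['2019-03-07', '2019-03-08', '2019-03-12', '2019-03-12', '2019-03-14'],
    'Withdrawals': ['2019-03-11', '2019-03-19', '2019-05-22', '2019-10-31', '2019-04-05'],
})

# Count events per (date, name). Both date levels get the same name ('date',
# an assumed label) so the two Series align on date AND name.
dep = df.groupby(['Deposits', 'name']).size().rename_axis(['date', 'name'])
wd = df.groupby(['Withdrawals', 'name']).size().rename_axis(['date', 'name'])

# Net movement per date and person: one row per date, one column per name.
net = dep.sub(wd, fill_value=0).unstack('name', fill_value=0).sort_index()

# Running balance per person.
balance = net.cumsum()

# Optional: group both tables under each person via two-level column labels.
combined = (
    pd.concat({'netmov': net, 'balance': balance}, axis=1)
      .swaplevel(axis=1)
      .sort_index(axis=1)
)
print(combined)
```

The column names come from the data, so nothing is hardcoded, and a single person's columns can be selected with combined['Anna'].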