Calculate costs with Average cost basis method
Problem description
I have this pandas dataframe (sorted in ascending order by 'Ticker', then by 'Date' and 'Type') that represents trades of the stocks in the 'Ticker' column:
Type Ticker Qty Price Amount TotalQty
Date
2020-03-01 Buy AAA 40 1.0 40.0 40
2020-03-04 Buy AAA 50 1.5 75.0 90
2020-03-08 Sell AAA 60 1.2 72.0 30
2020-03-10 Buy AAA 20 2.0 40.0 50
2020-03-15 Sell AAA 50 2.0 100.0 0
2020-03-18 Buy AAA 50 1.5 75.0 50
2020-03-01 Buy BBB 50 2.0 100.0 50
2020-03-04 Buy BBB 50 1.5 75.0 100
2020-03-09 Sell BBB 50 1.0 50.0 50
2020-03-15 Buy BBB 50 3.0 150.0 100
2020-03-01 Buy CCC 10 3.0 30.0 10
2020-03-10 Buy CCC 10 2.0 20.0 20
2020-03-11 Sell CCC 20 2.2 44.0 0
2020-03-12 Buy DDD 90 2.0 180.0 90
In the 'TotalQty' column I put the cumulative sum of the 'Qty' column (with a negative sign for 'Qty' when the 'Type' column is equal to Sell).
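A minimal sketch of how a column like 'TotalQty' can be built, assuming the 'Type', 'Ticker' and 'Qty' columns shown above (only a subset of the rows is used here): negate 'Qty' on Sell rows, then take a cumulative sum per ticker.

```python
import pandas as pd

# A subset of the trades above (AAA and BBB rows)
df = pd.DataFrame({
    "Type": ["Buy", "Buy", "Sell", "Buy", "Buy", "Buy", "Sell"],
    "Ticker": ["AAA", "AAA", "AAA", "AAA", "BBB", "BBB", "BBB"],
    "Qty": [40, 50, 60, 20, 50, 50, 50],
})

# Sells reduce the position, so they get a negative sign
signed_qty = df["Qty"].where(df["Type"].eq("Buy"), -df["Qty"])

# Running position, restarted for each ticker
df["TotalQty"] = signed_qty.groupby(df["Ticker"]).cumsum()
print(df)
```

This reproduces the 'TotalQty' values of the corresponding rows in the table above.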
Now I want to add another column with the average cost calculated using the average cost basis method.
For example, considering the ticker 'AAA', after the first Buy I obviously have an average cost of 1.0. After the second 'Buy' I have an average cost of (40.0 + 75.0)/90, that is the cumulative sum of 'Amount' (40.0 + 75.0) divided by 'TotalQty' (90), i.e. 1.278. Then, at the third row, I have a 'Sell' of 60 stocks, so 'TotalQty' goes from 90 to 30, but the average cost doesn't change. Finally, at the fourth row I have a 'Buy' of 20 pieces, and the new average cost is the current 'Amount' plus the previous average cost multiplied by the previous 'TotalQty', all divided by the new 'TotalQty', i.e. (40 + 1.278*30)/50 = 1.567.
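The rule described in the example can be written as a recurrence. The notation is chosen here for illustration: for row t of one ticker, Q_t is 'TotalQty', A_t is 'Amount', and \bar{c}_t is the average cost after that row, starting from Q_0 = 0 (so the first Buy gives A_1/Q_1).

```latex
\bar{c}_t =
\begin{cases}
  \dfrac{A_t + \bar{c}_{t-1}\, Q_{t-1}}{Q_t}, & \text{if row } t \text{ is a Buy},\\[6pt]
  \bar{c}_{t-1}, & \text{if row } t \text{ is a Sell},
\end{cases}
\qquad
\text{e.g. for AAA: } \bar{c}_4 = \frac{40 + 1.278 \cdot 30}{50} \approx 1.567 .
```

When a position is fully closed (Q_{t-1} = 0), the next Buy simply gives A_t / Q_t, which is why the last AAA row in the desired output has an average cost of 75/50 = 1.5.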
Type Ticker Qty Price Amount TotalQty AverageCost
Date
2020-03-01 Buy AAA 40 1.0 40.0 40 1.000
2020-03-04 Buy AAA 50 1.5 75.0 90 1.278
2020-03-08 Sell AAA 60 1.2 72.0 30 1.278
2020-03-10 Buy AAA 20 2.0 40.0 50 1.567
2020-03-15 Sell AAA 50 2.0 100.0 0 1.567
2020-03-18 Buy AAA 50 1.5 75.0 50 1.500
2020-03-01 Buy BBB 50 2.0 100.0 50 2.000
2020-03-04 Buy BBB 50 1.5 75.0 100 1.750
2020-03-09 Sell BBB 50 1.0 50.0 50 1.750
2020-03-15 Buy BBB 50 3.0 150.0 100 2.375
2020-03-01 Buy CCC 10 3.0 30.0 10 3.000
2020-03-10 Buy CCC 10 2.0 20.0 20 2.500
2020-03-11 Sell CCC 20 2.2 44.0 0 2.500
2020-03-12 Buy DDD 90 2.0 180.0 90 2.000
I've tried with this code
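import numpy as np

# Keep the Amount of Buy rows, NaN for Sell rows
df['AverageCost'] = df.apply(lambda x: x.Amount if x.Type == 'Buy' else np.nan, axis=1)
# Cumulative Buy amount per ticker divided by the current position
df['AverageCost'] = df.groupby('Ticker')['AverageCost'].cumsum().div(df['TotalQty'])
# Fill the Sell rows with the previous value
df.ffill(inplace=True)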
but that clearly gives the wrong results. I wasn't able to obtain the correct results with df.apply() since, after a 'Sell' (which indeed doesn't change the average cost), a new 'Buy' needs access to the previous average cost to calculate the new one.
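To see where the attempt goes wrong, here is the same idea applied to the 'AAA' rows only (rewritten with `Series.where` instead of `apply`, and a plain `cumsum` because there is a single ticker). The cumulative Buy amount still includes the cost of shares that were already sold, so the fourth row comes out as 155/50 = 3.1 instead of 1.567.

```python
import pandas as pd

# The 'AAA' rows from the question
aaa = pd.DataFrame({
    "Type": ["Buy", "Buy", "Sell", "Buy", "Sell", "Buy"],
    "Amount": [40.0, 75.0, 72.0, 40.0, 100.0, 75.0],
    "TotalQty": [40, 90, 30, 50, 0, 50],
})

# Same idea as the attempt: keep Buy amounts, NaN for Sells
buy_amount = aaa["Amount"].where(aaa["Type"].eq("Buy"))

# Cumulative Buy amount over the current position, Sell rows forward-filled
naive = (buy_amount.cumsum() / aaa["TotalQty"]).ffill()
print(naive.round(3).tolist())
```

The first two rows agree with the desired output, but every Buy after a Sell is inflated (the last row gives 230/50 = 4.6 instead of 1.5).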
Should I use a for loop and how?
Thanks for your support!
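A plain for loop is a reasonable fit here, because each row's average cost depends on the previous row. The sketch below is one way to write it, not the accepted answer's method; it assumes the rows are sorted by 'Ticker' and then 'Date' (as in the question) and that 'TotalQty' already holds the running position. The helper name `add_average_cost` is hypothetical.

```python
import pandas as pd

# Trades from the question, with 'Date' as the index
rows = [
    ("2020-03-01", "Buy", "AAA", 40, 1.0, 40.0, 40),
    ("2020-03-04", "Buy", "AAA", 50, 1.5, 75.0, 90),
    ("2020-03-08", "Sell", "AAA", 60, 1.2, 72.0, 30),
    ("2020-03-10", "Buy", "AAA", 20, 2.0, 40.0, 50),
    ("2020-03-15", "Sell", "AAA", 50, 2.0, 100.0, 0),
    ("2020-03-18", "Buy", "AAA", 50, 1.5, 75.0, 50),
    ("2020-03-01", "Buy", "BBB", 50, 2.0, 100.0, 50),
    ("2020-03-04", "Buy", "BBB", 50, 1.5, 75.0, 100),
    ("2020-03-09", "Sell", "BBB", 50, 1.0, 50.0, 50),
    ("2020-03-15", "Buy", "BBB", 50, 3.0, 150.0, 100),
    ("2020-03-01", "Buy", "CCC", 10, 3.0, 30.0, 10),
    ("2020-03-10", "Buy", "CCC", 10, 2.0, 20.0, 20),
    ("2020-03-11", "Sell", "CCC", 20, 2.2, 44.0, 0),
    ("2020-03-12", "Buy", "DDD", 90, 2.0, 180.0, 90),
]
cols = ["Date", "Type", "Ticker", "Qty", "Price", "Amount", "TotalQty"]
df = pd.DataFrame(rows, columns=cols)
df["Date"] = pd.to_datetime(df["Date"])
df = df.set_index("Date")


def add_average_cost(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return a copy of frame with an 'AverageCost' column.

    Assumes rows are sorted by Ticker, then Date, and that 'TotalQty'
    already holds the running position for each ticker.
    """
    out = frame.copy()
    avg_costs = []
    prev_ticker = None
    avg, prev_qty = 0.0, 0
    for row in out.itertuples(index=False):
        if row.Ticker != prev_ticker:
            # New ticker: start from an empty position
            avg, prev_qty = 0.0, 0
            prev_ticker = row.Ticker
        if row.Type == "Buy":
            # Old cost basis plus the new purchase, spread over the new position
            avg = (avg * prev_qty + row.Amount) / row.TotalQty
        # A Sell leaves the average cost unchanged
        avg_costs.append(avg)
        prev_qty = row.TotalQty
    out["AverageCost"] = avg_costs
    return out


result = add_average_cost(df)
print(result.round(3))
```

Rounded to three decimals, this reproduces the 'AverageCost' column of the desired output. Because the loop carries the previous average forward explicitly, it also handles several consecutive Buys after a Sell without special cases.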
Recommended answer
- You could filter for `Buy` rows and compute 'CumAmountBuy' and 'CumQtyBuy'. Then you can merge the result back into the original dataframe and `ffill()`. This gives us the cumulative information while excluding the `Sell` rows.
- Create a series `s` (the cumulative Buy amount divided by the cumulative Buy quantity) that gives the average cost for `Buy` rows that do NOT come directly after a `Sell` within the same group. We will use it conditionally as the default in the `np.select` calculation.
- However, there are two other conditions: a. rows that are `Buy` but DO come directly after a `Sell`, and b. `Sell` rows. For `Buy` rows that come after a `Sell`, you can do the calculation you mentioned in the comments, using `shift()` to take part of the calculation from the previous row. For `Sell` rows, `np.select` returns `NaN`, but we use `.ffill()` to fill in the value from the previous rows.
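To make the three cases concrete, here is a small illustration (not part of the answer) that labels each 'AAA' row with the `np.select` branch it falls into, using the same conditions and the same order as the code below.

```python
import numpy as np
import pandas as pd

# 'Type' column of the AAA rows from the question
types = pd.Series(["Buy", "Buy", "Sell", "Buy", "Sell", "Buy"])

# Condition a: a Buy immediately preceded by a Sell
buy_after_sell = types.eq("Buy") & types.shift().eq("Sell")
# Condition b: a Sell row
is_sell = types.eq("Sell")

# Same condition order as the answer; every other row falls back to s
branch = np.select([buy_after_sell, is_sell],
                   ["shift formula", "NaN, then ffill"],
                   default="s")
print(branch.tolist())
```

Rows 3 and 5 use the `shift()` formula, rows 2 and 4 are filled forward, and only rows 0 and 1 take their value directly from `s`.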
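import numpy as np
import pandas as pd

# The question's df has 'Date' as its index; make it a column to use it as a merge key
df = df.reset_index()
# Cumulative Amount and Qty over the Buy rows only, per ticker
df1 = (df[df['Type'] == 'Buy'].copy()
       .assign(CumAmountBuy=lambda d: d.groupby('Ticker')['Amount'].cumsum())
       .assign(CumQtyBuy=lambda d: d.groupby('Ticker')['Qty'].cumsum()))
# Bring the Buy-only values back and carry them forward over the Sell rows
df2 = pd.merge(df, df1, how='left',
               on=['Date', 'Type', 'Ticker', 'Qty', 'Price',
                   'Amount', 'TotalQty']).ffill()
s = df2['CumAmountBuy'] / df2['CumQtyBuy']
df2['AverageCost'] = np.select([((df2['Type'] == 'Buy') & (df2['Type'].shift() == 'Sell')),
                                (df2['Type'] == 'Sell')],
                               [((df2['Qty'] * df2['Price'] + df2['TotalQty'].shift() * s.shift()) / df2['TotalQty']),
                                np.nan],
                               s)
# Sell rows are NaN at this point; forward-fill them with the last average cost
df2['AverageCost'] = df2['AverageCost'].round(3).ffill()
df2 = df2.drop(['CumQtyBuy', 'CumAmountBuy'], axis=1)
df2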
Out[1]:
Date Type Ticker Qty Price Amount TotalQty AverageCost
0 2020-03-01 Buy AAA 40 1.0 40.0 40 1.000
1 2020-03-04 Buy AAA 50 1.5 75.0 90 1.278
2 2020-03-08 Sell AAA 60 1.2 72.0 30 1.278
3 2020-03-10 Buy AAA 20 2.0 40.0 50 1.567
4 2020-03-15 Sell AAA 50 2.0 100.0 0 1.567
5 2020-03-18 Buy AAA 50 1.5 75.0 50 1.500
6 2020-03-01 Buy BBB 50 2.0 100.0 50 2.000
7 2020-03-04 Buy BBB 50 1.5 75.0 100 1.750
8 2020-03-09 Sell BBB 50 1.0 50.0 50 1.750
9 2020-03-15 Buy BBB 50 3.0 150.0 100 2.375
10 2020-03-01 Buy CCC 10 3.0 30.0 10 3.000
11 2020-03-10 Buy CCC 10 2.0 20.0 20 2.500
12 2020-03-11 Sell CCC 20 2.2 44.0 0 2.500
13 2020-03-12 Buy DDD 90 2.0 180.0 90 2.000