根据 pandas 中的行匹配,有条件地用来自另一个DataFrame的值填充列 [英] Conditionally fill column with value from another DataFrame based on row match in Pandas
问题描述
我发现自己无法解决这个问题(自动完成税务文书工作).我有两个数据框:例如,一个具有欧元/美元汇率的季度历史记录,另一个具有我自己的发票,例如:
I find myself lost trying to solve this problem (automating tax paperwork). I have two dataframes: one with the quarterly historical records of EUR/USD exchange rates, and another with my own invoices, as an example:
import pandas as pd
import numpy as np
usdeur = [(pd.Timestamp('20170705'),1.1329),
(pd.Timestamp('20170706'),1.1385),
(pd.Timestamp('20170707'),1.1412),
(pd.Timestamp('20170710'),1.1387),
(pd.Timestamp('20170711'),1.1405),
(pd.Timestamp('20170712'),1.1449)]
labels = ['Date', 'Rate']
rates = pd.DataFrame.from_records(usdeur, columns=labels)
transactions = [(pd.Timestamp('20170706'), 'PayPal', 'USD', 100, 1),
(pd.Timestamp('20170706'), 'Fastspring', 'USD', 200, 1),
(pd.Timestamp('20170709'), 'Fastspring', 'USD', 100, 1),
(pd.Timestamp('20170710'), 'EU', 'EUR', 100, 1),
(pd.Timestamp('20170710'), 'PayPal', 'USD', 200, 1)]
labels = ['Date', 'From', 'Currency', 'Amount', 'Rate']
sales =pd.DataFrame.from_records(transactions, columns=labels)
导致:
我需要在sales['Rate']
栏中填充来自rates['Rate']
的正确汇率,即:
I would need to have the sales['Rate']
column filled with the proper exchange rates from the rates['Rate']
, that is to say:
- 如果
sales['Currency']
是'EUR'
,请不要理会. li> - 对于
sales
的每一行,在rates
中找到具有匹配的'Date'
的行;抓取非常rates['Rate']
的值并将其放入sales['Rate']
- 奖励:如果没有匹配的
'Date'
(例如,在假期期间,交易所市场关闭),请检查上一行直到找到合适的值.
- if
sales['Currency']
is'EUR'
, leave it alone. - for each row of
sales
, find the row inrates
with matching'Date'
; grab that veryrates['Rate']
value and put it insales['Rate']
- bonus: if there's no matching
'Date'
(e.g. during holidays, the exchange market is closed), check the previous row until a suitable value is found.
完整结果应如下所示(请注意,第2行的费率自2017年7月7日开始):
The full result should look like the following (note that row #2 has the rate from 2017-07-07):
我尝试遵循其他问题的一些建议解决方案,但没有运气. 提前非常感谢您
I've tried to follow several suggested solutions from other questions, but with no luck. Thank you very much in advance
推荐答案
我使用pd.merge_asof
sales=pd.merge_asof(sales,rates,on='Date',direction='backward',allow_exact_matches =True)
sales.loc[sales.From=='EU','Rate_y']=sales.Rate_x
sales
Out[748]:
Date From Currency Amount Rate_x Rate_y
0 2017-07-06 PayPal USD 100 1 1.1385
1 2017-07-06 Fastspring USD 200 1 1.1385
2 2017-07-09 Fastspring USD 100 1 1.1412
3 2017-07-10 EU EUR 100 1 1.0000
4 2017-07-10 PayPal USD 200 1 1.1387
然后
sales.drop('Rate_x',1).rename(columns={'Rate_y':'Rate'})
Out[749]:
Date From Currency Amount Rate
0 2017-07-06 PayPal USD 100 1.1385
1 2017-07-06 Fastspring USD 200 1.1385
2 2017-07-09 Fastspring USD 100 1.1412
3 2017-07-10 EU EUR 100 1.0000
4 2017-07-10 PayPal USD 200 1.1387
这篇关于根据 pandas 中的行匹配,有条件地用来自另一个DataFrame的值填充列的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!