根据 pandas 中的行匹配,有条件地用来自另一个DataFrame的值填充列 [英] Conditionally fill column with value from another DataFrame based on row match in Pandas

查看:371
本文介绍了根据 pandas 中的行匹配,有条件地用来自另一个DataFrame的值填充列的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我发现自己无法解决这个问题(自动完成税务文书工作).我有两个数据框:例如,一个具有欧元/美元汇率的季度历史记录,另一个具有我自己的发票,例如:

I find myself lost trying to solve this problem (automating tax paperwork). I have two dataframes: one with the quarterly historical records of EUR/USD exchange rates, and another with my own invoices, as an example:

import pandas as pd
import numpy as np

usdeur = [(pd.Timestamp('20170705'),1.1329),
          (pd.Timestamp('20170706'),1.1385),
          (pd.Timestamp('20170707'),1.1412),
          (pd.Timestamp('20170710'),1.1387),
          (pd.Timestamp('20170711'),1.1405),
          (pd.Timestamp('20170712'),1.1449)]
labels = ['Date', 'Rate']
rates = pd.DataFrame.from_records(usdeur, columns=labels)

transactions = [(pd.Timestamp('20170706'), 'PayPal',     'USD', 100, 1),
                (pd.Timestamp('20170706'), 'Fastspring', 'USD', 200, 1),
                (pd.Timestamp('20170709'), 'Fastspring', 'USD', 100, 1),
                (pd.Timestamp('20170710'), 'EU',         'EUR', 100, 1),
                (pd.Timestamp('20170710'), 'PayPal',     'USD', 200, 1)]
labels = ['Date', 'From', 'Currency', 'Amount', 'Rate']
sales =pd.DataFrame.from_records(transactions, columns=labels)

导致:

我需要在sales['Rate']栏中填充来自rates['Rate']的正确汇率,即:

I would need to have the sales['Rate'] column filled with the proper exchange rates from the rates['Rate'], that is to say:

  • 如果sales['Currency']'EUR',请不要理会.
  • 对于sales的每一行,在rates中找到具有匹配的'Date'的行;抓取非常rates['Rate']的值并将其放入sales['Rate']
  • 奖励:如果没有匹配的'Date'(例如,在假期期间,交易所市场关闭),请检查上一行直到找到合适的值.
  • if sales['Currency'] is 'EUR', leave it alone.
  • for each row of sales, find the row in rates with matching 'Date'; grab that very rates['Rate'] value and put it in sales['Rate']
  • bonus: if there's no matching 'Date' (e.g. during holidays, the exchange market is closed), check the previous row until a suitable value is found.

完整结果应如下所示(请注意,第2行的费率自2017年7月7日开始):

The full result should look like the following (note that row #2 has the rate from 2017-07-07):

我尝试遵循其他问题的一些建议解决方案,但没有运气. 提前非常感谢您

I've tried to follow several suggested solutions from other questions, but with no luck. Thank you very much in advance

推荐答案

我使用pd.merge_asof

sales=pd.merge_asof(sales,rates,on='Date',direction='backward',allow_exact_matches =True)
sales.loc[sales.From=='EU','Rate_y']=sales.Rate_x

sales
Out[748]: 
        Date        From Currency  Amount  Rate_x  Rate_y
0 2017-07-06      PayPal      USD     100       1  1.1385
1 2017-07-06  Fastspring      USD     200       1  1.1385
2 2017-07-09  Fastspring      USD     100       1  1.1412
3 2017-07-10          EU      EUR     100       1  1.0000
4 2017-07-10      PayPal      USD     200       1  1.1387

然后

sales.drop('Rate_x',1).rename(columns={'Rate_y':'Rate'})
Out[749]: 
        Date        From Currency  Amount    Rate
0 2017-07-06      PayPal      USD     100  1.1385
1 2017-07-06  Fastspring      USD     200  1.1385
2 2017-07-09  Fastspring      USD     100  1.1412
3 2017-07-10          EU      EUR     100  1.0000
4 2017-07-10      PayPal      USD     200  1.1387

这篇关于根据 pandas 中的行匹配,有条件地用来自另一个DataFrame的值填充列的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆