Vlookup-match like function in R


Problem description

I am very new to R, and I currently have to apply what little knowledge of R I have to an analysis I need to perform for work.

I have two data frames: data frame A consists of transaction details, while data frame B consists of the monthly closing exchange rates for various currencies.

Data frame A - transaction details

  TRANSACTION_ID COLLECTION_CRNCY COLLECTION_AMT   MMYYYY  LODG_DATE
1           0001              INR         305000 Mar 2014 2014-03-01
2           0002              USD          15000 Oct 2014 2014-10-31
3           0003              JPY          85000 Feb 2015 2015-02-09
4           0004              CNY        1800000 Mar 2015 2015-03-27

structure(list(TRANSACTION_ID = c("0001", "0002", "0003", "0004"), 
COLLECTION_CRNCY = c("INR", "USD", "JPY", "CNY"), COLLECTION_AMT = c(305000, 
15000, 85000, 1800000), MMYYYY = structure(c(2014.16666666667, 
2014.75, 2015.08333333333, 2015.16666666667), class = "yearmon"),
LODG_DATE = structure(c(16130, 16374, 16475, 16521), class = "Date")), 
row.names = c(NA, -4L), class = "data.frame")

Data frame B - Exchange Rates

    MMYYYY       Date    CNY    INR     JPY       USD
1 Mar 2014 2014-03-31 4.9444 47.726 82.0845 0.7951654
2 Oct 2014 2014-10-31 4.7552 47.749 87.2604 0.7778469
3 Feb 2015 2015-02-27 4.5990 45.222 87.7690 0.7338372
4 Mar 2015 2015-03-31 4.5179 45.383 87.5395 0.7287036

structure(list(MMYYYY = structure(c(2014.16666666667, 
2014.75, 2015.08333333333, 2015.16666666667), class = "yearmon"), 
Date = structure(c(16160, 16374, 16493, 16525), class = "Date"), CNY = 
c(4.9444, 4.7552, 4.599, 4.5179), INR = c(47.726, 47.749, 45.222, 45.383), 
JPY = c(82.0845, 87.2604, 87.769, 87.5395), USD = c(0.795165394, 0.77784692, 
0.733837235, 0.728703636)), .Names = c("MMYYYY", "Date", "CNY", "INR", "JPY", 
"USD"), class = "data.frame", row.names = c(NA, -4L))

What I would like to do is create a new column in data frame A, possibly named exchange.rate. I would like to get this exchange rate value by looking it up in data frame B, matching COLLECTION_CRNCY and MMYYYY in data frame A against the currency columns and MMYYYY in data frame B, i.e.:

  TRANSACTION_ID COLLECTION_CRNCY COLLECTION_AMT   MMYYYY  LODG_DATE exchange.rate
1           0001              INR         305000 Mar 2014 2014-03-01    47.7260000
2           0002              USD          15000 Oct 2014 2014-10-31     0.7778469
3           0003              JPY          85000 Feb 2015 2015-02-09    87.7690000
4           0004              CNY        1800000 Mar 2015 2015-03-27     4.5179000

I can easily do this in Excel using vlookup and match, but I would like to know how to achieve the same result in R, because my transaction details file is very large.

Solution

Here's a possible data.table approach. Basically, you need to convert df2 (data frame B) to long format and then do a simple (binary) left join to df1 (data frame A):

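library(data.table)
# Reshape df2 (minus the Date column) to long format: one row per month and currency
temp <- melt(setDT(df2[-2]), "MMYYYY", variable.name = "COLLECTION_CRNCY")
# Key df1 on the lookup columns, then add exchange.rate by reference from the matching rows of temp
setkey(setDT(df1), MMYYYY, COLLECTION_CRNCY)[temp, exchange.rate := i.value]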
df1
#    TRANSACTION_ID COLLECTION_CRNCY COLLECTION_AMT   MMYYYY  LODG_DATE exchange.rate
# 1:           0001              INR         305000 2014.167 2014-03-01    47.7260000
# 2:           0002              USD          15000 2014.750 2014-10-31     0.7778469
# 3:           0003              JPY          85000 2015.083 2015-02-09    87.7690000
# 4:           0004              CNY        1800000 2015.167 2015-03-27     4.5179000
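
The same two steps (reshape the rate table to long format, then left join) can also be sketched in base R without any packages. This is only an illustration, not part of the original answer: it assumes MMYYYY is stored as a plain numeric year fraction (what as.numeric() gives for a yearmon column), it drops the date columns for brevity, and the names `currencies`, `long` and `result` are made up for the example.

```r
# Simplified copies of the two data frames (MMYYYY as plain numeric, an assumption)
df1 <- data.frame(
  TRANSACTION_ID   = c("0001", "0002", "0003", "0004"),
  COLLECTION_CRNCY = c("INR", "USD", "JPY", "CNY"),
  COLLECTION_AMT   = c(305000, 15000, 85000, 1800000),
  MMYYYY           = c(2014.16666666667, 2014.75, 2015.08333333333, 2015.16666666667),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  MMYYYY = c(2014.16666666667, 2014.75, 2015.08333333333, 2015.16666666667),
  CNY    = c(4.9444, 4.7552, 4.599, 4.5179),
  INR    = c(47.726, 47.749, 45.222, 45.383),
  JPY    = c(82.0845, 87.2604, 87.769, 87.5395),
  USD    = c(0.795165394, 0.77784692, 0.733837235, 0.728703636)
)

# Step 1: wide -> long, one row per (month, currency) pair
currencies <- setdiff(names(df2), "MMYYYY")
long <- data.frame(
  MMYYYY           = rep(df2$MMYYYY, times = length(currencies)),
  COLLECTION_CRNCY = rep(currencies, each = nrow(df2)),
  exchange.rate    = unlist(df2[currencies], use.names = FALSE),
  stringsAsFactors = FALSE
)

# Step 2: left join (all.x = TRUE keeps every transaction, matched or not)
result <- merge(df1, long, by = c("MMYYYY", "COLLECTION_CRNCY"), all.x = TRUE)

# merge() sorts by the key columns, so restore the transaction order
result <- result[order(result$TRANSACTION_ID), ]
result$exchange.rate
```

Unlike the data.table version, merge() returns a new data frame rather than updating df1 in place, which may matter for a very large transactions file.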


Alternatively, you can do a similar thing using the "Hadleyverse" (dplyr and tidyr), but dplyr won't be able to merge on zoo class columns (for now), so you'll need to unclass them first:

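library(dplyr)
library(tidyr)
df2[-2] %>%
  # Reshape the rate table (minus the Date column) to long format
  gather(COLLECTION_CRNCY, exchange.rate, -MMYYYY) %>%
  # Strip the yearmon class so dplyr can join on the column
  mutate(MMYYYY = as.numeric(MMYYYY)) %>%
  # Left join with df1 on the left; `.` is the long rate table built above
  left_join(df1 %>% mutate(MMYYYY = as.numeric(MMYYYY)), .,
            by = c("MMYYYY", "COLLECTION_CRNCY"))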
#   TRANSACTION_ID COLLECTION_CRNCY COLLECTION_AMT   MMYYYY  LODG_DATE exchange.rate
# 1           0001              INR         305000 2014.167 2014-03-01    47.7260000
# 2           0002              USD          15000 2014.750 2014-10-31     0.7778469
# 3           0003              JPY          85000 2015.083 2015-02-09    87.7690000
# 4           0004              CNY        1800000 2015.167 2015-03-27     4.5179000
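
For readers coming from Excel, a closer analogue of VLOOKUP/MATCH is to find the row and column positions with match() and then index the rate table as a matrix. The following is a sketch of that different technique in base R, not something the original answer proposed. It assumes MMYYYY holds identical numeric values in both tables (true for the data shown), omits the date columns for brevity, and uses the hypothetical helper names `rates`, `row_idx` and `col_idx`.

```r
# Simplified copies of the two data frames (MMYYYY as plain numeric, an assumption)
df1 <- data.frame(
  TRANSACTION_ID   = c("0001", "0002", "0003", "0004"),
  COLLECTION_CRNCY = c("INR", "USD", "JPY", "CNY"),
  COLLECTION_AMT   = c(305000, 15000, 85000, 1800000),
  MMYYYY           = c(2014.16666666667, 2014.75, 2015.08333333333, 2015.16666666667),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  MMYYYY = c(2014.16666666667, 2014.75, 2015.08333333333, 2015.16666666667),
  CNY    = c(4.9444, 4.7552, 4.599, 4.5179),
  INR    = c(47.726, 47.749, 45.222, 45.383),
  JPY    = c(82.0845, 87.2604, 87.769, 87.5395),
  USD    = c(0.795165394, 0.77784692, 0.733837235, 0.728703636)
)

# Rate table as a numeric matrix: rows are months, columns are currencies
rates <- as.matrix(df2[c("CNY", "INR", "JPY", "USD")])

# Row position of each transaction's month, column position of its currency
row_idx <- match(df1$MMYYYY, df2$MMYYYY)
col_idx <- match(df1$COLLECTION_CRNCY, colnames(rates))

# Indexing with a two-column matrix picks one cell per transaction;
# a transaction with no matching month or currency gets NA
df1$exchange.rate <- rates[cbind(row_idx, col_idx)]
df1$exchange.rate
```

This avoids reshaping altogether, but it relies on exact equality of the numeric month values, so it is only safe when both MMYYYY columns were produced the same way.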
