"Index Match" in RStudio (multiple columns, across rows)

Question

I'm working with a fairly large data set (100k rows) and want to replicate Excel's INDEX/MATCH lookup in R.

I'm looking for a way to create a new column ("1994_Number") that pulls a value from an existing column ("1995_Number") whenever the three values in the 1994 columns (address, zip code, bank name) match the three values in the corresponding 1995 columns. The match should be independent of row position: the matching 1995 values may sit in a different row than the 1994 values.

Example data frame:

dat <- data.frame(`1994_Address` = c("1234 Road", "123 Road", "321 Road"),
                  `1994_ZipCode` = c(99999, 99999, 11111),
                  `1994_Bank Name` = c("JPM", "JPM", "WF"),
                  `1995_Address` = c("123 Road", "1234 Road", "321 Road"),
                  `1995_ZipCode` = c(99999, 99999, 11111),
                  `1995_Bank Name` = c("JPM", "JPM", "WF"),
                  `1995_Number` = c(1, 2, 3),
                  check.names = FALSE, stringsAsFactors = FALSE)

The newly created column 1994_Number should contain (2, 1, 3).

Recommended answer

One possible solution uses the match function from base R. Together with dplyr, the following works:

library(dplyr)

# Rebuild the example data; check.names = FALSE keeps the non-syntactic column names.
dat <- data.frame(`1994_Address` = c("1234 Road", "123 Road", "321 Road"),
                  `1994_ZipCode` = c(99999, 99999, 11111),
                  `1994_Bank Name` = c("JPM", "JPM", "WF"),
                  `1995_Address` = c("123 Road", "1234 Road", "321 Road"),
                  `1995_ZipCode` = c(99999, 99999, 11111),
                  `1995_Bank Name` = c("JPM", "JPM", "WF"),
                  `1995_Number` = c(1, 2, 3), check.names = FALSE, stringsAsFactors = FALSE)

# Each %in% test only checks that the 1994 value occurs somewhere in the
# corresponding 1995 column; the row to read 1995_Number from is located
# by address alone via match().
dat %>%
  mutate(`1994_Number` = ifelse(`1994_Address` %in% `1995_Address` &
                                  `1994_ZipCode` %in% `1995_ZipCode` &
                                  `1994_Bank Name` %in% `1995_Bank Name`,
                                `1995_Number`[match(`1994_Address`, `1995_Address`)], NA))

#   1994_Address 1994_ZipCode 1994_Bank Name 1995_Address 1995_ZipCode 1995_Bank Name 1995_Number 1994_Number
# 1    1234 Road        99999            JPM     123 Road        99999            JPM           1           2
# 2     123 Road        99999            JPM    1234 Road        99999            JPM           2           1
# 3     321 Road        11111             WF     321 Road        11111             WF           3           3
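
Note that the answer above locates the 1995 row by address only, and its three %in% tests are evaluated column by column rather than jointly. If the same address can appear with different zip codes or banks, it may be safer to match on all three columns at once. Below is a minimal base R sketch of that idea, not part of the original answer: it pastes the three columns into one composite key per year and matches the keys. The helper name make_key and the "|" separator are assumptions made for illustration; the separator should be a string that never occurs in the data.

```r
# Same example data as in the question, rebuilt so this sketch runs on its own.
dat <- data.frame(
  `1994_Address`   = c("1234 Road", "123 Road", "321 Road"),
  `1994_ZipCode`   = c(99999, 99999, 11111),
  `1994_Bank Name` = c("JPM", "JPM", "WF"),
  `1995_Address`   = c("123 Road", "1234 Road", "321 Road"),
  `1995_ZipCode`   = c(99999, 99999, 11111),
  `1995_Bank Name` = c("JPM", "JPM", "WF"),
  `1995_Number`    = c(1, 2, 3),
  check.names = FALSE, stringsAsFactors = FALSE
)

# Hypothetical helper: combine the three lookup columns of one year into a
# single key string per row. The "|" separator is an assumption.
make_key <- function(df, year) {
  paste(df[[paste0(year, "_Address")]],
        df[[paste0(year, "_ZipCode")]],
        df[[paste0(year, "_Bank Name")]],
        sep = "|")
}

key_1994 <- make_key(dat, "1994")
key_1995 <- make_key(dat, "1995")

# match() returns, for each 1994 key, the position of the first identical
# 1995 key (NA if none exists). Indexing 1995_Number by that position is the
# INDEX half of Excel's INDEX/MATCH.
dat$`1994_Number` <- dat$`1995_Number`[match(key_1994, key_1995)]
```

Rows with no matching 1995 key get NA, so no separate ifelse is needed. match() is vectorised, so this should be workable at around 100k rows, though that has not been measured here.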
