Using str_detect (or some other function) and some way to loop through a list to essentially perform a vlookup


Question

I have been searching for a way to do this. Some results on here seem similar, but nothing seems to work, and I cannot find a method that loops through a list the way a vlookup does in Excel. I apologize if I have missed it.

I am trying to add a new column to a data set with mutate. It should look at one column using str_replace (or some other function if necessary) and loop through another list, replacing whatever it finds with the corresponding value from a second column. Essentially a vlookup in Excel. It cannot be done in Excel, however, because the file is simply too large.

I can do a simple str_replace one at a time, but there are 502 possible options to choose from, so writing the code for each would take a very long time. Here is what I have so far:

library(dplyr)
library(stringr)

testVendor <- vendorData %>%
  mutate(Addr1 = toupper(Addr1)) %>%                   # upper-case the address first
  mutate(NewAdd = str_replace(Addr1, 'STREET', 'ST'))  # one hard-coded replacement

However, rather than specifying STREET and then ST myself, I want it to loop through a list of common postal abbreviations and return the standard abbreviation.

An example would be:

addr1 <- c('123 MAIN STREET', '123 GARDEN ROAD', '123 CHARLESTON BOULEVARD')
state_abbrv <- c('FL', 'CA', 'NY')
vendor <- data.frame(addr1, state_abbrv)
usps_name <- c('STREET', 'LANE', 'BOULEVARD', 'ROAD', 'TURNPIKE')
usps_abbrv <- c('ST', 'LN', 'BLVD', 'RD', 'TPKE')
usps <- data.frame(usps_name, usps_abbrv)

The ideal output would be a new column on the vendor data frame holding the abbreviated addresses, i.e. 123 MAIN ST, 123 GARDEN RD and 123 CHARLESTON BLVD.

Any assistance with this would be wonderful, and please allow me to expand on the question if it is unclear what I am looking for.

Thanks in advance.

Recommended answer

I would use a for loop:

library(stringr)

# make sure the lookup columns and the address are character, not factor
usps[] = lapply(usps, as.character)
vendor$new_addr1 = as.character(vendor$addr1)

for(i in 1:nrow(usps)) {
  vendor$new_addr1 = str_replace_all(
    vendor$new_addr1, 
    pattern = usps$usps_name[i], 
    replacement = usps$usps_abbrv[i])
}

vendor
#                      addr1 state_abbrv           new_addr1
# 1          123 MAIN STREET          FL         123 MAIN ST
# 2          123 GARDEN ROAD          CA       123 GARDEN RD
# 3 123 CHARLESTON BOULEVARD          NY 123 CHARLESTON BLVD
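
As an alternative to the explicit loop, str_replace_all also accepts a named character vector, where each name is a pattern and each value is its replacement. The following is only a sketch on the example data from the question; the `lookup` name is made up for illustration, and the data frames are rebuilt with `stringsAsFactors = FALSE` so that no factor conversion is needed.

```r
library(stringr)

# Example data from the question, kept as character columns.
vendor <- data.frame(
  addr1 = c('123 MAIN STREET', '123 GARDEN ROAD', '123 CHARLESTON BOULEVARD'),
  state_abbrv = c('FL', 'CA', 'NY'),
  stringsAsFactors = FALSE
)
usps <- data.frame(
  usps_name = c('STREET', 'LANE', 'BOULEVARD', 'ROAD', 'TURNPIKE'),
  usps_abbrv = c('ST', 'LN', 'BLVD', 'RD', 'TPKE'),
  stringsAsFactors = FALSE
)

# Named vector: names are the patterns, values are the replacements.
# ("lookup" is an illustrative name, not part of the original answer.)
lookup <- setNames(usps$usps_abbrv, usps$usps_name)

# A single call applies every pattern/replacement pair in turn.
vendor$new_addr1 <- str_replace_all(vendor$addr1, lookup)
vendor$new_addr1
```

Because the call is vectorised over the column, the same expression should also work inside mutate(), e.g. `mutate(new_addr1 = str_replace_all(addr1, lookup))`.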

To be extra safe, I'd add regex word boundaries (\b) to your patterns, as below, so that only whole words are replaced. (I assume you want AIRPLANE ROAD changed to AIRPLANE RD, not AIRPLN RD, which is what happens when LANE is matched inside AIRPLANE.)

for(i in 1:nrow(usps)) {
  vendor$new_addr1 = str_replace_all(
    vendor$new_addr1, 
    pattern = paste0("\\b", usps$usps_name[i], "\\b"), 
    replacement = usps$usps_abbrv[i])
}
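
To check what the word boundaries change, here is a small sketch that runs the same loop with and without them on a made-up address. It uses base R's gsub instead of str_replace_all so that it runs without extra packages; for these simple patterns the two should behave the same. The helper name `abbreviate_addr` and its `whole_words` argument are hypothetical, not part of the answer above.

```r
# Hypothetical helper: applies each name -> abbreviation pair in turn,
# optionally restricting matches to whole words via \b.
abbreviate_addr <- function(x, from, to, whole_words = TRUE) {
  for (i in seq_along(from)) {
    pattern <- if (whole_words) paste0("\\b", from[i], "\\b") else from[i]
    x <- gsub(pattern, to[i], x)
  }
  x
}

usps_name <- c('STREET', 'LANE', 'BOULEVARD', 'ROAD', 'TURNPIKE')
usps_abbrv <- c('ST', 'LN', 'BLVD', 'RD', 'TPKE')

# Made-up test address: LANE occurs inside the longer word AIRPLANE.
addr <- 'AIRPLANE ROAD'

without_boundaries <- abbreviate_addr(addr, usps_name, usps_abbrv, whole_words = FALSE)
with_boundaries <- abbreviate_addr(addr, usps_name, usps_abbrv, whole_words = TRUE)

without_boundaries  # "AIRPLN RD"
with_boundaries     # "AIRPLANE RD"
```

With 502 abbreviations, accidental matches inside longer words become more likely, so the whole-word version is probably the safer default.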
