Find matching strings between two vectors in R

Question

I have two vectors in R. I want to find partial matches between them.

The first one is from a dataset named muc, which contains 6400 street names. muc$name looks like:

muc$name = c("Berberichweg", "Otto-Klemperer-Weg", "Feldmeierbogen", "Altostraße",...)

The other vector is d_vector. It contains around 1400 names.

d_vector = c("Abel", "Abendroth", "von Abercron", "Abetz", "Abicht", "Abromeit", ...)

I want to find all street names that contain a name from d_vector somewhere in them.

First, I made some general adjustments after importing the CSV data (as variable d):

d_vector <- unlist(d$name)
d_vector <- as.vector(as.matrix(d_vector))
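The two statements above coerce d$name into a plain character vector. Assuming d$name is a character or factor column (the CSV itself isn't shown), as.character() does the same in one step. The data frame below is a hypothetical stand-in for the imported data:

```r
# Hypothetical stand-in for the imported CSV; the real `d` would come from read.csv()
d <- data.frame(name = c("Abel", "Abendroth", "von Abercron"),
                stringsAsFactors = TRUE)

# as.character() turns a factor (or character) column into a plain character vector
d_vector <- as.character(d$name)
print(d_vector)
```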

  • Then I tried to find a solution with grep, collapsing d_vector into one long string separated by | to use as a regular expression:

result <- unique(grep(paste(d_vector, collapse = "|"), muc$name, value = TRUE, ignore.case = TRUE))

But the result returns all the street names.

  • I also tried agrep, which returned an out-of-memory error.
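One plausible cause of the match-everything grep result (the data isn't shown, so this is a guess): if d_vector contains an empty string, for example from a blank CSV cell, the collapsed pattern gets an empty alternative such as "Abel||Berber", and an empty alternative matches every string. A missing value is a second trap: paste() turns NA into the literal text "NA", which matches any street containing "na" when ignore.case = TRUE. A sketch with made-up names that cleans the vector before building the pattern:

```r
streets  <- c("Berberichweg", "Otto-Klemperer-Weg", "Feldmeierbogen")
d_vector <- c("Abel", "", "Berber", NA, " Otto ")  # hypothetical names, incl. a blank and an NA

# With the blank entry left in, "Abel||Berber|NA| Otto " matches every street
bad_pattern <- paste(d_vector, collapse = "|")
print(grepl(bad_pattern, streets, ignore.case = TRUE))

# Trim surrounding whitespace, then drop missing and empty names
clean   <- trimws(d_vector)
clean   <- clean[!is.na(clean) & nzchar(clean)]
pattern <- paste(clean, collapse = "|")  # "Abel|Berber|Otto"

result <- unique(grep(pattern, streets, value = TRUE, ignore.case = TRUE))
print(result)
```

If real names can contain regex metacharacters such as "." or "(", they would also need escaping before being collapsed into one pattern.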

When I tried d_vector %in% muc$name, it returned just one TRUE and hundreds of FALSE, which doesn't seem right.
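%in% checks whether each element of d_vector is exactly equal to some element of muc$name; it does not look for substrings, so a name only counts as found when a street is named exactly that. A small illustration with hypothetical values, contrasting it with a substring test:

```r
streets  <- c("Berberichweg", "Otto-Klemperer-Weg", "Abel")  # hypothetical street list
surnames <- c("Abel", "Berber", "Otto")

# Whole-element equality: only "Abel" appears verbatim as a street name
exact <- surnames %in% streets
print(exact)  # TRUE FALSE FALSE

# Substring containment: does any street contain each surname?
contained <- vapply(surnames,
                    function(s) any(grepl(s, streets, fixed = TRUE)),
                    logical(1))
print(contained)
```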

Do you have any suggestions as to where my mistake could lie, or which library I could use? I am looking for something like Python's fuzzywuzzy for R.
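For approximate matching without an extra package, base R's agrepl() reports whether an approximate match of a single pattern occurs somewhere in each string. It is not an equivalent of fuzzywuzzy (which scores similarity); it only answers yes or no within an edit budget. The sketch below calls it once per name, so each call handles one short pattern instead of one huge alternation; whether that avoids the memory error above is untested. The edit budget max.distance = 1 and the sample names are assumptions:

```r
streets  <- c("Berberichweg", "Otto-Klemperer-Weg", "Feldmeierbogen")
surnames <- c("Feldmaier", "Klemperer", "Abetz")  # hypothetical; "Feldmaier" is misspelled on purpose

# One agrepl() call per surname; max.distance = 1 allows one insertion,
# deletion or substitution anywhere in the matched substring
hits <- sapply(surnames, function(s)
  agrepl(s, streets, max.distance = 1, ignore.case = TRUE))
rownames(hits) <- streets
print(hits)

# Streets that approximately contain at least one surname
print(streets[rowSums(hits) > 0])
```

With 1400 short names, a fixed budget of one edit can be loose for very short names like "Abel", so the budget may need to depend on name length.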

Answer

A simple solution:

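streets = c("Berberichweg", "Otto-Klemperer-Weg", "Feldmeierbogen", "Altostraße")
streets = tolower(streets) #Lowercase all
names = c("Berber", "Weg")
names = tolower(names)

# For each name y, test every street x for containment; the result is a
# logical matrix with one row per street and one column per name
sapply(names, function (y) sapply(streets, function (x) grepl(y, x)))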

#                    berber   weg
# berberichweg         TRUE  TRUE
# otto-klemperer-weg  FALSE  TRUE
# feldmeierbogen      FALSE FALSE
# altostraße          FALSE FALSE
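The matrix answers the question street by street; to get just the street names that contain at least one name, reduce each row with any(). A sketch using the first three streets from the answer. grepl() is already vectorised over its second argument, so the inner sapply() can go, and fixed = TRUE (an addition, not part of the original answer) matches names literally rather than as regular expressions:

```r
streets  <- tolower(c("Berberichweg", "Otto-Klemperer-Weg", "Feldmeierbogen"))
surnames <- tolower(c("Berber", "Weg"))

# Rows = streets, columns = surnames; TRUE where the street contains the surname
m <- sapply(surnames, function(y) grepl(y, streets, fixed = TRUE))
rownames(m) <- streets

# Keep streets whose row contains at least one TRUE
matched <- streets[apply(m, 1, any)]
print(matched)
```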
