Find the longest match of 2 integers in R


Problem description

I have two lists of numbers, and I need to match the values in one list against the other. The match must be based on the leading digits of the number, and it has to return the row_id of the longest possible match.

lookup value: 12345678

find_list:
a   1
b   12
c   123
d   124
e   125
f   1234
g   1235

In this example a, b, c and f all match, and R must return f, since f is the longest and therefore the best match.

I am currently using R's startsWith function and then picking the longest of the matching values. The problem is that the lists are huge: I have 18.5 million lookup values and 300,000 possible values in find_list, and R crashes after a while.

Is there a smarter way to do this?

Recommended answer

Here is one method in base R.

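# construct a vector of all numeric prefixes of the lookup value, longest first
# format() avoids miscounting digits when as.character() would give "1e+05";
# integer division (%/%) drops trailing digits without floating-point scaling
nDigits <- nchar(format(lookup, scientific = FALSE))
lookupVec <- lookup %/% 10^(0:(nDigits - 1))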

This returns

lookupVec
[1] 12345678  1234567   123456    12345     1234      123       12        1

# match() gives each V2's position in lookupVec (NA if V2 is not a prefix);
# lookupVec runs from longest to shortest prefix, so the smallest position
# marks the longest match

dat$V1[which.min(match(dat$V2, lookupVec))]
[1] f
Levels: a b c d e f g
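
To see why which.min picks the longest prefix, it can help to look at the intermediate result of match(). The following is a minimal sketch on the same example values; the helper name longest_prefix_id is hypothetical and not part of the original answer, and it assumes positive whole numbers.

```r
# Example values from the question, with ids kept as plain strings
dat <- data.frame(V1 = c("a", "b", "c", "d", "e", "f", "g"),
                  V2 = c(1, 12, 123, 124, 125, 1234, 1235),
                  stringsAsFactors = FALSE)

# Hypothetical helper wrapping the two steps for a single lookup value
longest_prefix_id <- function(lookup, ids, values) {
  # Count digits without falling into scientific notation
  n_digits <- nchar(format(lookup, scientific = FALSE))
  # All numeric prefixes of lookup, longest first
  prefixes <- lookup %/% 10^(0:(n_digits - 1))
  # Position of each candidate in prefixes (NA when it is not a prefix)
  pos <- match(values, prefixes)
  if (all(is.na(pos))) return(NA_character_)
  ids[which.min(pos)]
}

# Intermediate positions for the example lookup value
pos <- match(dat$V2, 12345678 %/% 10^(0:7))
pos
# [1]  8  7  6 NA NA  5 NA

longest_prefix_id(12345678, dat$V1, dat$V2)
# [1] "f"
```

The candidates a, b, c and f sit at positions 8, 7, 6 and 5 of the prefix vector, so the smallest position (5) belongs to f. A lookup value with no matching prefix returns NA here instead of an empty vector.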

You can probably speed this up by replacing base R's match function with fmatch from the fastmatch package, which hashes the table values and reuses that hash when you search the same table again.
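
For millions of lookup values, one possible way to scale the same idea is to process all lookups at once, one prefix length per pass, instead of building a prefix vector per lookup. The sketch below is an assumption-laden illustration rather than the answer's own code: the function name longest_prefix_match is hypothetical, and it assumes positive whole numbers below 2^53. It uses base match(); because the table argument is the same on every pass, fastmatch::fmatch could be swapped in as a drop-in replacement.

```r
# Hypothetical helper: longest-prefix match for a whole vector of lookups.
# Assumes positive whole numbers below 2^53 so that %/% is exact.
longest_prefix_match <- function(lookups, ids, values) {
  res <- rep(NA_integer_, length(lookups))
  if (length(lookups) == 0) return(ids[res])
  # Largest digit count among the lookups, avoiding scientific notation
  max_digits <- max(nchar(format(lookups, scientific = FALSE, trim = TRUE)))
  for (k in 0:(max_digits - 1)) {
    # Only lookups without a match so far still need shorter prefixes
    todo <- which(is.na(res))
    if (length(todo) == 0) break
    # Drop the last k digits, then look the prefix up in the find list
    res[todo] <- match(lookups[todo] %/% 10^k, values)
  }
  # Translate row positions into ids; unmatched lookups stay NA
  ids[res]
}

ids    <- c("a", "b", "c", "d", "e", "f", "g")
values <- c(1, 12, 123, 124, 125, 1234, 1235)

result <- longest_prefix_match(c(12345678, 1259, 999, 12), ids, values)
# result is c("f", "e", NA, "b")
```

Because the passes run from the full number down to a single digit, the first hit for each lookup is its longest prefix. The number of match() calls is bounded by the digit count of the largest lookup rather than by the number of lookups.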

Data

dat <-
structure(list(V1 = structure(1:7, .Label = c("a", "b", "c", 
"d", "e", "f", "g"), class = "factor"), V2 = c(1L, 12L, 123L, 
124L, 125L, 1234L, 1235L)), .Names = c("V1", "V2"), class = "data.frame",
row.names = c(NA, -7L))

lookup <- 12345678
