Partial String Match in R using the %in% operator?


Question

I'm curious to know whether it is possible to do partial string matching with the %in% operator in R. I know there are many ways to find partial string matches with stringr and similar tools, but my current code would be easier to work with if it kept using the %in% operator.

For example, imagine this vector:

x <- c("Withdrawn", "withdrawn", "5-Withdrawn", "2-WITHDRAWN", "withdrawnn")

I want each of these to be TRUE because every string contains "Withdrawn" (ignoring case), but only the first is TRUE:

x %in% c("Withdrawn")
[1]  TRUE FALSE FALSE FALSE FALSE

I tried using a regex to at least make it case-insensitive, but that made everything FALSE:

x %in% c("(?i)Withdrawn")
[1] FALSE FALSE FALSE FALSE FALSE

So, is it possible to yield TRUE on all of these using the %in% operator, perhaps with a wrapper? Because it's easy to use tolower() or toupper(), I'm not as concerned with case sensitivity; however, it is important to me that the code matches "withdrawn", "withdrawnn", and "5-withdrawn".

This question was marked as a duplicate of "Case-insensitive search of a list in R"; however, it is different because it asks whether partial string matches are possible using the %in% operator. The linked question does not use the %in% operator at all.

Recommended answer

%in% does not support this: it's a wrapper for the match function (x %in% table is defined as match(x, table, nomatch = 0L) > 0L), which establishes matches by equality comparison, not regular expression matching. This is also why "(?i)Withdrawn" matched nothing: it was compared as a literal string. However, you can implement your own:

# For each pattern, TRUE if it matches at least one element of `list`.
`%rin%` = function (pattern, list) {
     vapply(pattern, function (p) any(grepl(p, list)), logical(1L), USE.NAMES = FALSE)
}

And this can be used like %in%:

〉'^foo.*' %rin% c('foo', 'foobar')
[1] TRUE
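
Because `%rin%` loops over its left-hand side, it returns one value per pattern, not one per string. The following is a small sketch using the vector from the question; the three patterns are made up for illustration, and the definition is repeated only so the block runs on its own.

```r
# Definition repeated from the answer so this block is self-contained.
`%rin%` <- function(pattern, list) {
  vapply(pattern, function(p) any(grepl(p, list)), logical(1L), USE.NAMES = FALSE)
}

# Vector from the question.
x <- c("Withdrawn", "withdrawn", "5-Withdrawn", "2-WITHDRAWN", "withdrawnn")

# Illustrative patterns (assumptions, not from the original post).
patterns <- c("^With", "^[0-9]+-", "cancelled")

# One logical per pattern: does it match at least one element of x?
res <- patterns %rin% x
print(res)
# [1]  TRUE  TRUE FALSE
```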

Note that this is not quite what you asked for, although it works as you'd expect from grepl: pattern matching is asymmetric, so you can't swap the left-hand and right-hand sides. The patterns go on the left, and the result has one element per pattern rather than one per string. If you just want to match a list against a single regular expression, use grepl directly:

〉grepl("(?i)Withdrawn", x)
[1] TRUE TRUE TRUE TRUE TRUE

Or, if you prefer using an operator:

`%matches%` = grepl

〉"(?i)Withdrawn" %matches% x
[1] TRUE TRUE TRUE TRUE TRUE
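
If the vector should stay on the left-hand side, as it does with %in%, a wrapper along the following lines may be closer to what the question asks for. This is only a sketch: the operator name `%pin%` is hypothetical and not part of the original answer, and it assumes case-insensitive matching is wanted for every pattern.

```r
# Hypothetical operator (not from the original answer): for each element of
# `x`, TRUE if it matches at least one regular expression in `patterns`.
`%pin%` <- function(x, patterns) {
  # One logical vector per pattern, each the same length as `x`.
  hits <- lapply(patterns, function(p) grepl(p, x, ignore.case = TRUE))
  # Combine with element-wise OR, starting from all FALSE.
  Reduce(`|`, hits, logical(length(x)))
}

# Vector from the question, plus one non-matching value for contrast.
x <- c("Withdrawn", "withdrawn", "5-Withdrawn", "2-WITHDRAWN", "withdrawnn", "active")

res <- x %pin% c("withdrawn")
print(res)
# [1]  TRUE  TRUE  TRUE  TRUE  TRUE FALSE
```

Unlike `%rin%`, the result here has one element per string in `x`, so it can be used for subsetting in the same places as `x %in% ...`.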
