R: how to get grep to return the match, rather than the whole string
Question
I have what is probably a really dumb grep in R question. Apologies, because this seems like it should be so easy - I'm obviously just missing something.

I have a vector of strings, let's call it alice. Some of alice is printed out below:

T.8EFF.SP.OT1.D5.VSVOVA#4
T.8EFF.SP.OT1.D6.LISOVA#1
T.8EFF.SP.OT1.D6.LISOVA#2
T.8EFF.SP.OT1.D6.LISOVA#3
T.8EFF.SP.OT1.D6.VSVOVA#4
T.8EFF.SP.OT1.D8.VSVOVA#3
T.8EFF.SP.OT1.D8.VSVOVA#4
T.8MEM.SP#1
T.8MEM.SP#3
T.8MEM.SP.OT1.D106.VSVOVA#2
T.8MEM.SP.OT1.D45.LISOVA#1
T.8MEM.SP.OT1.D45.LISOVA#3

I'd like grep to give me the number after the D that appears in some of these strings, conditional on the string containing "LIS" and an empty string or something otherwise.
I was hoping that grep would return me the value of a capturing group rather than the whole string. Here's my R-flavoured regexp:
pattern <- "(?<=\\.D)([0-9]+)(?=.LIS)"
nothing too complicated. But in order to get what I'm after, rather than just using
grep(pattern, alice, value = TRUE, perl = TRUE)
I'm doing the following, which seems bad:
reg.out <- regexpr("(?<=\\.D)[0-9]+(?=.LIS)", alice, perl = TRUE)
substr(alice, reg.out, reg.out + attr(reg.out, "match.length") - 1)
Looking at it now it doesn't seem too ugly, but the amount of messing about it's taken to get this utterly trivial thing working has been embarrassing. Anyone any pointers about how to go about this properly?
Bonus marks for pointing me to a webpage that explains the difference between whatever I access with $, @ and attr.

Solution

You can do something like this:
pat <- ".*\\.D([0-9]+)\\.LIS.*"
sub(pat, "\\1", alice)
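
One thing worth noting about this approach: sub() returns any string that does not match the pattern unchanged, whereas the question asked for an empty string in that case. The following is a minimal sketch of one way to get that, assuming a shortened copy of alice taken from the question. The names replaced and day are illustrative only and do not come from the original post.

```r
# A subset of `alice`, copied from the sample data in the question
alice <- c(
  "T.8EFF.SP.OT1.D5.VSVOVA#4",
  "T.8EFF.SP.OT1.D6.LISOVA#1",
  "T.8MEM.SP#1",
  "T.8MEM.SP.OT1.D106.VSVOVA#2",
  "T.8MEM.SP.OT1.D45.LISOVA#3"
)

pat <- ".*\\.D([0-9]+)\\.LIS.*"

# sub() replaces the whole string with capture group 1 where the pattern
# matches, and leaves non-matching strings untouched
replaced <- sub(pat, "\\1", alice)

# Blank out the non-matches so the result stays aligned with `alice`
# (`day` is a hypothetical name chosen for this sketch)
day <- ifelse(grepl(pat, alice), replaced, "")
print(day)
```

With this sample, day holds "6" and "45" at the positions of the two LIS strings and "" everywhere else.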
If you only want the subset of alice where your pattern matches, try this:
pat <- ".*\\.D([0-9]+)\\.LIS.*"
sub(pat, "\\1", alice[grepl(pat, alice)])
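
As a possible alternative to the substr() arithmetic in the question, base R's regmatches() can pull out the matched text from a regexpr() result directly. This is only a sketch, not part of the original answer: it reuses the lookaround pattern from the question (with the dot before LIS escaped, which is a small change), and the names m, hits and day are illustrative.

```r
# A subset of `alice`, copied from the sample data in the question
alice <- c(
  "T.8EFF.SP.OT1.D5.VSVOVA#4",
  "T.8EFF.SP.OT1.D6.LISOVA#1",
  "T.8MEM.SP#1",
  "T.8MEM.SP.OT1.D106.VSVOVA#2",
  "T.8MEM.SP.OT1.D45.LISOVA#3"
)

# Lookbehind/lookahead need perl = TRUE; the match itself is only the digits
pattern <- "(?<=\\.D)[0-9]+(?=\\.LIS)"
m <- regexpr(pattern, alice, perl = TRUE)

# regmatches() returns just the matched substrings and drops non-matches
hits <- regmatches(alice, m)

# To keep one entry per input string, start from blanks and fill in the
# positions where regexpr() found a match (it reports -1 for no match)
day <- character(length(alice))
day[m != -1] <- hits
print(day)
```

Here hits contains only "6" and "45", while day has the same length as alice with "" for the strings that do not contain LIS.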