R: how to get grep to return the match, rather than the whole string

Problem description

I have what is probably a really dumb question about grep in R. Apologies, because this seems like it should be easy; I'm obviously just missing something.

I have a vector of strings, let's call it alice. Some of alice is printed out below:

T.8EFF.SP.OT1.D5.VSVOVA#4   
T.8EFF.SP.OT1.D6.LISOVA#1  
T.8EFF.SP.OT1.D6.LISOVA#2   
T.8EFF.SP.OT1.D6.LISOVA#3  
T.8EFF.SP.OT1.D6.VSVOVA#4    
T.8EFF.SP.OT1.D8.VSVOVA#3  
T.8EFF.SP.OT1.D8.VSVOVA#4   
T.8MEM.SP#1                
T.8MEM.SP#3                      
T.8MEM.SP.OT1.D106.VSVOVA#2 
T.8MEM.SP.OT1.D45.LISOVA#1  
T.8MEM.SP.OT1.D45.LISOVA#3

I'd like grep to give me the number after the D that appears in some of these strings, provided the string contains "LIS", and an empty string (or something similar) otherwise.

I was hoping that grep would return me the value of a capturing group rather than the whole string. Here's my R-flavoured regexp:

pattern <- "(?<=\\.D)([0-9]+)(?=\\.LIS)"

Nothing too complicated. But to get what I'm after, rather than just using grep(pattern, alice, value = TRUE, perl = TRUE) (which returns the whole matching strings), I'm doing the following, which seems clumsy:

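# Digits after ".D", but only when they are followed by ".LIS"
reg.out <- regexpr(
    "(?<=\\.D)[0-9]+(?=\\.LIS)",
    alice,
    perl = TRUE
)
# regexpr() gives start positions plus a "match.length" attribute (both -1 where nothing matched)
substr(alice, reg.out, reg.out + attr(reg.out, "match.length") - 1)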
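For comparison, base R's regmatches() does the substr() bookkeeping for you when given the output of regexpr(). This is a minimal sketch that reuses the lookaround pattern above (with the dot before LIS escaped) on a few of the sample strings. Note that regmatches() silently drops elements that did not match, so the sketch also shows one way to keep an empty string in those positions.

```r
# A few sample strings taken from the question
alice <- c("T.8EFF.SP.OT1.D5.VSVOVA#4",
           "T.8EFF.SP.OT1.D6.LISOVA#1",
           "T.8MEM.SP#1",
           "T.8MEM.SP.OT1.D45.LISOVA#3")

# First match per string; -1 where there is no match
m <- regexpr("(?<=\\.D)[0-9]+(?=\\.LIS)", alice, perl = TRUE)

# regmatches() returns only the matched substrings (non-matches are dropped)
hits <- regmatches(alice, m)
print(hits)

# Keep one result per input string, with "" where nothing matched
out <- rep("", length(alice))
out[m != -1] <- hits
print(out)
```

Here hits holds only "6" and "45", while out stays aligned with alice.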

Looking at it now it doesn't seem too ugly, but the amount of messing about it has taken to get this utterly trivial thing working has been embarrassing. Does anyone have pointers on how to do this properly?

Bonus marks for pointing me to a webpage that explains the difference between whatever I access with $, @ and attr.
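As a rough orientation (not a full explanation): $ extracts a named element from a list or data frame, @ extracts a slot from an S4 object, and attr() reads an attribute attached to any object, which is how regexpr() reports "match.length". R's own help pages ?Extract, ?slot and ?attr document each of these. In the sketch below, the list res and the S4 class Match are hypothetical, made up purely for illustration.

```r
library(methods)

# attr(): read an attribute attached to an object (regexpr() stores match lengths this way)
m <- regexpr("[0-9]+", "D45.LIS", perl = TRUE)
len <- attr(m, "match.length")

# $: extract a named element from a list (hypothetical list for illustration)
res <- list(start = 2L, length = 2L)
via_dollar <- res$length

# @: extract a slot from an S4 object (hypothetical class for illustration)
setClass("Match", representation(start = "integer", length = "integer"))
obj <- new("Match", start = 2L, length = 2L)
via_at <- obj@length

print(c(len, via_dollar, via_at))
```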

Solution

You can do something like this:

pat <- ".*\\.D([0-9]+)\\.LIS.*"
sub(pat, "\\1", alice)
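One thing to be aware of: sub() returns strings that do not match the pattern unchanged, so the call above gives the whole original string (not an empty string) for elements without "LIS". A minimal sketch, assuming you want "" for those elements as the question asks, combines sub() with grepl() via ifelse():

```r
alice <- c("T.8EFF.SP.OT1.D5.VSVOVA#4",
           "T.8EFF.SP.OT1.D6.LISOVA#1",
           "T.8MEM.SP#1",
           "T.8MEM.SP.OT1.D45.LISOVA#3")
pat <- ".*\\.D([0-9]+)\\.LIS.*"

# sub() alone: non-matching strings come back untouched
print(sub(pat, "\\1", alice))

# grepl() marks which elements matched; everything else becomes ""
res <- ifelse(grepl(pat, alice), sub(pat, "\\1", alice), "")
print(res)
```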

If you only want the subset of alice where your pattern matches, try this:

pat <- ".*\\.D([0-9]+)\\.LIS.*"
sub(pat, "\\1", alice[grepl(pat, alice)])
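Since the question asks specifically for the value of a capturing group, another base R option worth knowing is regexec(), which records the position of the full match and of each group. This is a sketch of that alternative technique, not part of the answer above: regmatches() then returns, for each string, the full match followed by the groups, or character(0) when the string does not match.

```r
alice <- c("T.8EFF.SP.OT1.D5.VSVOVA#4",
           "T.8EFF.SP.OT1.D6.LISOVA#1",
           "T.8MEM.SP#1",
           "T.8MEM.SP.OT1.D45.LISOVA#3")
pat <- "\\.D([0-9]+)\\.LIS"

# Positions of the full match and of capture group 1, per string
m <- regexec(pat, alice)

# A list: c(full match, group 1) for matches, character(0) otherwise
parts <- regmatches(alice, m)

# Take capture group 1, using "" where the string did not match
group1 <- vapply(parts, function(p) if (length(p) > 1) p[2] else "", character(1))
print(group1)
```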
