Use RegEx in R to retrieve the string before the second occurrence of a period ('.')


Problem description

What regular expression can retrieve (e.g. with sub()) the characters before the second period? Given a character vector like:

v <- c("m_s.E1.m_x.R1PE1", "m_xs.P1.m_s.R2E12")

I would like to return:

[1] "m_s.E1" "m_xs.P1"


Recommended answer

> sub( "(^[^.]+[.][^.]+)(.+$)", "\\1", v)
[1] "m_s.E1"  "m_xs.P1"

Now to explain it: the first and third bracket pairs, "[^.]", are character classes that match any character except a period, and the "+" that follows each one allows one or more such characters. The "[.]" between them therefore matches only the first period, and the second period terminates the first section. Each pair of parentheses marks off a section of the matched characters (a capture group), and there are two such sections. The second section is any character (the unbracketed period symbol) repeated one or more times up to the end of the string, "$". The "\\1" specifies that only the first section is returned.
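To make the two parenthesised sections concrete, here is a small sketch that extracts each one separately from the vector in the question. The variable names (pattern, first_part, second_part) are illustrative only and do not come from the original answer.

```r
v <- c("m_s.E1.m_x.R1PE1", "m_xs.P1.m_s.R2E12")
pattern <- "(^[^.]+[.][^.]+)(.+$)"

# Section 1: everything before the second period
first_part <- sub(pattern, "\\1", v)

# Section 2: the remainder, starting at the second period
second_part <- sub(pattern, "\\2", v)

print(first_part)
print(second_part)

# A string with no period at all cannot match the pattern,
# so sub() returns it unchanged
print(sub(pattern, "\\1", "no_period_here"))
```

Printing the second section shows that it begins with the second period, which is why returning only "\\1" drops everything from that point on.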

The ^ operator means different things inside and outside the square brackets. Outside, it refers to the zero-length position at the beginning of the string. Inside, at the start of a character class specification, it is the negation operator.
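The two meanings of ^ can be seen side by side in a minimal sketch. The test vector x and the result names are made up for illustration.

```r
x <- c("a.b", ".ab")

# Outside brackets: ^ anchors the match at the start of the string,
# so this asks "does the string begin with a period?"
starts_with_period <- grepl("^[.]", x)

# Inside brackets, in first position: ^ negates the class,
# so this asks "does the string begin with anything but a period?"
starts_with_other <- grepl("^[^.]", x)

print(starts_with_period)
print(starts_with_other)
```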

This is a good use case for "character classes", which are described in the help page found by typing:

?regex
