Backreference in R


Question

I am really confused about how to use backreferences.

strings <- c("^ab", "ab", "abc", "abd", "abe", "ab 12")

gsub("(ab) 12", "\\1 34", strings)
[1] "^ab"   "ab"    "abc"   "abd"   "abe"   "ab 34"

gsub("(ab)12", "\\2 34", strings)
[1] "^ab"   "ab"    "abc"   "abd"   "abe"   "ab 12"

I know \1 refers to the first subpattern (reading from the left), \2 refers to the second subpattern, and so on. But I don't know what "subpattern" means. Why do \1 and \2 give different output?

gsub("(ab)", "\\1 34", strings)
[1] "^ab 34"   "ab 34"    "ab 34c"   "ab 34d"   "ab 34e"   "ab 34 12"

Also, why do I get this result when I remove the 12 after (ab)?

gsub("ab", "\\1 34", strings)
[1] "^ 34"   " 34"    " 34c"   " 34d"   " 34e"   " 34 12"

Furthermore, what if ab has no parentheses? What does that indicate?

I am really mixed up about backreferences and hope someone can explain the logic clearly.

Recommended answer

In the first and second cases there is a single capture group, i.e. a group captured with (...). In the first case the replacement uses the backreference correctly: \\1 refers to the first (and only) capture group. In the second case the replacement uses \\2, which refers to a group that does not exist. Note also that the second pattern, (ab)12, has no space between ab and 12, so it matches none of the strings and the output is unchanged.
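One way to see what a "subpattern" (capture group) holds is to ask R for the pieces of a match directly. The following is a minimal sketch, not part of the original answer. It uses base R's regexec() and regmatches(), and the variable names are only illustrative. The first element returned is the whole match, and each later element is the text captured by one pair of parentheses, in the order \\1, \\2, and so on.

```r
x <- "ab 12"

# One pair of parentheses: the whole match, then group 1.
one_group <- regmatches(x, regexec("(ab) 12", x))[[1]]
print(one_group)    # "ab 12" "ab"

# Two pairs of parentheses: the whole match, then groups 1 and 2.
two_groups <- regmatches(x, regexec("(ab) (12)", x))[[1]]
print(two_groups)   # "ab 12" "ab" "12"

# Without parentheses there are no groups, only the whole match.
no_group <- regmatches(x, regexec("ab 12", x))[[1]]
print(no_group)     # "ab 12"
```

With no parentheses there is nothing for \\1 to refer to, which is consistent with the last output in the question, where \\1 contributed nothing and only " 34" was inserted.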

To illustrate:

gsub("(ab)(d)", "\\1 34", strings)
#[1] "^ab"   "ab"    "abc"   "ab 34" "abe"   "ab 12"

Here we are using two capture groups, (ab) and (d). In the replacement we have the first backreference (\\1) followed by a space and 34. In 'strings' this matches only the 4th element, "abd": the first backreference (\\1) supplies "ab", which is then followed by a space and 34.

Suppose we use the second backreference instead:

gsub("(ab)(d)", "\\2 34", strings)
#[1] "^ab"   "ab"    "abc"   "d 34"  "abe"   "ab 12"

The text captured by the first group is dropped, and we get "d" followed by a space and 34.
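Both backreferences can also appear in the same replacement. As a small illustrative sketch (not from the original answer), swapping them reverses the order of the two captured pieces in the one element that matches:

```r
strings <- c("^ab", "ab", "abc", "abd", "abe", "ab 12")

# \\2 is the text captured by (d), \\1 the text captured by (ab).
swapped <- gsub("(ab)(d)", "\\2\\1", strings)
print(swapped)
# [1] "^ab"   "ab"    "abc"   "dab"   "abe"   "ab 12"
```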

Suppose we use a more general pattern instead of specific characters:

gsub("([a-z]+)\\s*(\\d+)", "\\1 34", strings)
#[1] "^ab"   "ab"    "abc"   "abd"   "abe"   "ab 34"
gsub("([a-z]+)\\s*(\\d+)", "\\2 34", strings)
#[1] "^ab"   "ab"    "abc"   "abd"   "abe"   "12 34"

Note how the last element changes when we switch from the first backreference to the second. The pattern is one or more lowercase letters inside the first capture group, ([a-z]+), followed by zero or more whitespace characters (\\s*), followed by one or more digits in the second capture group, (\\d+). It matches only the last element of 'strings'. In the two replacements we use the first and then the second backreference, as shown above.
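To keep both captured pieces, the two backreferences can be combined in one replacement. This is a hedged sketch along the same lines as the calls above; the replacement string is chosen only for illustration and is not from the original answer.

```r
strings <- c("^ab", "ab", "abc", "abd", "abe", "ab 12")

# Group 1 = letters, group 2 = digits; emit them in reverse order.
reordered <- gsub("([a-z]+)\\s*(\\d+)", "\\2 \\1", strings)
print(reordered)
# [1] "^ab"   "ab"    "abc"   "abd"   "abe"   "12 ab"
```

Only the last element contains letters followed by digits, so it is the only one rewritten.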
