Regex in R: replace only part of a pattern

Views: 34 | Published: 2021/7/6 19:31:12 | Tags: regex, r

Problem description

s <- "YXABCDXABCDYX"

I want to use a regular expression to return ABCDABCD, i.e. the 4 characters on each side of the central "X", excluding the "X" itself. Note that "X" is always in the center, with 6 letters on each side.

I can find the central pattern with e.g. "[A-Z]{4}X[A-Z]{4}", but can I somehow let the return be the first and third group in "([A-Z]{4})(X)([A-Z]{4})"?

Solution

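On its own, your regex "([A-Z]{4})(X)([A-Z]{4})" matches only the central part of your string, so a replacement leaves the characters before the first capture group ([A-Z]{4}) and after the last one untouched. To consume them as well, add .* on both sides: it matches any character (.) 0 or more times (*) up to your first capture group and after your last one.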

You can reference the groups in the replacement string of gsub using \\n, where n is the number of the capture group:

s <- "YXABCDXABCDYX"

gsub('.*([A-Z]{4})(X)([A-Z]{4}).*', '\\1\\3', s)
# [1] "ABCDABCD"

This matches the entire string and replaces it with whatever was captured in groups 1 and 3, pasted together.
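To make the role of the surrounding .* concrete, here is a small side-by-side sketch. The variable names partial and full are only illustrative and do not come from the original answer.

```r
s <- "YXABCDXABCDYX"

# Without the surrounding .*, only the central 9 characters are matched
# and replaced, so the leading and trailing "YX" are left in place.
partial <- gsub('([A-Z]{4})(X)([A-Z]{4})', '\\1\\3', s)

# With .* on both sides, the whole string is matched, so only the
# contents of groups 1 and 3 remain after the replacement.
full <- gsub('.*([A-Z]{4})(X)([A-Z]{4}).*', '\\1\\3', s)

print(partial)  # [1] "YXABCDABCDYX"
print(full)     # [1] "ABCDABCD"
```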

Another way would be to use (?i), which turns on case-insensitive matching, along with [a-z] or \\w (note that \\w also matches digits and underscores):

gsub('(?i).*(\\w{4})(x)(\\w{4}).*', '\\1\\3', s)
# [1] "ABCDABCD"
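The effect of case-insensitive matching is easier to see on mixed-case input. The sketch below uses a made-up string, s_mixed, which is an assumption for illustration and not part of the question. It also shows the ignore.case argument of gsub as what should be an equivalent way to get the same behaviour; perl = TRUE is passed in the first call only so that the inline (?i) flag is handled by PCRE.

```r
# Hypothetical mixed-case input, same layout as the original string.
s_mixed <- "yxABcdXabCDyx"

# Inline flag: (?i) makes [a-z] and the literal x match either case.
inline_flag <- gsub('(?i).*([a-z]{4})(x)([a-z]{4}).*', '\\1\\3',
                    s_mixed, perl = TRUE)

# Same idea using gsub's ignore.case argument instead of (?i).
arg_flag <- gsub('.*([a-z]{4})(x)([a-z]{4}).*', '\\1\\3',
                 s_mixed, ignore.case = TRUE)

print(inline_flag)  # [1] "ABcdabCD"
print(arg_flag)     # [1] "ABcdabCD"
```

The captured text keeps its original case; only the matching ignores it.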

Or, if you like dots, match any character with . and keep only two capture groups: gsub('.*(.{4})X(.{4}).*', '\\1\\2', s)
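Since the question states that "X" is always in the center with 6 letters on each side, the pattern could also be anchored to the full string instead of relying on greedy .* to find the right "X". This is only a sketch under that assumption; the name anchored is illustrative, and the answer itself does not use anchors.

```r
s <- "YXABCDXABCDYX"

# ^.{2}  skips the 2 outer characters on the left,
# (.{4}) captures the 4 characters before the central X,
# (.{4}) captures the 4 characters after it,
# .{2}$  skips the 2 outer characters on the right.
anchored <- sub('^.{2}(.{4})X(.{4}).{2}$', '\\1\\2', s)

print(anchored)  # [1] "ABCDABCD"
```

If the input does not have this exact layout, the pattern does not match and sub returns the string unchanged, which makes malformed input easier to spot.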
