How to extract the substring between the patterns "_" and "." in R
Question
I have many filenames which look like:
txt = "MA0051_IRF2.xml"
I want to extract IRF2, which is between "_" and ".". How do I do this in R?
Recommended answer
To achieve this, you need a regular expression that:
- matches an (optional) arbitrary string in front of the _ (because .* is greedy, this extends to the last _ that still lets the rest of the pattern match):
.*
- matches a literal _ :
[_]
- matches everything up to (but not including) the next . and stores it in capturing group no. 1 :
([^.]+)
- matches a literal . :
[.]
- matches an (optional) arbitrary string after the . :
.*
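The five pieces can be checked one at a time. The sketch below uses base R only; the vector name `parts` and its element names are illustrative, not part of the original answer. It glues the pieces together with paste0() and uses regexec() with regmatches() to show what capturing group 1 holds for the example filename.

```r
# Illustrative names: each element is one piece of the pattern described above.
parts <- c(
  prefix  = ".*",       # optional arbitrary text before the _
  sep     = "[_]",      # a literal underscore
  capture = "([^.]+)",  # group 1: everything up to the next .
  dot     = "[.]",      # a literal dot
  suffix  = ".*"        # optional arbitrary text after the .
)
pattern <- paste0(parts, collapse = "")
print(pattern)   # [1] ".*[_]([^.]+)[.].*"

txt <- "MA0051_IRF2.xml"
# regexec() returns the positions of the whole match and of each group;
# regmatches() turns them into strings: element 1 is the full match, 2 is group 1.
m <- regmatches(txt, regexec(pattern, txt))
captured <- m[[1]][2]
print(captured)  # [1] "IRF2"
```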
In your call to gsub, you then:
- use the regular expression we built in the previous step
- replace the whole string with the contents of the first capturing group:
\\1
(the backslash must itself be escaped inside an R string literal, hence the double backslash)
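To see why the replacement is written "\\1", it helps to look at what R makes of the string literal itself. A small check in base R, assuming nothing beyond the example filename from the question:

```r
# In R source, "\\1" is a two-character string: a backslash and the digit 1.
# gsub() reads that pair as "the text captured by group 1".
replacement <- "\\1"
nchar(replacement)       # [1] 2
cat(replacement, "\n")   # prints: \1

# A single backslash is consumed by R's own parser: "\1" is an octal escape
# for the control character with code 1, so gsub() never sees a backreference.
nchar("\1")              # [1] 1

result <- gsub(".*[_]([^.]+)[.].*", replacement, "MA0051_IRF2.xml")
result                   # [1] "IRF2"
```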
Example:
gsub(".*[_]([^.]+)[.].*", "\\1", "MA0051_IRF2.xml")
# [1] "IRF2"
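Since the question mentions many filenames, note that gsub() is vectorised over its input, so the same call works on a whole character vector. The filenames below are hypothetical, and the grepl() guard is one possible way (an assumption, not part of the original answer) to flag names that do not fit the pattern, which gsub() would otherwise return unchanged.

```r
# Hypothetical filenames, for illustration only.
files <- c("MA0051_IRF2.xml", "run_2_sample.csv", "README")
pattern <- ".*[_]([^.]+)[.].*"

# One call handles every element of the vector.
ids <- gsub(pattern, "\\1", files)
# c("IRF2", "sample", "README"): "README" has no _ or ., so it is returned as-is

# Map names that do not match the pattern to NA instead of passing them through.
ids_safe <- ifelse(grepl(pattern, files), ids, NA)
# c("IRF2", "sample", NA)
```

The second name also shows the effect of the greedy leading .*: when a name contains several underscores, the text after the last one is extracted. Because the pattern consumes the whole string, sub() would return the same result as gsub() here.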