stringr str_extract capture group capturing everything
Question
I'm looking to extract the year from a string. The year always comes after an "X" and before a ".", which is then followed by a string of other characters.
Using stringr's str_extract, I'm trying the following:
library(stringr)
year <- str_extract(string = 'X2015.XML.Outgoing.pounds..millions.',
                    pattern = 'X(\\d{4})\\.')
I thought the parentheses would define the capture group, returning "2015", but I actually get the complete match "X2015." instead.
Am I doing this correctly? Why are the "X" and "." not being trimmed?
Recommended answer
The capture group is irrelevant in this case. The function str_extract returns the whole match, including the characters matched before and after the capture group.
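To see the difference, one option that is not part of the original answer is stringr's str_match, which returns the whole match in the first column and each capture group in the following columns. A minimal sketch, assuming stringr is installed and reusing the pattern from the question:

```r
library(stringr)

x <- 'X2015.XML.Outgoing.pounds..millions.'

# str_extract returns the whole match; the parentheses do not change that
full <- str_extract(x, 'X(\\d{4})\\.')

# str_match returns a character matrix: column 1 is the whole match,
# column 2 is the first capture group
m <- str_match(x, 'X(\\d{4})\\.')
year <- m[, 2]

print(full)  # "X2015."
print(year)  # "2015"
```

If only the capture group is wanted and the pattern should stay as written, indexing the second column of the str_match result is probably the simplest route.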
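To get only the digits out of str_extract, use a lookbehind and a lookahead instead. Both are zero-width assertions: they must hold at that position, but the characters they check are not included in the returned match.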
library(stringr)
str_extract(string = 'X2015.XML.Outgoing.pounds..millions.',
pattern = '(?<=X)\\d{4}(?=\\.)')
# [1] "2015"
This regex matches four consecutive digits that are preceded by an "X" and followed by a ".".
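Because str_extract is vectorised, the same pattern can be applied to a whole vector of names at once. The sketch below uses hypothetical column names invented for illustration (only the first one comes from the question) and annotates each part of the pattern; elements with no match come back as NA.

```r
library(stringr)

# Hypothetical column names; only the first appears in the question
cols <- c('X2015.XML.Outgoing.pounds..millions.',
          'X2016.XML.Incoming.pounds..millions.',
          'Region')

# (?<=X)   lookbehind: an X must come directly before the digits
# \\d{4}   the four digits that are actually returned
# (?=\\.)  lookahead: a literal dot must come directly after the digits
years <- str_extract(cols, '(?<=X)\\d{4}(?=\\.)')

print(years)              # "2015" "2016" NA
print(as.integer(years))  # 2015 2016 NA
```

Converting with as.integer is optional; it is shown here only because a year is usually more useful as a number than as a string.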