stringr str_extract capture group capturing everything

Problem description

I'm looking to extract the year from a string. The year always comes after an "X" and before a ".", which is followed by a string of other characters.

Using stringr's str_extract, I'm trying the following:

year = str_extract(string = 'X2015.XML.Outgoing.pounds..millions.'
                 , pattern = 'X(\\d{4})\\.')

I thought the parentheses would define a capture group, so that 2015 would be returned, but I actually get the complete match, "X2015.".

Am I doing this correctly? Why are the "X" and the "." not being trimmed?

Recommended answer

The capture group is irrelevant in this case. The function str_extract will return the whole match including characters before and after the capture group.
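If you do want the capture group itself rather than the whole match, a minimal sketch is to switch to stringr's str_match(), which returns a character matrix whose first column is the full match and whose later columns hold the capture groups. (As far as we know, stringr 1.5.0 and later also give str_extract() a group argument for this; check the documentation of your installed version before relying on it.)

```r
library(stringr)

s <- "X2015.XML.Outgoing.pounds..millions."

# str_match() returns a character matrix:
# column 1 is the whole match ("X2015."), column 2 is the first (...) group.
m <- str_match(s, "X(\\d{4})\\.")

# Keep only the captured digits.
year <- m[, 2]
year
# [1] "2015"
```

This keeps the original pattern from the question unchanged; only the function used to read the result differs.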

You have to use a lookbehind and a lookahead instead. These are zero-width assertions: they check that the surrounding characters are present, but they do not become part of the match.

library(stringr)
str_extract(string = 'X2015.XML.Outgoing.pounds..millions.',
            pattern = '(?<=X)\\d{4}(?=\\.)')
# [1] "2015"

This regex matches four consecutive digits that are preceded by an "X" and followed by a ".". Only the digits are returned, because the lookbehind (?<=X) and the lookahead (?=\\.) consume no characters.
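Because str_extract() is vectorised, the same lookaround pattern can be applied to many strings at once, and strings with no match come back as NA. The sketch below assumes a character vector of names like the one in the question; only the first string comes from the question, the other two are hypothetical inputs made up for illustration.

```r
library(stringr)

# Hypothetical inputs: only the first string is taken from the question.
x <- c("X2015.XML.Outgoing.pounds..millions.",
       "X1999.Some.other.series",
       "no.year.here")

# (?<=X) and (?=\\.) must match, but are not included in the result.
years <- str_extract(x, "(?<=X)\\d{4}(?=\\.)")
years
# [1] "2015" "1999" NA

# Convert to integers if the years are needed as numbers (NA stays NA).
years_int <- as.integer(years)
```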
