R Regular Expression Lookbehind

Question

I have a vector filled with strings of the following format: <year1><year2><id1><id2>

The first entries of the vector look like this:

199719982001
199719982002
199719982003
199719982003

For the first entry we have: year1 = 1997, year2 = 1998, id1 = 2, id2 = 001.

I want to write a regular expression that pulls out year1, id1, and the non-zero digits of id2. So for the first entry the regex should output: 199721.

I have tried doing this with the stringr package, and created the following regex:

"^\d{4}|\d{1}(?<=\d{3}$)"

to pull out year1 and id1. However, when using the lookbehind I get an "invalid regular expression" error. This is a bit puzzling to me: can R not handle lookaheads and lookbehinds?

Recommended answer

Since this is a fixed format, why not use substr? year1 is extracted with substr(s,1,4), id1 with substr(s,9,9), and id2 with as.numeric(substr(s,10,13)). In the last case I used as.numeric to get rid of the leading zeroes.
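A minimal sketch of the substr approach, applied to the four sample entries from the question. The variable names (s, result, m, regex_part) are only illustrative. The second half is not part of the answer: it is an assumption-labeled alternative showing that base R accepts lookarounds when perl = TRUE is passed, using a lookahead rather than the lookbehind from the question.

```r
# Sample entries copied from the question.
s <- c("199719982001", "199719982002", "199719982003", "199719982003")

# Fixed-width extraction, as suggested in the answer.
year1 <- substr(s, 1, 4)                # characters 1 to 4
id1   <- substr(s, 9, 9)                # character 9
id2   <- as.numeric(substr(s, 10, 13))  # "001" becomes 1; substr stops at the end of the string

result <- paste0(year1, id1, id2)
print(result)
# "199721" "199722" "199723" "199723"

# Alternative (assumption, not from the answer): base R with the PCRE engine.
# Backslashes are doubled inside the R string; the lookahead (?=\d{3}$)
# selects the single digit that is followed by exactly three final digits.
m <- regmatches(s, gregexpr("^\\d{4}|\\d(?=\\d{3}$)", s, perl = TRUE))
regex_part <- sapply(m, paste0, collapse = "")
print(regex_part)
# "19972" "19972" "19972" "19972"
```

The regex variant only recovers year1 and id1, which is what the pattern in the question was aiming for; stripping the zeroes from id2 still needs a numeric conversion or a further substitution.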
