Trouble with gsub and regex in R


Question


I am using gsub in R to insert text into the middle of a string. It works perfectly, but when the insertion position gets too large it throws an error. The code is below:

gsub(paste0('^(.{', as.integer(loc[1])-1, '})(.+)$'), new_cols, sql)

Error in gsub(paste0("^(.{273})(.+)$"), new_cols, sql) :  invalid
  regular expression '^(.{273})(.+)$', reason 'Invalid contents of {}'

This code works fine when the number in the braces (273 in this case) is smaller, but not when it is this large.


This produces the error:

sql <- "The cat with the bat went to town. He ate the fat mat and wouldn't stop til the sun came up. He was a fat cat that lived with a rat who owned many hats.The cat with the bat went to town. He ate the fat mat and wouldn't stop til the sun came up. He was a fat cat that lived with a rat who owned many hats."  
new_cols <- "happy" 
gsub('^(.{125})(.+)$', new_cols, sql)  # works
gsub('^(.{273})(.+)$', new_cols, sql) 

Error in gsub("^(.{273})(.+)$", new_cols, sql) :    invalid regular
  expression '^(.{273})(.+)$', reason 'Invalid contents of {}'

Answer

Background

By default, R's gsub uses the TRE regex library. In TRE, the bounds of a limiting quantifier must lie between 0 and RE_DUP_MAX, a constant defined in the TRE source. The TRE reference states:

A bound is one of the following, where n and m are unsigned decimal integers between 0 and RE_DUP_MAX

RE_DUP_MAX appears to be set to 255 (the TRE source file contains #define RE_DUP_MAX 255), so a {n,m} limiting quantifier cannot use a bound larger than 255. That is why {125} works and {273} fails.
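To check where the limit applies on a particular R build, one option is to try compiling patterns with different bounds and catch the error. This is only a minimal sketch: bound_ok is a hypothetical helper name, not part of R, and the result for bounds above 255 with the default engine depends on the TRE version bundled with your R.

```r
# Hypothetical helper: TRUE if the chosen regex engine accepts the pattern `^.{n}`.
# perl = FALSE uses R's default engine (TRE); perl = TRUE uses PCRE.
bound_ok <- function(n, perl = FALSE) {
  pattern <- paste0("^.{", n, "}")
  tryCatch({
    # The match result is irrelevant; we only care whether the pattern compiles.
    suppressWarnings(grepl(pattern, "x", perl = perl))
    TRUE
  }, error = function(e) FALSE)
}

bound_ok(125)               # small bound, the case that worked in the question
bound_ok(273)               # the bound that failed in the question
bound_ok(273, perl = TRUE)  # same bound with the PCRE engine
```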

Solution

Switch to the PCRE regex flavor by adding perl = TRUE, and the pattern will work.

R demo:

> sql <- "The cat with the bat went to town. He ate the fat mat and wouldn't stop til the sun came up. He was a fat cat that lived with a rat who owned many hats.The cat with the bat went to town. He ate the fat mat and wouldn't stop til the sun came up. He was a fat cat that lived with a rat who owned many hats."
> new_cols <- "happy"
> gsub('^(.{273})(.+)$', new_cols, sql, perl=TRUE)
[1] "happy"
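The demo returns just "happy" because the replacement contains no backreferences, so the whole match is replaced. To actually insert text at a position, the replacement has to put the captured groups back. The sketch below assumes a hypothetical insertion position pos and a shortened sample string; it shows the PCRE route next to a regex-free alternative with substr, which sidesteps the quantifier limit entirely.

```r
# Sample input (assumption: any single-line string longer than `pos` characters).
sql <- paste(rep("The cat with the bat went to town.", 10), collapse = " ")
new_cols <- "happy"
pos <- 273  # hypothetical insertion position; must be smaller than nchar(sql)

# PCRE route: keep the first `pos` characters (\\1), insert new_cols, keep the rest (\\2).
pattern <- paste0("^(.{", pos, "})(.+)$")
via_regex <- sub(pattern, paste0("\\1", new_cols, "\\2"), sql, perl = TRUE)

# Regex-free route: split the string at `pos` and paste the pieces back together.
via_substr <- paste0(substr(sql, 1, pos), new_cols, substr(sql, pos + 1, nchar(sql)))

# Both routes should produce the same string.
identical(via_regex, via_substr)
```

The substr version may be the simpler choice when the position is already known as a number, since it does not depend on any regex engine limits and does not interpret backslashes in new_cols.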
