R: gsub with fixed=T or F and special cases

Question

Building on two questions I asked previously:

gsub speed vs. pattern length

I like @Tyler's suggestion to use fixed=TRUE, as it speeds up the calculation significantly. However, it is not always applicable. I need to substitute, say, caps as a stand-alone word, with or without surrounding punctuation. It is not known a priori what can follow or precede the word, but it must be a regular punctuation sign (, . ! - + etc.). It cannot be a number or a letter. See the example below; capsule must stay as is.

i = "Here is the capsule, caps key, and two caps, or two caps. or even three caps-"          

orig = "caps"
change = "cap"

gsub_FixedTrue <- function(i) {
  i = paste0(" ", i, " ")
  orig = paste0(" ", orig, " ")
  change = paste0(" ", change, " ")

  i = gsub(orig,change,i,fixed=TRUE)
  i = gsub("^\\s|\\s$", "", i, perl=TRUE)

  return(i)
}

#Second fastest, doesn't clog memory
gsub_FixedFalse <- function(i) {

  i = gsub(paste0("\\b",orig,"\\b"),change,i)

  return(i)
}

print(gsub_FixedTrue(i)) #wrong
print(gsub_FixedFalse(i)) #correct

Results. The second output is the one I need:

[1] "Here is the capsule, cap key, and two caps, or two caps. or even three caps-"
[1] "Here is the capsule, cap key, and two cap, or two cap. or even three cap-"
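
The first function misses most occurrences because, with fixed=TRUE, the pattern " caps " requires a literal space on both sides of the word, whereas \\b matches at any boundary between a word character and a non-word character. A minimal sketch of the difference (the variable name padded is only for illustration):

```r
# A padded fragment in which "caps" is followed by a comma
padded <- " two caps, or "

# Literal match: needs a space on both sides, but a comma follows "caps"
grepl(" caps ", padded, fixed = TRUE)   # FALSE

# Regex match: the word boundary accepts the comma
grepl("\\bcaps\\b", padded)             # TRUE

# "capsule" is left alone, because no boundary follows "caps"
grepl("\\bcaps\\b", " a capsule ")      # FALSE
```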

Recommended answer

Using parts of your previous question as a test case, I think we can put a placeholder around each punctuation character as follows, without slowing things down too much:

line <- c("one", "two one", "four phones", "and a capsule", "But here's a caps key",
    "Here is the capsule, caps key, and two caps, or two caps. or even three caps-" )
e <- c("one", "two", "caps")
r <- c("ONE", "TWO", "cap")


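# Replicate the sample to roughly 1.7 million lines
line <- rep(line, 1700000/length(line))

# Put a space and a <DEL> marker on each side of every punctuation character
line <- gsub("([[:punct:]])", " <DEL>\\1<DEL> ", line, perl=TRUE)

## Start
# Pad lines, patterns and replacements with spaces so only whole words match
line2 <- paste0(" ", line, " ")
e2 <-  paste0(" ", e, " ")
r2 <- paste0(" ", r, " ")


for (i in seq_along(e2)) {
    line2 <- gsub(e2[i], r2[i], line2, fixed=TRUE)
}

# Strip the padding and the <DEL> markers again
gsub("^\\s|\\s$| <DEL>|<DEL> ", "", line2, perl=TRUE)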
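
To check the placeholder approach against the sentence from the question, the same steps can be wrapped in a small function and compared with the \\b version. This is only a sketch: the function name replace_words_fixed and its arguments are hypothetical and not part of the original answer.

```r
# Hypothetical helper wrapping the placeholder steps (name is illustrative only)
replace_words_fixed <- function(x, from, to) {
  # Isolate every punctuation character between marked spaces
  x <- gsub("([[:punct:]])", " <DEL>\\1<DEL> ", x, perl = TRUE)
  # Pad so that words at the start and end are also surrounded by spaces
  x <- paste0(" ", x, " ")
  # Literal whole-word replacement, one pattern at a time
  for (k in seq_along(from)) {
    x <- gsub(paste0(" ", from[k], " "), paste0(" ", to[k], " "), x, fixed = TRUE)
  }
  # Remove the padding and the placeholders
  gsub("^\\s|\\s$| <DEL>|<DEL> ", "", x, perl = TRUE)
}

s <- "Here is the capsule, caps key, and two caps, or two caps. or even three caps-"
out <- replace_words_fixed(s, "caps", "cap")
ref <- gsub("\\bcaps\\b", "cap", s)   # the regex version from the question

print(out)
print(identical(out, ref))
```

On this sentence the two results are identical, so the fixed=TRUE speed-up is kept while punctuation next to the word is still handled.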