Delete last two characters in string if they match criteria

Problem description

I have 2 million names in a database. For example:

df <- data.frame(names=c("A ADAM", "S BEAN", "A APPLE A", "A SCHWARZENEGGER"))

> df
             names
1           A ADAM
2           S BEAN
3        A APPLE A
4 A SCHWARZENEGGER

I want to delete ' A' (white space A) if these are the last two characters of the string.

I know that regex is our friend here. How do I efficiently apply a regex function to the last two characters of the string?

Desired output:

> output
             names
1           A ADAM
2           S BEAN
3          A APPLE
4 A SCHWARZENEGGER


Recommended answer

If you want good performance for millions of records, the stringi package is what you need. It even outperforms the base R functions:

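library(stringi)
# Test data: 10,000 random strings with lengths cycling through 1 to 100
n <- 10000
x <- stri_rand_strings(n, 1:100)
# Append " A" to a random 1% of the strings
ind <- sample(n, n/100)
x[ind] <- stri_paste(x[ind], " A")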

baseR <- function(x){
  sub("\\sA$", "", x)
}

stri1 <- function(x){
  stri_replace_last_regex(x, "\\sA$","")
}

stri2 <- function(x){
  ind <- stri_detect_regex(x, "\\sA$")
  x[ind] <- stri_sub(x[ind],1, -3)
  x
}

# If the A can only be preceded by a plain space (not any whitespace character),
# a fixed-string check is faster still (about 200x faster than base R in the timings below)
stri3 <- function(x){
  ind <- stri_endswith_fixed(x, " A")
  x[ind] <- stri_sub(x[ind],1, -3)
  x
}


# Inspect the first 44 results, then time all four variants
head(stri2(x), 44)
library(microbenchmark)
microbenchmark(baseR(x), stri1(x), stri2(x), stri3(x))
Unit: microseconds
     expr        min        lq        mean      median         uq        max neval
 baseR(x) 166044.032 172054.30 183919.6684 183112.1765 194586.231 219207.905   100
 stri1(x)  36704.180  39015.59  41836.8612  40164.9365  43773.034  60373.866   100
 stri2(x)  17736.535  18884.56  20575.3306  19818.2895  21759.489  31846.582   100
 stri3(x)    491.963    802.27    918.1626    868.9935   1008.776   2489.923   100
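
The benchmark above runs on random strings. As a rough sketch of how the same detect-then-trim idea could be applied to the data frame from the question, the snippet below uses only base R (`endsWith` and `substr`) so it runs without extra packages. The helper name `drop_trailing_A` is hypothetical, not part of the original answer, and the explicit `NA` guard is an added assumption about how missing names should be handled (they are left untouched).

```r
# Example data from the question; stringsAsFactors = FALSE keeps names as character
df <- data.frame(names = c("A ADAM", "S BEAN", "A APPLE A", "A SCHWARZENEGGER"),
                 stringsAsFactors = FALSE)

# Hypothetical helper: base R counterpart of stri3 (fixed-string check, no regex)
drop_trailing_A <- function(x) {
  # TRUE only for non-missing strings that end in a plain space followed by "A"
  ind <- !is.na(x) & endsWith(x, " A")
  # Cut the last two characters from the matching strings only
  x[ind] <- substr(x[ind], 1, nchar(x[ind]) - 2)
  x
}

output <- df
output$names <- drop_trailing_A(df$names)
print(output)
```

With stringi installed, `stri3(df$names)` from the answer should give the same result. The base R version is shown only to keep the sketch self-contained, and it was not part of the timings above.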
