How can I remove repeated characters in a string with R?


Problem description

I would like to implement a function in R that removes repeated characters from a string. For instance, if the function is named removeRS, it should work like this:

  removeRS('Buenaaaaaaaaa Suerrrrte')
  Buena Suerte
  removeRS('Hoy estoy tristeeeeeee')
  Hoy estoy triste

The function will be used on strings written in Spanish, where it is uncommon (or at least incorrect) to find words with three or more identical vowels in a row. There is no need to worry about the sentiment such repetitions might convey. Some words do contain two successive identical consonants (especially ll and rr), but the function can ignore that case.

To sum up, the function should replace any letter that appears at least three times in a row with a single occurrence of that letter. In one of the examples above, aaaaaaaaa is replaced with a.

Could you give me any hints on how to carry out this task in R?

Recommended answer

I have not thought about this very carefully, but here is a quick solution using backreferences in regular expressions:

gsub('([[:alpha:]])\\1+', '\\1', 'Buenaaaaaaaaa Suerrrrte')
# [1] "Buena Suerte"

() captures a letter, \\1 is a backreference to that captured letter (the backslash is doubled because it sits inside an R string), and + means to match the preceding item one or more times. Put together, the pattern matches a letter that occurs two or more times in a row, and the replacement \\1 keeps a single copy of it.
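Note that the pattern above also collapses doubled letters such as ll and rr, while the question asks only for runs of at least three. A minimal sketch of how the one-liner might be wrapped into the removeRS function from the question is shown here; the min_run parameter is an assumption added for illustration and is not part of the original answer.

```r
# Sketch only: removeRS is the name used in the question;
# min_run is a hypothetical parameter introduced for this example.
removeRS <- function(x, min_run = 3) {
  stopifnot(min_run >= 2)
  # After the captured letter, require at least (min_run - 1) more copies of it.
  pattern <- sprintf('([[:alpha:]])\\1{%d,}', min_run - 1L)
  # Replace each qualifying run with a single copy of the captured letter.
  gsub(pattern, '\\1', x)
}

removeRS('Buenaaaaaaaaa Suerrrrte')
# [1] "Buena Suerte"
removeRS('Hoy estoy tristeeeeeee')
# [1] "Hoy estoy triste"
removeRS('calle')               # a doubled ll is left alone by default
# [1] "calle"
removeRS('calle', min_run = 2)  # same behaviour as the pattern with +
# [1] "cale"
```

With min_run = 2 the generated pattern is equivalent to the one in the answer; the default of 3 follows the "at least three times in a row" wording of the question.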

[[:alpha:]] matches letters only. To include other characters, such as digits or punctuation, replace [[:alpha:]] with a regex matching whatever you wish to include.
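As a rough illustration of swapping the character class, the calls below run the same gsub on a made-up input string with two alternatives: [[:alnum:]] (letters and digits) and . (any single character). The input string is an assumption chosen for the example.

```r
# Hypothetical input used only for illustration.
s <- 'Holaaa!!! 111'

# Letters only: digits and punctuation are left untouched.
gsub('([[:alpha:]])\\1+', '\\1', s)
# [1] "Hola!!! 111"

# Letters and digits: the run of 1s is collapsed as well.
gsub('([[:alnum:]])\\1+', '\\1', s)
# [1] "Hola!!! 1"

# Any character: repeated punctuation is collapsed too.
gsub('(.)\\1+', '\\1', s)
# [1] "Hola! 1"
```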
