Find repeated pattern in a string of characters using R
Problem description
I have a large text that contains expressions such as "aaaahahahahaha that was a good joke". After processing, I want the "aaaahahahahaha" to disappear or, at least, to be reduced to simply "ha".
At the moment, I am using this:
gsub('(.+?)\\1', '', str)
This works when the string with the pattern is at the beginning of the sentence, but not when it is located anywhere else. So:
str <- "aaaahahahahaha that was a good joke"
gsub('(.+?)\\1', '', str)
#[1] "ha that was a good joke"
But:
str <- "that was aaaahahahahaha a good joke"
gsub('(.+?)\\1', '', str)
#[1] "that was aaaahahahahaha a good joke"
This question might relate to "find repeated pattern in python", but I can't find the equivalent in R.
I assume this is very simple and that I am missing something trivial, but regular expressions are not my strength and I have already tried a number of things that did not work, so I was wondering if someone could help me. The question is: how do I find and substitute repeated patterns in a string of characters in R?
Thanks in advance for your time.
Recommended answer
\b(\S+?)\1\S*\b
Use this. See the demo:
https://regex101.com/r/sJ9gM7/46
For R, use \\b(\\S+?)\\1\\S*\\b with the perl=TRUE option. The backslashes are doubled because the pattern is written inside an R string literal.
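As a rough illustration, the suggested pattern could be applied to both strings from the question as sketched below. The variable names (pattern, x, cleaned) are only illustrative, and the whitespace cleanup at the end is an assumption added here, not part of the original answer: deleting a whole word leaves the spaces around it behind.

```r
# Pattern from the answer; backslashes are doubled inside an R string literal.
pattern <- "\\b(\\S+?)\\1\\S*\\b"

# The two example strings from the question.
x <- c("aaaahahahahaha that was a good joke",
       "that was aaaahahahahaha a good joke")

# perl = TRUE selects the PCRE engine, as the answer recommends.
cleaned <- gsub(pattern, "", x, perl = TRUE)

# Assumption (not in the answer): collapse leftover runs of spaces
# and trim the ends, since the removed word leaves a gap behind.
cleaned <- trimws(gsub(" +", " ", cleaned))

print(cleaned)
```

With this cleanup step, both inputs should reduce to "that was a good joke". Note that the pattern removes any whitespace-delimited word that starts with an immediately repeated chunk, so ordinary words such as "mama" or "cocoa" would be removed as well; whether that is acceptable depends on the text being processed.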