Removing consecutive duplicate words in a string


Problem description

I am trying to write a function that removes consecutive duplicate words within a string. It's vital that one copy of any match found by the regular expression remains. In other words...

A very very very dirty dog

should become...

A very dirty dog

I have a regular expression that seems to work well (based on this post):

(\b\S+\b)(($|\s+)\1)+

However, I'm not sure how to use preg_replace (or whether there's a better function) to implement this. Right now it deletes all of the matching repeated words without leaving one copy of the word intact. Can I pass a variable or special instruction to it to keep one match?

I currently have this...

$string=preg_replace('/(\b\S+\b)(($|\s+)\1)+/', '', $string);

Recommended answer

You may use a regex like \b(\S+)(?:\s+\1\b)+ and replace with $1:

$string=preg_replace('/\b(\S+)(?:\s+\1\b)+/i', '$1', $string);
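As a minimal sketch, the call above could be wrapped in a small helper. The function name collapse_repeated_words is hypothetical, and the null fallback is an assumption about how errors should be handled. The pattern and replacement are the ones given in the answer.

```php
<?php
// Hypothetical helper name; pattern and replacement come from the answer.
function collapse_repeated_words(string $text): string
{
    // Group 1 captures a word; (?:\s+\1\b)+ consumes every repeat after it.
    // Replacing the whole run with '$1' leaves exactly one copy behind.
    $result = preg_replace('/\b(\S+)(?:\s+\1\b)+/i', '$1', $text);

    // preg_replace() returns null on error; fall back to the input then.
    return $result ?? $text;
}

echo collapse_repeated_words('A very very very dirty dog'), PHP_EOL; // A very dirty dog
```

Because the replacement is '$1' rather than an empty string, the first copy of the word survives, which is what the original attempt was missing.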

See the regex demo.

Details:

  • \b(\S+) - Group 1 captures one or more non-whitespace characters that start at a word boundary (\b(\w+) may suit better here)
  • (?:\s+\1\b)+ - one or more sequences of:
    • \s+ - one or more whitespace characters
    • \1\b - a backreference to the value captured in Group 1, followed by a word boundary (so the repeat must be a whole word)
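To make the breakdown above concrete, the following sketch inspects what the pattern matches and what Group 1 holds, then shows what the trailing \b contributes. The variable names and the 'the then' test string are illustrative assumptions, not part of the original answer.

```php
<?php
$text = 'A very very very dirty dog';

// $m[0] is the whole run of repeats; $m[1] is the word captured by Group 1.
preg_match('/\b(\S+)(?:\s+\1\b)+/', $text, $m);
echo $m[0], PHP_EOL; // very very very
echo $m[1], PHP_EOL; // very

// The trailing \b forces each repeat to be a whole word.
// Without it, "the" also matches the first three letters of "then".
$withBoundary    = preg_replace('/\b(\S+)(?:\s+\1\b)+/', '$1', 'the then');
$withoutBoundary = preg_replace('/\b(\S+)(?:\s+\1)+/', '$1', 'the then');
echo $withBoundary, PHP_EOL;    // the then
echo $withoutBoundary, PHP_EOL; // then
```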

The replacement pattern is $1, a replacement backreference that refers to the value captured in Group 1.

Note that the /i case-insensitive modifier makes \1 case insensitive as well, so I have a dog Dog DOG becomes I have a dog.
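The effect of the modifier can be checked with a short comparison. This is only a sketch; the variable names are assumptions chosen for illustration.

```php
<?php
$text = 'I have a dog Dog DOG';

// With /i the backreference \1 also matches "Dog" and "DOG".
$insensitive = preg_replace('/\b(\S+)(?:\s+\1\b)+/i', '$1', $text);

// Without /i only repeats with identical casing are collapsed.
$sensitive = preg_replace('/\b(\S+)(?:\s+\1\b)+/', '$1', $text);

echo $insensitive, PHP_EOL; // I have a dog
echo $sensitive, PHP_EOL;   // I have a dog Dog DOG
```

With /i, the copy that survives is the first one in the run, so its casing is the one kept.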
