Match or remove string that occurs multiple times within two strings with regex


Problem description

I have a large CSV export in which the columns do not align because some values were accidentally split across multiple cells instead of one. Fortunately, these values lie between two unique strings. I am hoping to use a regex to merge them into one cell. Sample data is as follows:

"apple","NULL","0","0","0",",","1",",","fruit","red","sweet","D$","object"
"horse","NULL","0","0","0",",","1",",","animal","large","tail","D$","object"
"Los Angeles","NULL","0","0","0",",","1",",","city","California","smoggy","entertainment","D$","location"

The unmerged values start after

"NULL","0","0","0",",","1",",","

And the unmerged values end before

","D$"

I'm trying to figure out a regex that would remove the "," between the values to merge them, so the output would look like:

"apple","NULL","0","0","0",",","1",",","fruit,red,sweet","D$","object"
"horse","NULL","0","0","0",",","1",",","animal,large,tail","D$","object"
"Los Angeles","NULL","0","0","0",",","1",",","city,California,smoggy,entertainment","D$","location"

Recommended answer

You can do it like this:

$pattern = '~(?:"NULL","0","0","0",",","1",",","|(?!^)\G)[^"]+\K","(?!D\$)~';
$csv = preg_replace($pattern, ',', $csv);
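
To try the replacement end to end, here is a minimal sketch that runs the pattern over the sample rows from the question. The `$csv` contents are pasted inline for illustration, and the third row is written with the same `"1",",","` prefix as the other rows, which is an assumption about the intended data.

```php
<?php
// Sample rows from the question, held in a nowdoc so that $ is not interpolated.
// Assumption: the third row uses the same "1",",","  prefix as the first two.
$csv = <<<'CSV'
"apple","NULL","0","0","0",",","1",",","fruit","red","sweet","D$","object"
"horse","NULL","0","0","0",",","1",",","animal","large","tail","D$","object"
"Los Angeles","NULL","0","0","0",",","1",",","city","California","smoggy","entertainment","D$","location"
CSV;

// Pattern from the answer, unchanged.
$pattern = '~(?:"NULL","0","0","0",",","1",",","|(?!^)\G)[^"]+\K","(?!D\$)~';

// Every "," between the entry prefix and "D$" becomes a plain comma.
$merged = preg_replace($pattern, ',', $csv);

echo $merged, PHP_EOL;
```

Because the whole export is passed as one subject string, the chain of matches restarts on each row wherever the entry prefix occurs again.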

Pattern details:

~             # delimiter
(?:
    "NULL","0","0","0",",","1",",","
  |           
    (?!^)\G   # anchor for the end of the last match
)
[^"]+         # content between quotes
\K            # removes all on the left from match result
","           # ","
(?!D\$)       # not followed by D$
~
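
The commented layout above can also be used as the pattern itself by adding the `x` modifier, which makes PCRE ignore unescaped whitespace and `#` comments. The sketch below does that and uses the count argument of `preg_replace` to report how many separators were merged. The two test rows and the variable names are made up for illustration; the second row deliberately lacks the entry prefix.

```php
<?php
// Same pattern as the one-line version, kept in its annotated layout.
// The trailing x modifier tells PCRE to skip the whitespace and # comments.
$pattern = '~
(?:
    "NULL","0","0","0",",","1",",","
  |
    (?!^)\G   # anchor for the end of the last match
)
[^"]+         # content between quotes
\K            # keeps everything matched so far out of the result
","           # the separator to replace
(?!D\$)       # not followed by D$
~x';

// Hypothetical rows: one with the entry prefix, one without it.
$withPrefix    = '"apple","NULL","0","0","0",",","1",",","fruit","red","sweet","D$","object"';
$withoutPrefix = '"pear","fruit","green","D$","object"';

// The fifth argument receives the number of replacements performed.
$mergedRow    = preg_replace($pattern, ',', $withPrefix, -1, $countWith);
$untouchedRow = preg_replace($pattern, ',', $withoutPrefix, -1, $countWithout);

echo $countWith, ' ', $mergedRow, PHP_EOL;
echo $countWithout, ' ', $untouchedRow, PHP_EOL;
```

The row without the prefix comes back unchanged, since neither the entry string nor a previous match gives the pattern a place to start.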

The idea of the pattern is to use the \G anchor, which matches either at the start of the string or at the end of the previous match. I added (?!^) to rule out the first case.

"NULL","0","0","0",",","1",","," is used as the entry point for the first match. Then the content between quotes is matched. Since \K discards everything to its left from the match result, only "," is replaced.

The following matches use \G as their entry point, and the contiguous matches continue until the negative lookahead (?!D\$) fails, that is, until the "," is followed by D$.
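
The two constructs the explanation relies on, \K and \G, can be tried in isolation. The sketch below uses small made-up subjects (`"fruit","red"` and `123a45`, where `a` plays the role of the entry point); they are illustrative only and not part of the original data.

```php
<?php
// \K: text matched before \K is left out of the reported match,
// so only the separator is replaced.
$row = '"fruit","red"';
$withoutK = preg_replace('~[^"]+","~', ',', $row);   // replaces fruit"," as a whole
$withK    = preg_replace('~[^"]+\K","~', ',', $row); // replaces only ","

// \G: a match may begin at the subject start or where the previous match ended.
// (?!^) removes the "subject start" case, so a chain needs an explicit entry ("a").
$subject = '123a45';
preg_match_all('~(?:a|\G)\d~', $subject, $plain);
preg_match_all('~(?:a|(?!^)\G)\d~', $subject, $guarded);

echo $withoutK, PHP_EOL;                  // ",red"
echo $withK, PHP_EOL;                     // "fruit,red"
echo implode(' ', $plain[0]), PHP_EOL;    // 1 2 3 a4 5
echo implode(' ', $guarded[0]), PHP_EOL;  // a4 5
```

With plain \G the digits at the start of the subject are matched too, which is the behaviour (?!^) is there to prevent in the CSV pattern.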
