Is it possible to write a regular expression that matches a particular pattern and then does a replace with a part of the pattern?


Problem description


I'm working with some comma-delimited text files. Each file consists of approximately 400 rows and 94 columns, all comma-delimited and enclosed in double quotes:

"H","9","YES","NO"....

My aim is to split the file into its respective columns using the comma delimiter. Unfortunately, several fields within the rows have the following format:

"4,5"  or "2,5,8"

These fields corrupt the column structure when I parse the file on the comma. So what I'd like to do is use a regular expression to do some sort of find and replace so that I can parse my file successfully. For example:

 "H","9","YES","NO","4,5","Y","N"  would become this:


"H","9","YES","NO","4|5","Y","N"

so that when I parse the file I would get seven columns instead of eight.

I wrote a regular expression that handles matching "2,5" or "2,3,4", but I'm not sure how to handle the replacing part.

Is it possible to accomplish this with regular expressions?

Note: I'm using perl regular expressions.

Solution

Rather than interfere with what is evidently source data, i.e. the stuff inside the quotes, you might consider replacing the field-separator commas instead:

s/,([^,"]*|"[^"]*")(?=(,|$))/|$1/g

Note that this also handles non-quoted fields.

On this data: "H",9,"YES","NO","4,5","Y","N"

$ perl -pe 's/,([^,"]*|"[^"]*")(?=(,|$))/|$1/g' commasep
"H"|9|"YES"|"NO"|"4,5"|"Y"|"N"

Which can afterwards be split on "|":

$ perl -ne 's/,([^,"]*|"[^"]*")(?=(,|$))/|$1/g;print join "---",split "\\|"' commasep
"H"---9---"YES"---"NO"---"4,5"---"Y"---"N"
