Regular Expression for formatting a file


Problem description


My file has data in which every line starts with a specific pattern:

1000000179|abcd.....
1000000180|wedwedw...
1000000181|wnewedwed...




Each line starts with 10 digits followed by a pipe.

How can I find/replace the lines that do NOT have this pattern? For example, the second line below is invalid:


1000000179|abcd.....
%d20000180|wedwedw...
1000000181|wnewedwed...

Solution

What you need is called negative lookahead.
The expression is: ^(?!\d{10}).{10}\|.*


(with the MultiLine option).
Note its logic: the first part is a lookahead that rejects any line whose first ten characters are all digits. The rest is a general pattern that accepts any ten characters, good or bad, followed by a pipe, so only lines with the pipe in the 11th position can match.
For more details about lookaround, read this article[^].
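
As a rough illustration of the lookahead approach, the sketch below applies the expression using Python's `re.MULTILINE` flag, which plays the role of the MultiLine option mentioned above. The choice of Python, the sample data, and the names `BAD_PREFIX`, `ANY_BAD_LINE` and `sample` are assumptions for this example. The second pattern is a variant that is not part of the original answer: it moves the pipe inside the lookahead so that bad lines without a pipe in the 11th position are flagged too.

```python
import re

# Expression from the answer: the negative lookahead rejects ten leading digits,
# then any ten characters followed by a pipe are consumed.
BAD_PREFIX = re.compile(r"^(?!\d{10}).{10}\|.*", re.MULTILINE)

# Variant (an assumption, not from the answer): with the pipe inside the
# lookahead, every non-empty line lacking the ten-digits-plus-pipe prefix matches.
ANY_BAD_LINE = re.compile(r"^(?!\d{10}\|).+$", re.MULTILINE)

# Hypothetical sample input based on the data shown in the question.
sample = "\n".join([
    "1000000179|abcd.....",
    "%d20000180|wedwedw...",
    "1000000181|wnewedwed...",
    "123456789|whatever",
])

flagged = [m.group(0) for m in BAD_PREFIX.finditer(sample)]
flagged_variant = [m.group(0) for m in ANY_BAD_LINE.finditer(sample)]

print(flagged)          # only the bad line that has a pipe in the 11th position
print(flagged_variant)  # also the line whose pipe is in the wrong position
```

To replace rather than find, the same compiled patterns could be passed to `re.sub` with whatever replacement text suits the file.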


The challenge here, as Mohan implied, is finding the "negative" match. It's straightforward to find a line that starts with 10 digits and a pipe, but how do you find one that doesn't?

Here's a regex that will do it:

^([^|]*[^|\d][^|]*\||.{10}[^|])


It matches only the last 4 of these input lines:

1000000179|abcd.....|abc|
1000000180|wedwedw...|234|
1000000179|abcd.....
%d20000180|wedwedw...
3214a23642|abcd
123456789|whatever
1234567890_abcde



Breaking down the regex, it's looking for one of two conditions at the beginning of a line:
1) Any non-digit before the first pipe ( [^|]*[^|\d][^|]*\| )
2) Any non-pipe character in the 11th position ( .{10}[^|] )
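
The two conditions above can be checked against the seven sample lines with a short Python sketch. Python and the names `INVALID`, `VALID` and `lines` are assumptions for this example. Each line is tested on its own so that `[^|]` cannot run across a line break. The last part shows a different technique, not the one from the answer: testing for the valid prefix and inverting the result in code.

```python
import re

# Regex from the answer: either a non-digit before the first pipe,
# or a non-pipe character in the 11th position.
INVALID = re.compile(r"^([^|]*[^|\d][^|]*\||.{10}[^|])")

# Alternative (an assumption, not from the answer): match the valid prefix
# and negate the test in code instead of in the pattern.
VALID = re.compile(r"\d{10}\|")

lines = [
    "1000000179|abcd.....|abc|",
    "1000000180|wedwedw...|234|",
    "1000000179|abcd.....",
    "%d20000180|wedwedw...",
    "3214a23642|abcd",
    "123456789|whatever",
    "1234567890_abcde",
]

# re.match anchors at the start of each string, one line at a time.
invalid = [line for line in lines if INVALID.match(line)]
invalid_by_negation = [line for line in lines if not VALID.match(line)]

for line in invalid:
    print(line)
```

Both lists contain the last four sample lines. The two approaches can differ on inputs outside this sample: a line such as `123456789|` (nine digits, a pipe, and nothing after it) satisfies neither condition of the answer's regex, while the negated test still flags it.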

