Regular Expression for formatting a file
Problem description
My file has data with each line starting with a specific pattern:
1000000179|abcd.....
1000000180|wedwedw...
1000000181|wnewedwed...
There are 10 numerals followed by a pipe.
How can I find/replace the lines that do NOT have this pattern? For example, the second line below is invalid:
1000000179|abcd.....
%d20000180|wedwedw...
1000000181|wnewedwed...
What you need is called a negative lookahead.
The expression is: ^(?!\d{10}).{10}\|.*
(with the MultiLine option).
Please note its logic: the first part, (?!\d{10}), is a lookahead that rejects lines whose first ten characters are all digits. The rest, .{10}\|.*, is a general pattern that matches both good and bad strings with the pipe at the 11th position. Combined, the expression matches lines that have a pipe in the 11th position but something other than ten digits before it.
For more details about lookaround, read this article.
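As a rough illustration, the lookahead expression can be tried on the sample data from the question. The sketch below uses Python's `re` module, with `re.MULTILINE` standing in for the MultiLine option; the variable names and the extra `123456789|whatever` line are assumptions added for the demonstration, not part of the original answer.

```python
import re

# Pattern from the answer: a pipe in position 11, but the first
# ten characters are not all digits.
BAD_LINE = re.compile(r"^(?!\d{10}).{10}\|.*", re.MULTILINE)

# Sample data from the question, plus one extra line (an assumption added
# here) whose pipe is NOT in position 11.
text = """1000000179|abcd.....
%d20000180|wedwedw...
1000000181|wnewedwed...
123456789|whatever"""

# findall returns the full text of every matching line.
bad_lines = BAD_LINE.findall(text)
print(bad_lines)
```

Here only the `%d20000180|wedwedw...` line is reported. The extra line is not, because its pipe is not in the 11th position, so this expression alone does not catch every malformed line.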
The challenge here, as Mohan implied, is finding the "negative" match. It's straightforward to find a line that starts with 10 digits and a pipe, but how do you find one that doesn't?
Here's a regex that will do it:
^([^|]*[^|\d][^|]*\||.{10}[^|])
It matches only the last 4 of these input lines:
1000000179|abcd.....|abc|
1000000180|wedwedw...|234|
1000000179|abcd.....
%d20000180|wedwedw...
3214a23642|abcd
123456789|whatever
1234567890_abcde
Breaking down the regex, it's looking for one of two conditions at the beginning of a line:
1) Any non-digit before the first pipe ( [^|]*[^|\d][^|]*\| )
2) Any non-pipe character in the 11th position ( .{10}[^|] )
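The two conditions can be checked against the seven sample lines with a short script. This is only a sketch in Python; the `split_lines` helper and the variable names are hypothetical and were chosen for the example.

```python
import re

# Regex from the answer: a non-digit before the first pipe,
# or a non-pipe character in the 11th position.
INVALID = re.compile(r"^([^|]*[^|\d][^|]*\||.{10}[^|])")

samples = [
    "1000000179|abcd.....|abc|",
    "1000000180|wedwedw...|234|",
    "1000000179|abcd.....",
    "%d20000180|wedwedw...",
    "3214a23642|abcd",
    "123456789|whatever",
    "1234567890_abcde",
]

def split_lines(lines):
    """Return (valid, invalid) lists according to the INVALID regex."""
    valid, invalid = [], []
    for line in lines:
        (invalid if INVALID.match(line) else valid).append(line)
    return valid, invalid

valid, invalid = split_lines(samples)
print(invalid)
```

With these inputs, `invalid` holds the last four lines and `valid` the first three. One caveat: a line shorter than 11 characters that contains no pipe at all, such as `12345`, satisfies neither condition and is therefore not flagged by this regex.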