Regular expression to remove whitespace around a comma, except when quoted
Question
I have a CSV file that has rows resembling this:
1, 4, 2, "PUBLIC, JOHN Q" ,ACTIVE , 1332
I am looking for a regular expression replacement that will match against these rows and spit out something resembling this:
1,4,2,"PUBLIC, JOHN Q",ACTIVE,1332
I thought this would be rather easy: I made the expression ([ \t]+,) and replaced it with ",". I then made a complementary expression (,[ \t]+), also replaced with ",", and I thought I had achieved a good means of right-trimming and left-trimming strings.
...but then I noticed that my "PUBLIC, JOHN Q" was now "PUBLIC,JOHN Q", which isn't what I wanted. (Note that the space following the comma is now gone.)
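The two replacements described in the question can be reproduced with a short sketch. It uses TypeScript, so JavaScript's regex engine stands in for the asker's unnamed application (an assumption), and the helper name naiveTrim is hypothetical. Neither pattern knows about quotes, so the comma inside the quoted field is trimmed like any other.

```typescript
// Hypothetical helper reproducing the two replacements from the question.
function naiveTrim(line: string): string {
  // Right-trim: whitespace followed by a comma becomes a bare comma.
  const rightTrimmed = line.replace(/([ \t]+,)/g, ",");
  // Left-trim: a comma followed by whitespace becomes a bare comma.
  return rightTrimmed.replace(/(,[ \t]+)/g, ",");
}

console.log(naiveTrim('1, 4, 2, "PUBLIC, JOHN Q" ,ACTIVE , 1332'));
// 1,4,2,"PUBLIC,JOHN Q",ACTIVE,1332  (the space inside the quoted field is lost too)
```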
What would be the appropriate expression to trim the white space before and after a comma, but leave quoted text untouched?
Update
To clarify, I am using an application to handle the file. This application allows me to define multiple regular expression replacements; it does not provide a parsing capability. While this may not be the ideal mechanism for this, it would sure beat making another application for this one file.
Recommended Answer
If the engine used by your tool is the C# (.NET) regular expression engine, which supports variable-length lookbehind, then you can try the following expression:
(?<!,\s*"(?:[^\\"]|\\")*)\s+(?!(?:[^\\"]|\\")*"\s*,)
Replace the matches with an empty string.
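The expression can be tried outside .NET as well. JavaScript engines (ES2018 and later) also accept unbounded lookbehind, so the following TypeScript sketch ports the pattern unchanged into a string literal (every regex backslash is doubled). It assumes JavaScript's regex semantics match .NET's closely enough for this pattern; the names UNQUOTED_WHITESPACE and stripUnquotedWhitespace are hypothetical. The second call shows that whitespace inside an unquoted field is removed too, not only whitespace around commas.

```typescript
// The answer's pattern, written as a JS string (each regex backslash doubled):
//   (?<!,\s*"(?:[^\\"]|\\")*)\s+(?!(?:[^\\"]|\\")*"\s*,)
// Requires variable-length lookbehind (ES2018+ engines). Hypothetical name.
const UNQUOTED_WHITESPACE = new RegExp(
  '(?<!,\\s*"(?:[^\\\\"]|\\\\")*)\\s+(?!(?:[^\\\\"]|\\\\")*"\\s*,)',
  "g",
);

// Deletes every whitespace run that the pattern treats as outside a quoted value.
function stripUnquotedWhitespace(line: string): string {
  return line.replace(UNQUOTED_WHITESPACE, "");
}

console.log(stripUnquotedWhitespace('1, 4, 2, "PUBLIC, JOHN Q" ,ACTIVE , 1332'));
// 1,4,2,"PUBLIC, JOHN Q",ACTIVE,1332

console.log(stripUnquotedWhitespace("1, ON HOLD , 2"));
// 1,ONHOLD,2
```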
The other answers assume the quotes are balanced and count them to determine whether a space is part of a quoted value.
My expression matches all whitespace that is not part of a quoted value, so it also removes spaces inside unquoted fields, not only those next to a comma.
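For comparison, the quote-counting approach that the answer contrasts itself with can be sketched as one replacement: a comma and its surrounding whitespace are matched only when an even number of double quotes follows, i.e. when the comma sits outside any quoted field. This is a different technique from the answer's expression, shown here as an illustration. It assumes balanced quotes and one row per input string, and the names COMMA_OUTSIDE_QUOTES and trimAroundCommas are hypothetical. Unlike the lookbehind version, it leaves whitespace inside unquoted fields alone.

```typescript
// A comma plus surrounding whitespace, but only if the rest of the line
// contains an even number of double quotes (so the comma is not inside quotes).
// Assumes balanced quotes. Hypothetical name.
const COMMA_OUTSIDE_QUOTES = /\s*,\s*(?=(?:[^"]*"[^"]*")*[^"]*$)/g;

// Replaces each unquoted comma and its padding with a bare comma.
function trimAroundCommas(line: string): string {
  return line.replace(COMMA_OUTSIDE_QUOTES, ",");
}

console.log(trimAroundCommas('1, 4, 2, "PUBLIC, JOHN Q" ,ACTIVE , 1332'));
// 1,4,2,"PUBLIC, JOHN Q",ACTIVE,1332

console.log(trimAroundCommas("1, ON HOLD , 2"));
// 1,ON HOLD,2
```

The lookahead rescans the rest of the line for every comma, which is quadratic in line length but harmless for typical CSV rows.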