Regular expression to remove whitespace around a comma, except when quoted

Question

I have a CSV file that has rows resembling this:

1,  4,     2, "PUBLIC, JOHN Q" ,ACTIVE , 1332

I am looking for a regular expression replacement that will match against these rows and spit out something resembling this:

1,4,2,"PUBLIC, JOHN Q",ACTIVE,1332

I thought this would be rather easy: I wrote the expression ([ \t]+,) and replaced each match with a single comma. I then made a complementary expression, (,[ \t]+), also replaced with a single comma, and thought I had a good way to right-trim and left-trim each field.

...but then I noticed that my "PUBLIC, JOHN Q" had become "PUBLIC,JOHN Q", which isn't what I wanted. (Note that the space following the comma is gone.)
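
To reproduce the problem, here is a minimal sketch of the two replacements described above. It uses Python's standard re module purely as a stand-in, since the question does not name the tool or its regex engine. The second replacement cannot tell a field separator from a comma inside a quoted value:

```python
import re

row = '1,  4,     2, "PUBLIC, JOHN Q" ,ACTIVE , 1332'

# Replacement 1: whitespace before a comma -> "," (right-trims each field).
after_right_trim = re.sub(r"([ \t]+,)", ",", row)

# Replacement 2: whitespace after a comma -> "," (left-trims each field).
# It also fires on the comma inside "PUBLIC, JOHN Q", which causes the bug.
after_left_trim = re.sub(r"(,[ \t]+)", ",", after_right_trim)

print(after_left_trim)  # 1,4,2,"PUBLIC,JOHN Q",ACTIVE,1332
```

Neither pattern knows whether it is inside quotes, which is why some extra context (lookaround or quote matching) is needed.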

What would be the appropriate expression to trim the whitespace before and after a comma, but leave quoted text untouched?

Update

To clarify, I am using an application to handle the file. This application lets me define multiple regular expression replacements; it does not provide any parsing capability. While this may not be the ideal mechanism, it sure beats writing another application for this one file.

Recommended answer

If your tool uses the .NET (C#) regular expression engine, which supports variable-length lookbehind, you can try the following expression:

(?<!,\s*"(?:[^\\"]|\\")*)\s+(?!(?:[^\\"]|\\")*"\s*,)

Replace the matches with an empty string.

The other answers assumed that the quotes are balanced and counted them to determine whether a space is part of a quoted value.
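
For comparison, a minimal sketch of the kind of quote-counting expression being referred to (the other answers themselves are not reproduced here, so this pattern is an illustrative assumption, shown with Python's re module). A comma is treated as a separator only when an even number of double quotes follows it on the line, which is exactly the balanced-quotes assumption:

```python
import re

# Whitespace around a comma, but only when the rest of the line contains an
# even number of double quotes, i.e. the comma is not inside a quoted value.
# Assumption: quotes on the line are balanced (no stray or backslash-escaped ").
TRIM_OUTSIDE_QUOTES = re.compile(r'[ \t]*,[ \t]*(?=(?:[^"]*"[^"]*")*[^"]*$)')


def trim_commas(line: str) -> str:
    """Collapse spaces/tabs around unquoted commas into a bare comma."""
    return TRIM_OUTSIDE_QUOTES.sub(",", line)


row = '1,  4,     2, "PUBLIC, JOHN Q" ,ACTIVE , 1332'
print(trim_commas(row))  # 1,4,2,"PUBLIC, JOHN Q",ACTIVE,1332
```

Because it is a single pattern with a plain "," replacement, this style fits a tool that only offers regex replacements, as long as its engine supports lookahead. The trade-off is that the lookahead rescans the rest of the line at every comma, and a single unbalanced quote flips the result for every comma before it.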

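My expression instead looks for all whitespace that is not part of a quoted value, where a quoted value is recognized by a quote that follows a comma or a quote that precedes a comma (an escaped \" does not end it). Note that it removes every unquoted space, not only the spaces next to commas.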
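
Python's built-in re module rejects variable-length lookbehind such as (?<!,\s*"...), so it cannot run the expression above as written. A minimal sketch of the same idea (delete whitespace that is not part of a quoted value) swaps in a different, common technique: match either a whole quoted value or a whitespace run, and put the quoted value back unchanged. Unlike the expression above, this sketch does not treat \" as an escaped quote, and it relies on a replacement callback, which the tool in the question may not offer:

```python
import re

# Alternative 1 (group 1): a complete double-quoted value, kept verbatim.
# Alternative 2: a run of whitespace outside any quoted value, deleted.
# Assumption: values are quoted as "..."; backslash escapes are not handled.
QUOTED_OR_SPACE = re.compile(r'("[^"]*")|\s+')


def strip_unquoted_whitespace(line: str) -> str:
    """Remove every whitespace run that is not inside a quoted value."""
    # m.group(1) is None when the whitespace alternative matched.
    return QUOTED_OR_SPACE.sub(lambda m: m.group(1) or "", line)


row = '1,  4,     2, "PUBLIC, JOHN Q" ,ACTIVE , 1332'
print(strip_unquoted_whitespace(row))       # 1,4,2,"PUBLIC, JOHN Q",ACTIVE,1332
print(strip_unquoted_whitespace("A B, 1"))  # AB,1
```

The second call shows the same behavior as the answer's expression: unquoted whitespace is removed everywhere, so a value like A B must be quoted to keep its inner space.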

RegexHero demo