Regular expression to find and replace unescaped non-successive double quotes in CSV file


Problem description


This is an extension to a related question answered here.

I have a weekly CSV file which needs to be parsed. It looks like this:

"asdf","asdf","asdf","asdf"

But sometimes a text field contains extra unescaped double quotes, like this:

"asdf","as "something" df","asdf","asdf"

From the other posts on here, I was able to put together a regex:

(?m)""(?![ \t]*(,|$))

which matches two successive double quotes, but only if they are NOT followed by a comma or end of line (with optional spaces and tabs in between).
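To make the limitation concrete, here is a small Python sketch that counts matches of that pattern on two sample lines. The pattern is copied verbatim from the question; the helper name `count_doubled` and the sample strings are only illustrative assumptions.

```python
import re

# Pattern copied verbatim from the question; (?m) lets $ match at each line end.
DOUBLED = re.compile(r'(?m)""(?![ \t]*(,|$))')

def count_doubled(text: str) -> int:
    """Count how many times the question's pattern matches in text."""
    return sum(1 for _ in DOUBLED.finditer(text))

# Hypothetical sample lines: one with doubled quotes, one with lone stray quotes.
doubled = '"asdf","as ""something"" df","asdf"'
stray = '"asdf","as "something" df","asdf","asdf"'

print(count_doubled(doubled))  # the two "" pairs inside the second field
print(count_doubled(stray))    # lone quotes never form a "" pair
```

The second line contains no adjacent pair of quotes at all, which is why the pattern cannot find the quotes around "something".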

Now, this only finds double quotes in succession. How do I modify it to find and replace/delete the double quotes around "something" in the file?

Thanks.

Solution

(?<!^|,)"(?!,|$)

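will match a double quote that is neither preceded nor followed by a comma, and that is not at the start or end of a line. If the whole file is processed as one string, most flavors need multiline mode (the (?m) flag from the question) so that ^ and $ match at every line boundary.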
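As a rough illustration, here is how that pattern might be applied in Python. This is a sketch under a few assumptions: Python's built-in `re` module only accepts fixed-width lookbehind, so `(?<!^|,)` is split into two separate lookbehinds; lines are assumed to end in `\n`; and the function name `strip_stray_quotes` and its `replacement` parameter are made up for the example.

```python
import re

# Equivalent of (?<!^|,)"(?!,|$), with the lookbehind split in two because
# Python's re module rejects alternatives of different widths in a lookbehind.
# re.MULTILINE makes ^ and $ match at every line boundary ("\n" only).
STRAY_QUOTE = re.compile(r'(?<!^)(?<!,)"(?!,|$)', re.MULTILINE)

def strip_stray_quotes(text: str, replacement: str = "") -> str:
    """Replace quotes that neither open nor close a field.

    replacement="" deletes them; replacement='""' would escape them
    in the usual CSV way instead.
    """
    return STRAY_QUOTE.sub(replacement, text)

sample = (
    '"asdf","asdf","asdf","asdf"\n'
    '"asdf","as "something" df","asdf","asdf"\n'
)
print(strip_stray_quotes(sample))
```

Note that a stray quote sitting directly next to a comma inside a field cannot be told apart from a real field delimiter by this pattern, so it is left alone.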

If you need to allow whitespace around the commas or at start/end-of-line, and if your regex flavor (which you didn't specify) allows arbitrary-length lookbehind (.NET does, for example), you can use

(?<!^\s*|,\s*)"(?!\s*,|\s*$)
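For flavors without arbitrary-length lookbehind (Python's built-in `re` is one), the same rule can be approximated by checking the context of each quote by hand. The sketch below is one possible way to do that, not the answer's own method; the names `BEFORE_OK`, `AFTER_OK`, `strip_stray_quotes_ws` and `fix_text` are hypothetical.

```python
import re

# A quote "opens" a field if everything before it on the line is
# start-of-line or a comma, followed only by whitespace.
BEFORE_OK = re.compile(r'(?:^|,)\s*\Z')
# A quote "closes" a field if it is followed by optional whitespace and
# then a comma or the end of the line.
AFTER_OK = re.compile(r'\s*(?:,|\Z)')

def strip_stray_quotes_ws(line: str, replacement: str = "") -> str:
    """Replace quotes on a single line that neither open nor close a field."""
    out = []
    for i, ch in enumerate(line):
        if ch == '"':
            opens = BEFORE_OK.search(line[:i]) is not None
            closes = AFTER_OK.match(line, i + 1) is not None
            if not (opens or closes):
                out.append(replacement)  # stray quote: delete or escape it
                continue
        out.append(ch)
    return "".join(out)

def fix_text(text: str) -> str:
    """Apply the per-line fix to a whole file's contents."""
    return "".join(strip_stray_quotes_ws(l) for l in text.splitlines(keepends=True))

print(fix_text('"asdf" , "as "something" df" ,"asdf"\n'))
```

Slicing the prefix for every quote is quadratic in the line length, which should be fine for ordinary CSV rows but is worth keeping in mind for very long lines.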
