Regular expression to find and replace unescaped non-successive double quotes in a CSV file


Problem description


This is an extension of a related question answered here.

I have a weekly CSV file that needs to be parsed. It looks like this:

"asdf","asdf","asdf","asdf"

But sometimes a text field contains an extra string wrapped in unescaped double quotes, like this:

"asdf","as "something" df","asdf","asdf"

From the other posts on here, I was able to put together a regex:

(?m)""(?![ \t]*(,|$))

which matches two successive double quotes, but only if they are not followed by a comma or end of line, with optional spaces and tabs in between.

This finds only double quotes in succession. How do I modify it to find and replace/delete the double quotes around "something" in the file?

Thanks.
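To make the behaviour of this pattern concrete, here is a minimal Python sketch. The function name remove_doubled_quotes and the sample strings are assumptions made up for illustration; only the pattern itself comes from the question.

```python
import re

# The pattern from the question: two consecutive double quotes that are
# NOT followed by optional spaces/tabs and then a comma or end of line.
DOUBLED = re.compile(r'(?m)""(?![ \t]*(,|$))')

def remove_doubled_quotes(text):
    """Delete every match of the question's pattern (hypothetical helper)."""
    return DOUBLED.sub('', text)

# A stray pair of consecutive quotes inside a field is removed ...
doubled = '"asdf","as ""something df","asdf"'
print(remove_doubled_quotes(doubled))

# ... but single quotes around a word are left alone, which is the
# limitation the question is about.
single = '"asdf","as "something" df","asdf","asdf"'
print(remove_doubled_quotes(single))
```

An empty field such as "" directly before a comma is also left untouched, because the negative lookahead rejects it.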

Solution

(?<!^|,)"(?!,|$)

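will match a double quote that is neither preceded nor followed by a comma and is not situated at the start/end of a line (with multiline mode enabled, as in your (?m) pattern, so that ^ and $ apply to each line).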
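As a rough illustration, the pattern above could be applied in Python as sketched below. Two things here are assumptions rather than part of the answer: the helper name strip_inner_quotes, and the rewrite of the lookbehind. Python's built-in re module only accepts fixed-width lookbehind, so (?<!^|,) is split into two separate lookbehinds, (?<!^)(?<!,), which express the same condition.

```python
import re

# Same condition as (?<!^|,)"(?!,|$), written with two fixed-width
# lookbehinds because Python's re rejects alternatives of different widths.
INNER_QUOTE = re.compile(r'(?<!^)(?<!,)"(?!,|$)', re.MULTILINE)

def strip_inner_quotes(text, replacement=''):
    """Replace quotes that are not at a field boundary (hypothetical helper).

    Pass replacement='""' to escape the stray quotes instead of deleting them.
    """
    # A function is used so the replacement is inserted literally.
    return INNER_QUOTE.sub(lambda m: replacement, text)

line = '"asdf","as "something" df","asdf","asdf"'
print(strip_inner_quotes(line))
print(strip_inner_quotes(line, '""'))
```

Note that a stray quote sitting directly next to a comma inside a field is indistinguishable from a field delimiter for this pattern and is left in place.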

If you need to allow whitespace around the commas or at start/end-of-line, and if your regex flavor (which you didn't specify) allows arbitrary-length lookbehind (.NET does, for example), you can use

(?<!^\s*|,\s*)"(?!\s*,|\s*$)
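In a flavor without arbitrary-length lookbehind, such as Python's built-in re module, the same idea can be approximated by keeping the lookahead in the pattern and checking the text before each quote in a replacement callback. The sketch below is one possible workaround, not the answer's own method; the names strip_inner_quotes_ws, _FIELD_START and _NOT_CLOSER are hypothetical, and the function is meant to be called on one line at a time.

```python
import re

# Everything before an opening quote is "start of line or a comma,
# followed only by optional whitespace". This stands in for the
# variable-length lookbehind (?<!^\s*|,\s*).
_FIELD_START = re.compile(r'(?:^|,)\s*\Z')

# Quotes that are not closing a field: the lookahead part of the answer.
_NOT_CLOSER = re.compile(r'"(?!\s*,|\s*$)')

def strip_inner_quotes_ws(line, replacement=''):
    """Replace stray quotes in a single line, tolerating whitespace
    around commas and at the start/end of the line."""
    def fix(match):
        prefix = match.string[:match.start()]
        if _FIELD_START.search(prefix):
            return match.group(0)   # opening quote of a field: keep it
        return replacement          # stray quote inside a field
    return _NOT_CLOSER.sub(fix, line)

print(strip_inner_quotes_ws('"asdf" , "as "something" df" ,"asdf"'))
```

Slicing the prefix for every candidate quote is quadratic in the line length, which should be acceptable for ordinary CSV rows but is a deliberate simplification.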
