Regular expression to find and replace unescaped non-successive double quotes in CSV file
Problem description
This is an extension of a related question that was answered here.
I have a weekly CSV file that needs to be parsed. It looks like this:
"asdf","asdf","asdf","asdf"
But sometimes a text field contains an extra, unescaped double-quoted string, like this:
"asdf","as "something" df","asdf","asdf"
From the other posts on here, I was able to put together this regex:
(?m)""(?![ \t]*(,|$))
which matches two successive double quotes, but only if they are not followed by a comma or the end of the line, with optional spaces and tabs in between.
However, this finds only double quotes in succession. How do I modify it to find and replace or delete the double quotes around "something" in the file?
Thanks.
(?<!^|,)"(?!,|$)
will match a double quote that is neither preceded nor followed by a comma and is not at the start or end of a line (in multiline mode, so that ^ and $ match at line boundaries).
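As a rough illustration of how the pattern above might be applied, here is a minimal Python sketch. Python's built-in `re` module accepts only fixed-width lookbehind, so `(?<!^|,)` is split into two separate assertions; that split is an adaptation, not part of the original answer. The names `STRAY_QUOTE` and `fix_stray_quotes` and the sample data are hypothetical.

```python
import re

# Adaptation for Python's re module: (?<!^|,) mixes a zero-width and a
# one-character alternative, which re rejects, so it becomes two lookbehinds.
# re.MULTILINE makes ^ and $ match at every line boundary.
STRAY_QUOTE = re.compile(r'(?<!^)(?<!,)"(?!,|$)', re.MULTILINE)


def fix_stray_quotes(text, replacement=""):
    """Replace each quote that is not adjacent to a comma or a line boundary.

    replacement="" deletes the stray quotes; replacement='""' doubles them,
    which is the usual CSV escape for a literal quote.
    """
    return STRAY_QUOTE.sub(replacement, text)


sample = (
    '"asdf","asdf","asdf","asdf"\n'
    '"asdf","as "something" df","asdf","asdf"'
)

print(fix_stray_quotes(sample))        # stray quotes deleted
print(fix_stray_quotes(sample, '""'))  # stray quotes escaped by doubling
```

One caveat with this sketch: quotes that are already escaped by doubling inside a field also match the pattern, so it is only safe on files where that does not occur.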
If you need to allow whitespace around the commas or at the start/end of a line, and if your regex flavor (which you didn't specify) allows arbitrary-length lookbehind (.NET does, for example), you can use:
(?<!^\s*|,\s*)"(?!\s*,|\s*$)
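Where arbitrary-length lookbehind is not available (Python's built-in `re` module is one such case), the same check can be approximated without a regex. The following is only a sketch that emulates the whitespace-tolerant pattern above one line at a time; the helper names `is_delimiter_quote` and `fix_line` are hypothetical, and the input is assumed to be a single line with no trailing newline.

```python
def is_delimiter_quote(line, pos):
    """Return True if the quote at line[pos] sits next to a comma or a line
    boundary, ignoring any whitespace in between."""
    before = line[:pos].rstrip()       # text to the left, trailing whitespace dropped
    after = line[pos + 1:].lstrip()    # text to the right, leading whitespace dropped
    return (
        before == "" or before.endswith(",")
        or after == "" or after.startswith(",")
    )


def fix_line(line, replacement=""):
    """Replace every quote in one line that is not a field delimiter."""
    pieces = []
    for pos, ch in enumerate(line):
        if ch == '"' and not is_delimiter_quote(line, pos):
            pieces.append(replacement)
        else:
            pieces.append(ch)
    return "".join(pieces)


print(fix_line('"asdf" , "as "something" df" ,"asdf", "asdf" '))
```

The slicing makes this quadratic in the line length, which is usually acceptable for short CSV rows but is worth keeping in mind for very long lines.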