Match unescaped quotes in quoted CSV
Question
I've looked at several of the Stack Overflow posts with similar titles, and none of the accepted answers have done the trick for me.
I have a CSV file where each "cell" of data is delimited by a comma and is quoted (including numbers). Each line ends with a newline character.
Some text "cells" have quotation marks in them, and I want to use regex to find these, so that I can escape them properly.
Example line:
"0","0.23432","234.232342","data here dsfsd hfsdf","3/1/2016",,"etc","E 60"","AD"8"\n
I want to match the " in E 60" and the " in AD"8, but not any of the other " characters.
What is a (preferably Python-friendly) regular expression that I can use to do this?
Recommended answer
Updated with the regex from @sundance so that quotes at the beginning of a line or just before a newline are not matched.
You could try substituting only quotes that aren't next to a comma, start of line, or newline:
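import re
# Remove every quote that is not at the start of the line, directly after a comma,
# or directly before a comma or the end of the line.
newline = re.sub(r'(?<!^)(?<!,)"(?!,|$)', '', line)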
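Since the question asks about escaping the stray quotes rather than deleting them, here is a minimal sketch of one way to do that with the same pattern. It assumes the desired escape is the doubled quote ("") that Python's csv module reads by default; the names STRAY_QUOTE and escape_stray_quotes are made up for this illustration and are not part of the original answer.

```python
import csv
import io
import re

# Same pattern as in the answer: a quote that is not at the start of the
# line, not right after a comma, and not right before a comma or line end.
STRAY_QUOTE = re.compile(r'(?<!^)(?<!,)"(?!,|$)')


def escape_stray_quotes(line: str) -> str:
    """Double each stray quote so a CSV parser reads it as a literal quote."""
    return STRAY_QUOTE.sub('""', line)


# The example line from the question, ending in a real newline character.
sample = '"0","0.23432","234.232342","data here dsfsd hfsdf","3/1/2016",,"etc","E 60"","AD"8"\n'

fixed = escape_stray_quotes(sample)

# Parse the repaired line to check that the last two cells survive intact.
row = next(csv.reader(io.StringIO(fixed)))
print(row[-2:])  # ['E 60"', 'AD"8']
```

One limitation to keep in mind: a stray quote that sits directly before a comma inside a cell looks the same as a closing quote to this pattern, so such a cell would still need to be fixed by hand.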