Match unescaped quotes in quoted CSV

Problem description

I've looked at several Stack Overflow posts with similar titles, and none of the accepted answers have done the trick for me.

I have a CSV file where each "cell" of data is delimited by a comma and is quoted (including numbers). Each line ends with a newline character.

Some text "cells" have quotation marks in them, and I want to use a regex to find these so that I can escape them properly.

Example line:

"0","0.23432","234.232342","data here dsfsd hfsdf","3/1/2016",,"etc","E 60"","AD"8"\n

I want to match the " in E 60" and in AD"8, but not any of the other " characters.

What is a (preferably Python-friendly) regular expression that I can use to do this?

Recommended answer

Updated with the regex from @sundance so that quotes at the beginning of a line or just before a newline are not matched.

You could try substituting only the quotes that aren't next to a comma, the start of the line, or a newline:

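import re

# `line` is one line of the file, with or without its trailing newline.
# Replacing with '' deletes the stray quotes; use '""' as the replacement to escape them instead.
newline = re.sub(r'(?<!^)(?<!,)"(?!,|$)', '', line)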
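
To check the pattern against the example line from the question, here is a minimal sketch that doubles the flagged quotes (the usual CSV escaping convention) rather than deleting them, then parses the result with the standard `csv` module. The helper name `escape_inner_quotes` and the constant `UNESCAPED_QUOTE` are made up for illustration, and the sketch assumes the file is processed one line at a time; applying the pattern to a whole multi-line string would likely need `re.MULTILINE`.

```python
import csv
import io
import re

# Same pattern as in the answer: a quote that is not at the start of the line,
# not right after a comma, and not right before a comma or the end of the line.
UNESCAPED_QUOTE = re.compile(r'(?<!^)(?<!,)"(?!,|$)')


def escape_inner_quotes(line: str) -> str:
    """Double every quote flagged by the pattern (hypothetical helper)."""
    return UNESCAPED_QUOTE.sub('""', line)


# Example line from the question, with a real trailing newline.
line = '"0","0.23432","234.232342","data here dsfsd hfsdf","3/1/2016",,"etc","E 60"","AD"8"\n'

fixed = escape_inner_quotes(line)

# Parse the repaired line to confirm that the fields come out as intended.
row = next(csv.reader(io.StringIO(fixed)))
print(row[-2:])  # ['E 60"', 'AD"8']
```

One caveat: a stray quote that sits directly beside a comma inside a field looks the same as a field delimiter to this pattern, so it would not be caught. For data shaped like the example line that case does not arise.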
