How to handle double quotes inside field values with the csv module?


Problem description

I'm trying to parse CSV files from an external system that I have no control over.


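  • A comma is used as the separator.

  • When a cell contains a comma, it is wrapped in quotes, and all other quotes are escaped with another quote character.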


Example CSV:


qwerty,abcd,efg

qw""erty,"a""b""c""d,ef""""g"

Should be parsed as:

[['qw"erty', 'a"b"c"d,ef""g']]

However, I think that Python's csv module does not expect quote characters to be escaped when the cell was not wrapped in quote characters in the first place. csv.reader(my_file) (with the default doublequote=True) returns:

['qw""erty', 'a"b"c"d,ef""g']
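The behavior described above can be reproduced without a file. The following is a minimal sketch that feeds the quoted sample line to csv.reader through io.StringIO; the variable names are illustrative and not part of the original question.

```python
import csv
from io import StringIO

# The sample line containing doubled quotes, as raw text.
sample = 'qw""erty,"a""b""c""d,ef""""g"\n'

# Default dialect: doublequote=True, quotechar='"'.
rows = list(csv.reader(StringIO(sample)))

# Doubled quotes are collapsed only inside the quoted field;
# the unquoted first field keeps both quote characters.
print(rows)  # [['qw""erty', 'a"b"c"d,ef""g']]
```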

Is there any way to parse this with the Python csv module?

Recommended answer

This follows up on @JackManey's comment, which suggested replacing all instances of '""' inside of double quotes with '\\"'.

Recognizing whether we are currently inside a double-quoted cell turned out to be unnecessary: we can simply replace all instances of '""' with '\\"'. The Python documentation says:

On reading, the escapechar removes any special meaning from the following character

However, this would still break when the original cell already contains escape characters, for example 'qw\\\\""erty' producing [['qw\\"erty']]. So we also have to escape the escape characters before parsing.
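To see why the escape characters themselves need escaping, here is a small sketch using a hypothetical cell value that contains a backslash (C:\temp, which is not from the original question). With escapechar set, a lone backslash is consumed as an escape; doubling it first preserves it. The helper name parse is an assumption made for illustration.

```python
import csv
from io import StringIO

def parse(text):
    # Same reader settings as in the answer: backslash is the escape character.
    return list(csv.reader(StringIO(text), doublequote=False, escapechar='\\'))

raw = r'C:\temp,x'  # hypothetical input: one literal backslash in the first cell

# Without pre-escaping, the backslash is treated as an escape and dropped.
lost = parse(raw)

# Doubling the backslash first lets the reader restore the original character.
kept = parse(raw.replace('\\', '\\\\'))

print(lost)  # [['C:temp', 'x']]
print(kept)  # [['C:\\temp', 'x']]
```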

Final solution:

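import csv
from io import StringIO

def parse_csv(file_path):
    with open(file_path, newline='') as f:
        # Escape backslashes first, then turn doubled quotes into escaped quotes.
        content = f.read().replace('\\', '\\\\').replace('""', '\\"')
    return list(csv.reader(StringIO(content), doublequote=False, escapechar='\\'))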
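As a quick check, the same two replacements can be applied in memory to the sample line from the question. This is a sketch only: parse_text is a hypothetical helper name, and it assumes Python 3 text strings rather than a file on disk.

```python
import csv
from io import StringIO

def parse_text(text):
    # Escape existing backslashes first, then rewrite doubled quotes
    # as backslash-escaped quotes so the reader handles them uniformly.
    prepared = text.replace('\\', '\\\\').replace('""', '\\"')
    reader = csv.reader(StringIO(prepared), doublequote=False, escapechar='\\')
    return list(reader)

sample = 'qw""erty,"a""b""c""d,ef""""g"\n'
print(parse_text(sample))  # [['qw"erty', 'a"b"c"d,ef""g']]
```

One caveat worth checking against your own data: because every '""' is rewritten, an empty quoted cell (written as "") would also be turned into an escaped quote, so this approach fits best when the external system never emits empty quoted cells.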
