Trying to fix Invalid CSV File

Problem description

I have a very large CSV file that contains double quoted fields (since
they contain commas). Unfortunately, some of these fields also contain
other double quotes and I made the painful mistake of forgetting to
escape or double the quotes inside the field:

123,"Here is some, text "and some quoted text" where the quotes should
have been doubled",321

Has anyone dealt with this problem before? Any ideas of an algorithm I
can use for a Python script to create a new, repaired CSV file?

TIA,
Ryan

Recommended answer

Ryan Rosario wrote:

I have a very large CSV file that contains double quoted fields (since
they contain commas). Unfortunately, some of these fields also contain
other double quotes and I made the painful mistake of forgetting to
escape or double the quotes inside the field:

123,"Here is some, text "and some quoted text" where the quotes should
have been doubled",321



rec = '''123,"Here is some, text "and some quoted text" where the quotes should have been doubled",321'''

import csv

# Protect the field-delimiting quotes as ''' markers, double the remaining
# (inner) quotes, then turn the markers back into plain quotes.
# next(reader) works on both Python 2.6+ and Python 3.
next(csv.reader([rec.replace(',"', ',"""')
                    .replace('",', '""",')
                    .replace('"""', "'''")
                    .replace('"', '""')
                    .replace("'''", '"')]))

['123', 'Here is some, text "and some quoted text" where the quotes should have been doubled', '321']

:))

Emile


Has anyone dealt with this problem before? Any ideas of an algorithm I
can use for a Python script to create a new, repaired CSV file?

TIA,
Ryan

-
http://mail.python.org/mailman/listinfo/python-list
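
For a whole file rather than a single record, the same idea can be applied line by line. What follows is only a minimal sketch, not Emile's code: it swaps the chained replace calls for a regular expression, and the names repair_line and repair_csv are made up for illustration. It assumes each record sits on one line, that no quote in the file is already doubled, and that a quote is a field delimiter only when it touches a comma or a line boundary. An inner quote sitting right next to a comma is genuinely ambiguous and will not be repaired correctly.

```python
import csv
import io
import re

# Assumption: a quote delimits a field only when it touches a comma or a line
# boundary; any other quote is an inner quote that should have been doubled.
INNER_QUOTE = re.compile(r'(?<=[^,\r\n])"(?=[^,\r\n])')


def repair_line(line):
    """Return the line with every inner double quote doubled."""
    return INNER_QUOTE.sub('""', line)


def repair_csv(src, dst):
    """Copy text from file-like src to dst, repairing one line at a time."""
    for line in src:
        dst.write(repair_line(line))


if __name__ == "__main__":
    broken = io.StringIO(
        '123,"Here is some, text "and some quoted text" where the quotes '
        'should have been doubled",321\n'
    )
    fixed = io.StringIO()
    repair_csv(broken, fixed)
    fixed.seek(0)
    for row in csv.reader(fixed):
        print(row)
```

With real files, src and dst would be opened in text mode with newline='' so that line endings pass through unchanged. Processing one line at a time keeps memory use flat for a very large file.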


On Aug 3, 10:38 pm, Emile van Sebille <em...@fenx.com> wrote:

Ryan Rosario wrote:

I have a very large CSV file that contains double quoted fields (since
they contain commas). Unfortunately, some of these fields also contain
other double quotes and I made the painful mistake of forgetting to
escape or double the quotes inside the field:


123,"Here is some, text "and some quoted text" where the quotes should
have been doubled",321



rec = '''123,"Here is some, text "and some quoted text" where the quotes should have been doubled",321'''

import csv

# Protect the field-delimiting quotes as ''' markers, double the remaining
# (inner) quotes, then turn the markers back into plain quotes.
# next(reader) works on both Python 2.6+ and Python 3.
next(csv.reader([rec.replace(',"', ',"""')
                    .replace('",', '""",')
                    .replace('"""', "'''")
                    .replace('"', '""')
                    .replace("'''", '"')]))

['123', 'Here is some, text "and some quoted text" where the quotes should have been doubled', '321']

:))

Emile


Has anyone dealt with this problem before? Any ideas of an algorithm I
can use for a Python script to create a new, repaired CSV file?


TIA,

Ryan

Thanks Emile! Works almost perfectly, but is there some way I can
adapt this to quote fields that contain a comma in them?

TIA,
Ryan


On Aug 4, 5:49 pm, Ryan Rosario <uclamath...@gmail.com> wrote:

> Thanks Emile! Works almost perfectly, but is there some way I can
> adapt this to quote fields that contain a comma in them?



You originally said "I have a very large CSV file that contains double
quoted fields (since they contain commas)". Are you now saying that
if a field contained a comma, you didn't wrap the field in quotes? Or
is this a separate question unrelated to your original problem?
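
If the follow-up is about writing the repaired data back out, one possible reading is that the standard csv.writer already covers it: with its default quoting (csv.QUOTE_MINIMAL) it wraps any field containing a comma or a quote in double quotes and doubles the embedded quotes. The sketch below assumes the rows have already been split into fields correctly; the helper name rows_to_csv is invented for this example. It does not help if the input has unquoted fields with embedded commas, since those cannot be told apart from field separators.

```python
import csv
import io


def rows_to_csv(rows):
    """Serialize rows to CSV text using the csv module's default quoting."""
    buf = io.StringIO()
    # lineterminator is set explicitly; the csv module defaults to '\r\n'.
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    row = ["123", 'Here is some, text "and some quoted text"', "321"]
    print(rows_to_csv([row]))
```

Fields without commas, quotes, or line breaks are left unquoted, so only the fields that need it change.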

