Trying to fix Invalid CSV File
Problem description
I have a very large CSV file that contains double quoted fields (since
they contain commas). Unfortunately, some of these fields also contain
other double quotes and I made the painful mistake of forgetting to
escape or double the quotes inside the field:
123,"Here is some, text "and some quoted text" where the quotes should
have been doubled",321
Has anyone dealt with this problem before? Any ideas of an algorithm I
can use for a Python script to create a new, repaired CSV file?
TIA,
Ryan
Recommended answer
Ryan Rosario wrote:
I have a very large CSV file that contains double quoted fields (since
they contain commas). Unfortunately, some of these fields also contain
other double quotes and I made the painful mistake of forgetting to
escape or double the quotes inside the field:
123,"Here is some, text "and some quoted text" where the quotes should
have been doubled",321
rec = '''123,"Here is some, text "and some quoted text" where the quotes should have been doubled",321'''
import csv
# Mark the field-delimiting quotes, double the inner ones, then restore the delimiters.
next(csv.reader([rec.replace(',"', ',"""')
                    .replace('",', '""",')
                    .replace('"""', "'''")
                    .replace('"', '""')
                    .replace("'''", '"')]))
['123', 'Here is some, text "and some quoted text" where the quotes should have been doubled', '321']
:))
Emile
Has anyone dealt with this problem before? Any ideas of an algorithm I
can use for a Python script to create a new, repaired CSV file?
TIA,
Ryan
-
http://mail.python.org/mailman/listinfo/python-list
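The chain of replace calls in the answer above can be wrapped in a small helper and applied line by line. This is only a sketch: the names repair_line and parse_broken_lines are hypothetical, and it assumes every quoted field starts right after a comma and ends right before one, as in the sample record. A quoted first or last field, an inner quote that sits next to a comma, or data that already contains ''' would need extra handling.

```python
import csv

def repair_line(line):
    """Double the stray quotes inside the quoted fields of one CSV line.

    Assumption: each quoted field is preceded and followed by a comma.
    """
    return (line.replace(',"', ',"""')   # mark opening quotes
                .replace('",', '""",')   # mark closing quotes
                .replace('"""', "'''")   # park the markers as '''
                .replace('"', '""')      # double the remaining inner quotes
                .replace("'''", '"'))    # restore the field delimiters

def parse_broken_lines(lines):
    """Return a csv.reader over the repaired versions of the given lines."""
    return csv.reader(repair_line(line) for line in lines)

sample = ['123,"Here is some, text "and some quoted text" where '
          'the quotes should have been doubled",321']
rows = list(parse_broken_lines(sample))
print(rows)
```

Because csv.reader accepts any iterable of strings, an open file object could be passed in place of the sample list.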
On Aug 3, 10:38 pm, Emile van Sebille <em...@fenx.com> wrote:
Ryan Rosario wrote:
I have a very large CSV file that contains double quoted fields (since
they contain commas). Unfortunately, some of these fields also contain
other double quotes and I made the painful mistake of forgetting to
escape or double the quotes inside the field:
123,"Here is some, text "and some quoted text" where the quotes should
have been doubled",321
rec = '''123,"Here is some, text "and some quoted text" where the quotes should have been doubled",321'''
import csv
# Mark the field-delimiting quotes, double the inner ones, then restore the delimiters.
next(csv.reader([rec.replace(',"', ',"""')
                    .replace('",', '""",')
                    .replace('"""', "'''")
                    .replace('"', '""')
                    .replace("'''", '"')]))
['123', 'Here is some, text "and some quoted text" where the quotes should have been doubled', '321']
:))
Emile
Has anyone dealt with this problem before? Any ideas of an algorithm I
can use for a Python script to create a new, repaired CSV file?
TIA,
Ryan
Thanks Emile! Works almost perfectly, but is there some way I can
adapt this to quote fields that contain a comma in them?
TIA,
Ryan
On Aug 4, 5:49 pm, Ryan Rosario <uclamath...@gmail.com> wrote:
Thanks Emile! Works almost perfectly, but is there some way I can
adapt this to quote fields that contain a comma in them?
You originally said "I have a very large CSV file that contains double
quoted fields (since they contain commas)". Are you now saying that
if a field contained a comma, you didn't wrap the field in quotes? Or
is this a separate question unrelated to your original problem?
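The follow-up question is ambiguous, as the reply above points out. Under one possible reading (the repaired rows should be written back out with quotes around any field that contains a comma), csv.writer already does this with its default QUOTE_MINIMAL setting, and it doubles embedded quotes as well. The rows below are made-up sample data, and the in-memory buffer stands in for a real output file.

```python
import csv
import io

# Hypothetical rows, as they might look after parsing the repaired input.
rows = [['123', 'Here is some, text "and some quoted text"', '321'],
        ['456', 'no comma here', '654']]

buffer = io.StringIO()
# QUOTE_MINIMAL (the default) quotes only fields that contain the
# delimiter, the quote character, or a line terminator.
writer = csv.writer(buffer, lineterminator='\n')
writer.writerows(rows)
output = buffer.getvalue()
print(output)
```

If the unquoted, comma-containing fields are in the input rather than the output, this does not help: a bare comma inside an unquoted field cannot be told apart from a delimiter without knowing the expected number of columns.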