Removing in-field quotes in a CSV file

Question

Let's say we have a comma-separated (CSV) file like this:

"name of movie","starring","director","release year"
"dark knight rises","christian bale, anna hathaway","christopher nolan","2012"
"the dark knight","christian bale, heath ledger","christopher nolan","2008"
"The "day" when earth stood still","Michael Rennie,the 'strong' man","robert wise","1951"
"the 'gladiator'","russel "the awesome" crowe","ridley scott","2000"

As you can see above, lines 4 and 5 have quotes within quotes. The output should look something like this:

"name of movie","starring","director","release year"
"dark knight rises","christian bale, anna hathaway","christopher nolan","2012"
"the dark knight","christian bale, heath ledger","christopher nolan","2008"
"The day when earth stood still","Michael Rennie,the strong man","robert wise","1951"
"the gladiator","russel the awesome crowe","ridley scott","2000"

How can I get rid of quotes (both single and double) that occur inside quoted fields like these in a CSV file? Note that a comma within a single field is okay, since the parser recognizes that it's inside quotes and treats it as one field. This is just a preprocessing step to tidy up the CSV files so they can be fed into multiple parsers and converted into whatever format we need. Bash, awk, or Python all work. Please no Perl, I'm sick of that language :D Thanks in advance!

Answer

How about this:

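import csv

def remove_quotes(s):
    # Drop every double and single quote character from a field's text
    return ''.join(c for c in s if c not in ('"', "'"))

# Python 3: open both files in text mode with newline="", as the csv module expects
# (the original "rb"/"wb" modes only work on Python 2)
with open("fixquote.csv", newline="") as infile, open("fixed.csv", "w", newline="") as outfile:
    reader = csv.reader(infile)  # the default dialect tolerates the stray inner quotes
    writer = csv.writer(outfile, quoting=csv.QUOTE_ALL)  # re-quote every field on output
    for line in reader:
        writer.writerow([remove_quotes(elem) for elem in line])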
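Why this works: Python's `csv` reader uses a non-strict dialect by default (`strict=False`). When a quote shows up in the middle of a quoted field and isn't followed by a comma, the reader carries on treating the rest of the field as unquoted text, stray quotes included, until the next comma; `remove_quotes` then strips those leftovers. The sketch below (the helper `parse_first_row` is hypothetical, just for illustration) prints the raw fields for a shortened version of record 4, shows that `strict=True` rejects the same input, and shows one case this approach can't handle: an inner quote immediately followed by a comma, which ends the field early.

```python
import csv
import io


def parse_first_row(text, **fmtparams):
    """Hypothetical helper: parse the first record of a CSV string."""
    return next(csv.reader(io.StringIO(text), **fmtparams))


# Shortened version of record 4: the title field contains "day" in double quotes.
record4 = '"The "day" when earth stood still","robert wise"\n'
print(parse_first_row(record4))
# ['The day" when earth stood still"', 'robert wise']

# With strict=True the same input raises csv.Error instead of being tolerated.
try:
    parse_first_row(record4, strict=True)
except csv.Error as exc:
    print("strict mode rejects it:", exc)

# Limitation: an inner quote followed directly by a comma ends the field early,
# so this two-field record comes back as three fields.
print(parse_first_row('"foo "bar",baz","x"\n'))
# ['foo bar"', 'baz"', 'x']
```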

This produces:

~/coding$ cat fixed.csv 
"name of movie","starring","director","release year"
"dark knight rises","christian bale, anna hathaway","christopher nolan","2012"
"the dark knight","christian bale, heath ledger","christopher nolan","2008"
"The day when earth stood still","Michael Rennie,the strong man","robert wise","1951"
"the gladiator","russel the awesome crowe","ridley scott","2000"

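If you'd rather not go through intermediate files (for example, to unit-test the cleanup step), the same approach can be wrapped in a function that takes and returns a string. This is a minimal sketch: `clean_csv_text` is a hypothetical name, `io.StringIO` stands in for the two files, and `lineterminator="\n"` is an assumption chosen so the output uses Unix line endings (`csv.writer` writes `\r\n` by default).

```python
import csv
import io


def remove_quotes(s: str) -> str:
    """Drop every double and single quote character from a field."""
    return s.replace('"', "").replace("'", "")


def clean_csv_text(text: str) -> str:
    """Parse CSV text with the default (non-strict) dialect, strip quotes
    inside each field, and write every field back fully quoted."""
    reader = csv.reader(io.StringIO(text, newline=""))
    out = io.StringIO(newline="")
    # lineterminator="\n" is an assumption; csv.writer defaults to "\r\n"
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
    for row in reader:
        writer.writerow([remove_quotes(field) for field in row])
    return out.getvalue()


print(clean_csv_text('"the \'gladiator\'","russel "the awesome" crowe"\n'), end="")
# "the gladiator","russel the awesome crowe"
```

Commas inside fields survive because the reader has already split the record before any quotes are removed, and `QUOTE_ALL` wraps every field in quotes again on output.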

BTW, you might want to check the spelling of some of those names.
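If your data can contain an inner quote right before a comma (e.g. `"foo "bar",baz"`), the `csv` module will split that field in two. A different technique that avoids the csv parser entirely is to cut each record on the `","` sequence that separates fields. This is only a sketch, and it holds under two assumptions: every field is wrapped in double quotes (as in the sample file), and no field contains the three characters `","` itself. `clean_line_by_splitting` is a hypothetical name.

```python
def clean_line_by_splitting(line: str) -> str:
    """Strip inner quotes from one record by splitting on '","' (no csv module).

    Assumes every field is wrapped in double quotes and that no field
    contains the sequence '","' itself.
    """
    body = line.rstrip("\r\n")
    # Remove the opening quote of the first field and the closing quote of the last
    if body.startswith('"') and body.endswith('"'):
        body = body[1:-1]
    fields = body.split('","')
    # Drop every remaining single and double quote, then re-quote each field
    return ",".join('"' + f.replace('"', "").replace("'", "") + '"' for f in fields)


print(clean_line_by_splitting('"The "day" when earth stood still","Michael Rennie,the \'strong\' man","robert wise","1951"'))
# "The day when earth stood still","Michael Rennie,the strong man","robert wise","1951"
print(clean_line_by_splitting('"foo "bar",baz","x"'))
# "foo bar,baz","x"
```

The trade-off: this keeps `foo "bar",baz` together as one field, but it breaks as soon as a file contains an unquoted field, which the csv-based approach handles fine.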
