Python: Write Unicode to CSV using UnicodeWriter
Question
The Python documentation has the following code example for writing Unicode to a CSV file. I believe it says this is the way to do it because the csv module can't handle Unicode strings.
import codecs
import cStringIO
import csv

class UnicodeWriter:
    """
    A CSV writer which will write rows to CSV file "f",
    which is encoded in the given encoding.
    """

    def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds):
        # Redirect output to a queue
        self.queue = cStringIO.StringIO()
        self.writer = csv.writer(self.queue, dialect=dialect, **kwds)
        self.stream = f
        self.encoder = codecs.getincrementalencoder(encoding)()

    def writerow(self, row):
        self.writer.writerow([s.encode("utf-8") for s in row])
        # Fetch UTF-8 output from the queue ...
        data = self.queue.getvalue()
        data = data.decode("utf-8")
        # ... and reencode it into the target encoding
        data = self.encoder.encode(data)
        # write to the target stream
        self.stream.write(data)
        # empty queue
        self.queue.truncate(0)

    def writerows(self, rows):
        for row in rows:
            self.writerow(row)
I am writing more than one file, so to keep it simple I have included only the part of my code that shows how I use the above class:
def write(self):
    """
    Outputs the dataset to a csv.
    """
    f = codecs.open(self.filename, 'a')
    writer = UnicodeWriter(f)
    #with open(self.filename, 'a', encoding='utf-8') as f:
    if self.headers and not self.written:
        writer.writerow(self.headers)
        self.written = True
    for record in self.records[self.last_written:]:
        print record
        writer.writerow(record)
    self.last_written = len(self.records)
    f.close()
This is a method inside a class called dataset, which prepares the dataset prior to writing it to CSV. Previously I was using writer = csv.writer(f), but due to codec errors I changed my code to use the UnicodeWriter class.
But my problem is that when I open the CSV file, I get the following:
some_header
B,r,ë,k,ò,w,n,i,k,_,b,s
B,r,ë,k,ò,w,n,i,k,_,c,s
B,r,ë,k,ò,w,n,i,k,_,c,s,b
B,r,ë,k,ò,w,n,i,k,_,d,e
B,r,ë,k,ò,w,n,i,k,_,d,e,-,1
B,r,ë,k,ò,w,n,i,k,_,d,e,-,2
B,r,ë,k,ò,w,n,i,k,_,d,e,-,3
B,r,ë,k,ò,w,n,i,k,_,d,e,-,4
B,r,ë,k,ò,w,n,i,k,_,d,e,-,5
B,r,ë,k,ò,w,n,i,k,_,d,e,-,M
B,r,ë,k,ò,w,n,i,k,_,e,n
B,r,ë,k,ò,w,n,i,k,_,e,n,-,1
B,r,ë,k,ò,w,n,i,k,_,e,n,-,2
These rows should actually be something like Brëkòwnik_de-1. I am not really sure what is happening.
To give a basic idea of how the data is generated, here is the relevant line:
title = unicode(row_page_title['page_title'], 'utf-8')
Recommended answer
This symptom points to something like feeding a string into a function or method that expects a list or tuple.
The writerows method expects a list of lists, and writerow expects a list (or tuple) containing the field values. Since you are feeding it a string, and a string behaves like a sequence of characters when you iterate over it, you get a CSV with one character in each column.
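The effect is easy to reproduce in isolation. The following is only a minimal sketch, written for Python 3 (where the csv module accepts text directly) rather than the Python 2 code in the question; the helper name row_as_csv is made up for illustration. The iteration behavior it shows is the same one at work inside UnicodeWriter.writerow.

```python
import csv
import io

def row_as_csv(row):
    """Hypothetical helper: write one row with csv.writer and return the CSV text."""
    buf = io.StringIO(newline="")
    csv.writer(buf).writerow(row)
    return buf.getvalue()

# A bare string is iterated character by character, so every
# character becomes its own field.
as_string = row_as_csv("Brëkòwnik_de-1")

# A one-element list is treated as a row with a single field.
as_list = row_as_csv(["Brëkòwnik_de-1"])

print(repr(as_string))
print(repr(as_list))
```

The first call yields the comma-separated characters seen in the question's output; the second yields the title as one field.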
If your CSV has just one column, you should use writer.writerow([data]) instead of writer.writerow(data). Some may question whether you really need the csv module when there is only one column, but the csv module handles things like a record containing awkward characters (CR/LF and others), so yes, it is a good idea.
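As a rough illustration of why the csv module is still worth using for a single column, the sketch below (Python 3 again, with made-up sample records) writes values containing a line break, quotes and a comma, each wrapped in a one-element list, and then reads them back unchanged.

```python
import csv
import io

# Made-up sample values: one plain title and two with characters
# that would break a naive "one value per line" file.
records = ["Brëkòwnik_de-1", "two\nlines", 'has "quotes", and a comma']

buf = io.StringIO(newline="")
writer = csv.writer(buf)
for record in records:
    # Wrap each value in a list so it is written as a single field.
    writer.writerow([record])

# Read the data back; the csv module undoes the quoting it applied.
buf.seek(0)
read_back = [row[0] for row in csv.reader(buf)]

print(read_back == records)
```

In the question's write method, the equivalent change would be writer.writerow([record]), assuming each record is a single title string.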