Python DictWriter writing UTF-8 encoded CSV files


Question




    1. I have a list of dictionaries containing unicode strings.
    2. csv.DictWriter can write a list of dictionaries into a CSV file.
    3. I want the CSV file to be encoded in UTF8.
    4. The csv module cannot handle converting unicode strings into UTF8.
    5. The csv module documentation has an example for converting everything to UTF8:


    def utf_8_encoder(unicode_csv_data):
        for line in unicode_csv_data:
            yield line.encode('utf-8')
    

    It also has a class UnicodeWriter.

    But... how do I make DictWriter work with these? Wouldn't they have to inject themselves in the middle of it, to catch the disassembled dictionaries and encode them before it writes them to the file? I don't get it.
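To see where the encoding step can go, it helps to know what DictWriter does with each dict: it looks up every field name (using `restval`, default `""`, for missing keys) to build a plain list, then hands that list to an ordinary `csv.writer`. The sketch below reproduces that flattening by hand with a hypothetical helper `dict_to_row` and checks that the output matches `csv.DictWriter`. It is written for Python 3 and is an illustration of the behavior, not DictWriter's actual source.

```python
import csv
import io

def dict_to_row(rowdict, fieldnames, restval=""):
    # Hypothetical helper: flatten a dict into a list ordered by fieldnames,
    # filling in restval for missing keys (mirrors DictWriter's default behavior).
    return [rowdict.get(key, restval) for key in fieldnames]

fieldnames = ["name", "pinyin"]
rows = [{"name": "马克", "pinyin": "mǎkè"}, {"name": "美国"}]

# Write through a plain csv.writer using the hand-flattened rows ...
manual = io.StringIO()
writer = csv.writer(manual)
writer.writerow(fieldnames)
for row in rows:
    writer.writerow(dict_to_row(row, fieldnames))

# ... and through csv.DictWriter; the resulting text is identical.
auto = io.StringIO()
dict_writer = csv.DictWriter(auto, fieldnames)
dict_writer.writeheader()
dict_writer.writerows(rows)

print(manual.getvalue() == auto.getvalue())  # True
```

Because DictWriter only looks values up by key and passes them through, nothing needs to be injected into its internals: transforming the values of the dict before calling `writerow` is enough.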

    Solution

    If you are using Python 2.7 (the first 2.x release with dict comprehensions), use a dict comprehension to encode the dictionary's values to UTF-8 before passing it to DictWriter:

    # coding: utf-8
    import csv
    D = {'name':u'马克','pinyin':u'mǎkè'}
    f = open('out.csv','wb')
    f.write(u'\ufeff'.encode('utf8')) # BOM (optional...Excel needs it to open UTF-8 file properly)
    w = csv.DictWriter(f,sorted(D.keys()))
    w.writeheader()
    w.writerow({k:v.encode('utf8') for k,v in D.items()})
    f.close()
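The code above is Python 2 only (`'wb'` mode, explicit `.encode`). In Python 3 the `csv` module works on text, so the encoding moves to `open()`: the `utf-8-sig` codec writes the BOM for you, and the `csv` documentation asks for `newline=''`. A minimal Python 3 sketch, reusing the `out.csv` file name and data from the example:

```python
import csv

D = {'name': '马克', 'pinyin': 'mǎkè'}

# 'utf-8-sig' prepends the UTF-8 BOM (b'\xef\xbb\xbf') that Excel looks for;
# newline='' lets the csv module control line endings itself.
with open('out.csv', 'w', encoding='utf-8-sig', newline='') as f:
    w = csv.DictWriter(f, sorted(D.keys()))
    w.writeheader()
    w.writerow(D)  # no manual .encode(): the file object encodes on write

# Inspect the raw bytes to confirm the BOM and UTF-8 encoding.
with open('out.csv', 'rb') as f:
    print(f.read())
```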
    

    You can use the same dict-comprehension idea to update UnicodeWriter to DictUnicodeWriter:

    # coding: utf-8
    import csv
    import cStringIO
    import codecs
    
    class DictUnicodeWriter(object):
    
        def __init__(self, f, fieldnames, dialect=csv.excel, encoding="utf-8", **kwds):
            # Redirect output to a queue
            self.queue = cStringIO.StringIO()
            self.writer = csv.DictWriter(self.queue, fieldnames, dialect=dialect, **kwds)
            self.stream = f
            self.encoder = codecs.getincrementalencoder(encoding)()
    
        def _flush(self):
            # Fetch UTF-8 output from the queue ...
            data = self.queue.getvalue()
            data = data.decode("utf-8")
            # ... and reencode it into the target encoding
            data = self.encoder.encode(data)
            # write to the target stream
            self.stream.write(data)
            # empty queue
            self.queue.truncate(0)
    
        def writerow(self, D):
            # Encode unicode values to UTF-8 bytes; pass other values (ints, etc.) through
            self.writer.writerow({k: v.encode("utf-8") if isinstance(v, unicode) else v
                                  for k, v in D.items()})
            self._flush()
    
        def writerows(self, rows):
            for D in rows:
                self.writerow(D)
    
        def writeheader(self):
            self.writer.writeheader()
            # Flush here too, so the header is written even if no rows follow
            self._flush()
    
    D1 = {'name':u'马克','pinyin':u'Mǎkè'}
    D2 = {'name':u'美国','pinyin':u'Měiguó'}
    f = open('out.csv','wb')
    f.write(u'\ufeff'.encode('utf8')) # BOM (optional...Excel needs it to open UTF-8 file properly)
    w = DictUnicodeWriter(f, sorted(D1.keys()))
    w.writeheader()
    w.writerows([D1,D2])
    f.close()
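In Python 3 the queue, decode and re-encode loop of DictUnicodeWriter is no longer needed: wrapping a binary stream in `io.TextIOWrapper` with the target encoding gives DictWriter a text stream and performs the encoding on write. The sketch below is a Python 3 substitute for the class above, not a port of it; `write_dicts` is a hypothetical helper name, and `io.BytesIO` stands in for any binary stream (for example a file opened with `'wb'`). UTF-16 is used only to show a non-UTF-8 target encoding.

```python
import csv
import io

def write_dicts(binary_stream, fieldnames, rows, encoding="utf-8"):
    # Hypothetical helper: write dicts as CSV to a binary stream in `encoding`.
    text = io.TextIOWrapper(binary_stream, encoding=encoding, newline="")
    try:
        writer = csv.DictWriter(text, fieldnames)
        writer.writeheader()
        writer.writerows(rows)
        text.flush()  # push buffered text through the encoder into binary_stream
    finally:
        # Detach so discarding the wrapper does not close the caller's stream.
        text.detach()

D1 = {'name': '马克', 'pinyin': 'Mǎkè'}
D2 = {'name': '美国', 'pinyin': 'Měiguó'}
buf = io.BytesIO()
write_dicts(buf, sorted(D1.keys()), [D1, D2], encoding="utf-16")
print(buf.getvalue().decode("utf-16"))
```

Detaching rather than closing the wrapper leaves ownership of the underlying stream with the caller, which matches how DictUnicodeWriter only writes to `f` and never closes it.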
    
