Python DictWriter writing UTF-8 encoded CSV files


Problem description

  1. I have a list of dictionaries containing unicode strings.
  2. csv.DictWriter can write a list of dictionaries into a CSV file.
  3. I want the CSV file to be encoded in UTF8.
  4. The csv module cannot handle converting unicode strings into UTF8.
  5. The csv module documentation has an example for converting everything to UTF8:

def utf_8_encoder(unicode_csv_data):
    for line in unicode_csv_data:
        yield line.encode('utf-8')

  • It also has a UnicodeWriter class.

    But... how do I make DictWriter work with these? Wouldn't they have to inject themselves in the middle of it, to catch the disassembled dictionaries and encode them before it writes them to the file? I don't get it.

    Recommended answer

    UPDATE: The third-party unicodecsv module implements this 7-year-old answer for you. An example appears below this code. There is also a Python 3 solution that doesn't require a third-party module.

    Original Python 2 answer

    If using Python 2.7 or later, use a dict comprehension to remap the dictionary to utf-8 before passing to DictWriter:

    # coding: utf-8
    import csv
    D = {'name':u'马克','pinyin':u'mǎkè'}
    f = open('out.csv','wb')
    f.write(u'\ufeff'.encode('utf8')) # BOM (optional...Excel needs it to open UTF-8 file properly)
    w = csv.DictWriter(f,sorted(D.keys()))
    w.writeheader()
    w.writerow({k:v.encode('utf8') for k,v in D.items()})
    f.close()
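The dict comprehension above calls `.encode()` on every value, so it fails as soon as a row contains a non-text value such as an integer. The following is a minimal sketch of the remapping step on its own, assuming a hypothetical helper named `encode_row` (not part of the csv module or the original answer). It is written to run on both Python 2 and Python 3 so the step can be inspected in isolation; on Python 3 the csv module expects text rather than bytes, so the encoded row is only meant to be passed to `DictWriter` on Python 2.

```python
# -*- coding: utf-8 -*-
# Hypothetical helper for illustration only; the name encode_row is an assumption.

# Text type of the running interpreter: unicode on Python 2, str on Python 3.
text_type = type(u'')


def encode_row(row, encoding='utf-8'):
    """Return a copy of row whose text values are encoded to bytes.

    Values that are not text (ints, floats, None, ...) are passed through
    unchanged, so csv can format them itself.
    """
    return {
        key: value.encode(encoding) if isinstance(value, text_type) else value
        for key, value in row.items()
    }


row = {'name': u'马克', 'visits': 3}
encoded = encode_row(row)
```

On Python 2 the result can be handed straight to the writer, for example `w.writerow(encode_row(D))`. The input dictionary is left untouched because the comprehension builds a new one.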
    

    You can use this idea to update UnicodeWriter to DictUnicodeWriter:

    # coding: utf-8
    import csv
    import cStringIO
    import codecs
    
    class DictUnicodeWriter(object):
    
        def __init__(self, f, fieldnames, dialect=csv.excel, encoding="utf-8", **kwds):
            # Redirect output to a queue
            self.queue = cStringIO.StringIO()
            self.writer = csv.DictWriter(self.queue, fieldnames, dialect=dialect, **kwds)
            self.stream = f
            self.encoder = codecs.getincrementalencoder(encoding)()
    
        def writerow(self, D):
            self.writer.writerow({k:v.encode("utf-8") for k,v in D.items()})
            # Fetch UTF-8 output from the queue ...
            data = self.queue.getvalue()
            data = data.decode("utf-8")
            # ... and reencode it into the target encoding
            data = self.encoder.encode(data)
            # write to the target stream
            self.stream.write(data)
            # empty queue
            self.queue.truncate(0)
    
        def writerows(self, rows):
            for D in rows:
                self.writerow(D)
    
        def writeheader(self):
            self.writer.writeheader()
    
    D1 = {'name':u'马克','pinyin':u'Mǎkè'}
    D2 = {'name':u'美国','pinyin':u'Měiguó'}
    f = open('out.csv','wb')
    f.write(u'\ufeff'.encode('utf8')) # BOM (optional...Excel needs it to open UTF-8 file properly)
    w = DictUnicodeWriter(f,sorted(D1.keys())) # D1 supplies the field names; both rows share the same keys
    w.writeheader()
    w.writerows([D1,D2])
    f.close()
    

    Python 2 unicodecsv Example:

    # coding: utf-8
    import unicodecsv as csv
    
    D = {u'name':u'马克',u'pinyin':u'mǎkè'}
    
    with open('out.csv','wb') as f:
        w = csv.DictWriter(f,fieldnames=sorted(D.keys()),encoding='utf-8-sig')
        w.writeheader()
        w.writerow(D)
    

    Python 3:

    Additionally, Python 3's built-in csv module supports Unicode natively:

    # coding: utf-8
    import csv
    
    D = {u'name':u'马克',u'pinyin':u'mǎkè'}
    
    # Use newline='' instead of 'wb' in Python 3.
    with open('out.csv','w',encoding='utf-8-sig',newline='') as f:
        w = csv.DictWriter(f,fieldnames=sorted(D.keys()))
        w.writeheader()
        w.writerow(D)
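To see what the `utf-8-sig` encoding actually produces, the Python 3 approach can be exercised against an in-memory buffer instead of a file. This is a rough sketch under stated assumptions: the function name `rows_to_csv_bytes` is hypothetical, and `io.TextIOWrapper` over `io.BytesIO` stands in for `open(..., 'w', encoding='utf-8-sig', newline='')`.

```python
import codecs
import csv
import io


def rows_to_csv_bytes(rows, fieldnames, encoding='utf-8-sig'):
    """Write rows with csv.DictWriter and return the encoded bytes.

    Hypothetical helper: the in-memory buffer plays the role of the
    file opened in text mode in the example above.
    """
    buffer = io.BytesIO()
    # newline='' lets the csv module control line endings, as with open().
    text = io.TextIOWrapper(buffer, encoding=encoding, newline='')
    writer = csv.DictWriter(text, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    text.flush()  # push any buffered text through the encoder
    return buffer.getvalue()


rows = [
    {'name': '马克', 'pinyin': 'mǎkè'},
    {'name': '美国', 'pinyin': 'Měiguó'},
]
data = rows_to_csv_bytes(rows, fieldnames=['name', 'pinyin'])

# Reading back with the same codec strips the BOM again.
decoded = data.decode('utf-8-sig')
round_trip = list(csv.DictReader(io.StringIO(decoded, newline='')))
```

The output starts with the three-byte UTF-8 BOM (`codecs.BOM_UTF8`), which is the marker Excel uses to detect the encoding. Passing `encoding='utf-8'` instead should yield the same bytes without the BOM.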
    
