General Unicode/UTF-8 support for csv files in Python 2.6


Question

The csv module in Python doesn't work properly when UTF-8/Unicode is involved. In the Python documentation and on other webpages I have found snippets that work for specific cases, but you have to understand exactly which encoding you are handling and pick the appropriate snippet.

How can I read and write both strings and Unicode strings from .csv files in a way that "just works" in Python 2.6? Or is this a limitation of Python 2.6 that has no simple solution?

Recommended answer

The example code for reading Unicode given at http://docs.python.org/library/csv.html#examples looks obsolete, as it doesn't work with Python 2.6 and 2.7.

Here follows UnicodeDictReader, which works with UTF-8 and may work with other encodings, but I only tested it on UTF-8 inputs.

In short, the idea is to decode to Unicode only after csv.reader has split a csv row into fields.
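Before the full classes, the same idea can be sketched as a small generator. This is only an illustration: the name `iter_unicode_rows` is hypothetical, and the `isinstance` guard is an addition of this sketch so that cells which are already text are passed through unchanged instead of being decoded twice.

```python
# -*- coding: utf-8 -*-
import csv


def iter_unicode_rows(lines, encoding="utf-8", **kwargs):
    """Yield each csv row as a list of text (unicode) cells.

    Hypothetical helper for illustration; not part of the csv module.
    """
    for row in csv.reader(lines, **kwargs):
        # csv.reader has already split the row into fields, so decoding
        # happens cell by cell. Byte-string cells are decoded; cells that
        # are already text are returned as they are (assumption of this
        # sketch, not something the original classes do).
        yield [cell.decode(encoding) if isinstance(cell, bytes) else cell
               for cell in row]


rows = list(iter_unicode_rows(["абв,123", "где,456"]))
```

With this sample input, `rows` holds two lists of text cells. The classes that follow wrap the same per-cell decoding in iterator objects so that it can be plugged into csv.DictReader.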

import csv


class UnicodeCsvReader(object):
    """Wrap csv.reader and decode every cell after the row has been split."""

    def __init__(self, f, encoding="utf-8", **kwargs):
        self.csv_reader = csv.reader(f, **kwargs)
        self.encoding = encoding

    def __iter__(self):
        return self

    def next(self):
        # Read and split the csv row into fields (still byte strings).
        row = self.csv_reader.next()
        # Now decode each field to unicode.
        return [unicode(cell, self.encoding) for cell in row]

    @property
    def line_num(self):
        return self.csv_reader.line_num


class UnicodeDictReader(csv.DictReader):
    """csv.DictReader that reads its rows through UnicodeCsvReader."""

    def __init__(self, f, encoding="utf-8", fieldnames=None, **kwds):
        csv.DictReader.__init__(self, f, fieldnames=fieldnames, **kwds)
        # Swap the plain reader created by DictReader for the decoding one.
        self.reader = UnicodeCsvReader(f, encoding=encoding, **kwds)

Usage (source file encoding is UTF-8):

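# -*- coding: utf-8 -*-
# (the encoding declaration above belongs on the first or second line of test.py)
csv_lines = (
    "абв,123",
    "где,456",
)

for row in UnicodeCsvReader(csv_lines):
    for col in row:
        # Python 2 print statement; print(type(col), col) would print a tuple.
        print type(col), col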

Output:

$ python test.py
<type 'unicode'> абв
<type 'unicode'> 123
<type 'unicode'> где
<type 'unicode'> 456
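
On the "other encodings" caveat: because csv.reader sees the raw bytes before any decoding happens, the approach presumably depends on delimiter, quote and newline bytes never appearing inside an encoded non-ASCII character. The check below is a rough sketch of that reasoning, not a test of the reader itself; `ascii_bytes_in` is a hypothetical helper name.

```python
# -*- coding: utf-8 -*-


def ascii_bytes_in(text, encoding):
    """Return the byte values below 0x80 in text encoded with encoding."""
    return [b for b in bytearray(text.encode(encoding)) if b < 0x80]


sample = u"абв,где"

# UTF-8: every byte of a non-ASCII character is 0x80 or above, so the
# only ASCII-range byte in the stream is the real comma (0x2C).
utf8_ascii = ascii_bytes_in(sample, "utf-8")

# UTF-16-LE: each of these Cyrillic letters becomes two bytes that both
# fall in the ASCII range, and the comma is followed by a NUL byte.
utf16_ascii = ascii_bytes_in(sample, "utf-16-le")
```

For this sample, `utf8_ascii` contains just the comma, while `utf16_ascii` contains all 14 bytes of the encoded string, including a NUL. That suggests the reader should be fine with ASCII-compatible encodings in the style of UTF-8, and is unlikely to work unchanged on UTF-16 input, which would probably have to be transcoded first.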
