General Unicode/UTF-8 support for csv files in Python 2.6

Question

Python's csv module doesn't work properly when UTF-8/Unicode is involved. I have found snippets in the Python documentation and on other web pages that work for specific cases, but you have to understand well which encoding you are handling and pick the appropriate snippet.

How can I read and write both strings and Unicode strings from .csv files in a way that "just works" in Python 2.6? Or is this a limitation of Python 2.6 that has no simple solution?

Recommended answer

The example code for reading Unicode given at http://docs.python.org/library/csv.html#examples appears to be obsolete, as it doesn't work with Python 2.6 and 2.7.

Below is UnicodeDictReader, which works with UTF-8 and may work with other encodings, but I have only tested it on UTF-8 input.

The idea, in short, is to decode to Unicode only after csv.reader has split a row into fields.

import csv


class UnicodeCsvReader(object):
    """Wrap csv.reader (Python 2) and yield rows of unicode fields."""

    def __init__(self, f, encoding="utf-8", **kwargs):
        # csv.reader works on byte strings in Python 2
        self.csv_reader = csv.reader(f, **kwargs)
        self.encoding = encoding

    def __iter__(self):
        return self

    def next(self):
        # read and split the csv row into fields (still byte strings)
        row = self.csv_reader.next()
        # now decode each field to unicode
        return [unicode(cell, self.encoding) for cell in row]

    @property
    def line_num(self):
        return self.csv_reader.line_num


class UnicodeDictReader(csv.DictReader):
    def __init__(self, f, encoding="utf-8", fieldnames=None, **kwds):
        csv.DictReader.__init__(self, f, fieldnames=fieldnames, **kwds)
        # replace the underlying reader with the decoding one
        self.reader = UnicodeCsvReader(f, encoding=encoding, **kwds)

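Usage (the source file is saved as UTF-8 and declares this with a "# -*- coding: utf-8 -*-" comment on its first line, which Python 2 requires for non-ASCII literals):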

csv_lines = (
    "абв,123",
    "где,456",
)

for row in UnicodeCsvReader(csv_lines):
    for col in row:
        # Python 2 print statement; print(type(col), col) would print a tuple
        print type(col), col

Output:

$ python test.py
<type 'unicode'> абв
<type 'unicode'> 123
<type 'unicode'> где
<type 'unicode'> 456
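
Why splitting first and decoding afterwards is safe for UTF-8, and why "other encodings" needs a caveat, can be illustrated with a minimal sketch. It is not part of the original answer: the helper name split_then_decode is hypothetical, and a plain bytes.split stands in for csv.reader (so quoting and escaping are ignored). The assumption being shown is that the encoding is ASCII-compatible, meaning the comma byte never occurs inside another character.

```python
# -*- coding: utf-8 -*-
# Simplified illustration: bytes.split stands in for csv.reader, which in
# Python 2 likewise operates on byte strings. Hypothetical helper name.

def split_then_decode(raw_line, encoding):
    """Split an encoded line on the comma byte, then decode each field."""
    return [field.decode(encoding) for field in raw_line.split(b",")]

text = u"абв,123"

# ASCII-compatible encodings: the comma byte only ever means a comma.
utf8_fields = split_then_decode(text.encode("utf-8"), "utf-8")
cp1251_fields = split_then_decode(text.encode("cp1251"), "cp1251")

# UTF-16 is not ASCII-compatible: the comma is two bytes (b",\x00"), so a
# byte-level split cuts a code unit in half and decoding fails.
try:
    split_then_decode(u"a,b".encode("utf-16-le"), "utf-16-le")
    utf16_ok = True
except UnicodeDecodeError:
    utf16_ok = False
```

Under these assumptions, UTF-8 and a single-byte encoding such as cp1251 give the same decoded fields, while UTF-16 input would first have to be re-encoded (for example to UTF-8) before being split at the byte level.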
