unicodecsv reader from unicode string not working?


Question

I'm having trouble reading a unicode CSV string with python-unicodecsv:

>>> import unicodecsv, StringIO
>>> f = StringIO.StringIO(u'é,é')
>>> r = unicodecsv.reader(f, encoding='utf-8')
>>> row = r.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/guy/test/.env/lib/python2.7/site-packages/unicodecsv/__init__.py", line 101, in next
    row = self.reader.next()
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 0: ordinal not in range(128)

I'm guessing the problem is in how I turn my unicode string into a StringIO file? The example on the python-unicodecsv GitHub page works fine:

>>> import unicodecsv
>>> from cStringIO import StringIO
>>> f = StringIO()
>>> w = unicodecsv.writer(f, encoding='utf-8')
>>> w.writerow((u'é', u'ñ'))
>>> f.seek(0)
>>> r = unicodecsv.reader(f, encoding='utf-8')
>>> row = r.next()
>>> print row[0], row[1]
é ñ



Trying my code with cStringIO fails, because cStringIO can't accept unicode (so why the example works, I don't know!):

>>> from cStringIO import StringIO
>>> f = StringIO(u'é')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 0: ordinal not in range(128)

I need to accept UTF-8 CSV-formatted input from a web textarea form field, so I can't just read it in from a file.

Any ideas?

Recommended answer

The unicodecsv reader reads and decodes byte strings for you, but you are passing it unicode strings instead. The GitHub example works because, on output, the writer encodes your unicode values to byte strings using the configured codec, so the buffer the reader sees already holds bytes.

In addition, cStringIO.StringIO can only handle encoded byte strings, while the pure-Python StringIO.StringIO class happily treats unicode values as if they were byte strings.
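
One way to make the bytes/text distinction explicit is the io module, whose BytesIO refuses text outright instead of failing later inside the reader. The following is only a sketch: it assumes the io module is available (it is on Python 3, and I believe on Python 2.6+ as well), and wrap_as_bytes_file is a hypothetical helper name, not part of unicodecsv.

```python
# -*- coding: utf-8 -*-
import io

text = u'\xe9,\xe9'          # text (unicode), e.g. as received from a form field
data = text.encode('utf-8')  # the UTF-8 byte string a byte-oriented reader expects


def wrap_as_bytes_file(value):
    """Hypothetical helper: return a byte-oriented file object,
    or None when value is text rather than bytes."""
    try:
        return io.BytesIO(value)
    except TypeError:
        # io.BytesIO rejects text instead of silently accepting it
        return None


bytes_file = wrap_as_bytes_file(data)   # a usable file object
text_file = wrap_as_bytes_file(text)    # None: text was rejected
```

The encoded value is five bytes long (two bytes per é plus the comma), which is what the reader will then decode back to two unicode fields.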

The solution is to encode your unicode values before putting them into the StringIO object:

>>> import unicodecsv, StringIO, cStringIO
>>> f = StringIO.StringIO(u'é,é'.encode('utf8'))
>>> r = unicodecsv.reader(f, encoding='utf-8')
>>> next(r)
[u'\xe9', u'\xe9']
>>> f = cStringIO.StringIO(u'é,é'.encode('utf8'))
>>> r = unicodecsv.reader(f, encoding='utf-8')
>>> next(r)
[u'\xe9', u'\xe9']
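
The transcripts above are Python 2. If the same textarea input has to be handled on Python 3, the standard library csv module works on text directly, so the encode step should not be needed. A minimal sketch under that assumption; rows_from_text is a hypothetical helper name and the sample input is made up for illustration:

```python
import csv
import io


def rows_from_text(text):
    """Hypothetical helper (Python 3): parse CSV held in a text string."""
    # newline='' leaves line endings untouched so the csv module
    # can handle the \r\n that browsers send for textarea content
    return list(csv.reader(io.StringIO(text, newline='')))


rows = rows_from_text('é,é\r\nñ,ü\r\n')
```

Here rows is a list of lists of str, one inner list per CSV line.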
