Trouble with unicodecsv reader in Python


Question

I'm having trouble using the unicodecsv reader. I keep looking for different examples of how to use the module, but everyone keeps referencing the exact sample from the unicodecsv website (or some similar variation):

>>> import unicodecsv as csv
>>> from io import BytesIO
>>> f = BytesIO()
>>> w = csv.writer(f, encoding='utf-8')
>>> _ = w.writerow((u'é', u'ñ'))
>>> _ = f.seek(0)
>>> r = csv.reader(f, encoding='utf-8')
>>> next(r) == [u'é', u'ñ']
True

For me, this example assumes too much about the reader's understanding. It doesn't look like a CSV file is being passed at all. I've completely lost the plot.

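What I want to do is:

  1. Read the first line of the CSV file, which contains the headers
  2. Read the remaining lines and put them in a dictionary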

My broken code:

import unicodecsv
#
i = 0
myCSV = "$_input.csv"
dic = {}
#
f = open(myCSV, "rb")
reader = unicodecsv.reader(f, delimiter=',')
strHeader = reader.next()
#
# read the first line of csv
# use custom function to parse the header
myHeader = FNC.PARSE_HEADER(strHeader)
#
# read the remaining lines
# put data into dictionary of class objects
for row in reader:
    i += 1
    dic[i] = cDATA(myHeader, row)

And, as expected, I get a UnicodeDecodeError. Maybe the example above has the answer, but it is going completely over my head.

Can someone please fix my code? I'm running out of hair to pull out.

I switched the reader line to:

reader = unicodecsv.reader(f, encoding='utf-8')

Traceback:
    for row in reader:
  File "C:\Python27\unicodecsv\py2.py", line 128, in next
    for value in row]
UnicodeDecodeError: 'utf8' codec can't decode byte 0x90 in position 48: invalid start byte

When I simply print the data using:

f = open(myCSV, "rb")
reader = csv.reader(f, delimiter=',')
for row in reader:
    print(str(row[9]) + '\n')
    print(repr(row[9] + '\n'))
>>> UTAS ? Offline
>>> 'UTAS ? Offline'

Recommended answer

You need to declare the encoding of the input file when creating the reader, just like you did when creating the writer:

>>> import unicodecsv as csv
>>> with open('example.csv', 'wb') as f:
...     writer = csv.writer(f, encoding='utf-8')
...     writer.writerow(('heading0', 'heading1'))
...     writer.writerow((u'é', u'ñ'))
...     writer.writerow((u'ŋ', u'ŧ'))
... 
>>> with open('example.csv', 'rb') as f:
...     reader = csv.reader(f, encoding='utf-8')
...     headers = next(reader)
...     print headers
...     data = {i: v for (i, v) in enumerate(reader)}
...     print data
... 
[u'heading0', u'heading1']
{0: [u'\xe9', u'\xf1'], 1: [u'\u014b', u'\u0167']}

Printing the dictionary shows the escaped representation of the data, but you can see the characters by printing them individually:

>>> for v in data.values():
...     for s in v:
...         print s
... 
é
ñ
ŋ
ŧ
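
The session above is Python 2. On Python 3 the standard csv module works with text files directly, so the encoding is declared in open() rather than on the reader. The following is only a rough sketch of the same header-plus-dictionary steps under that assumption; the function name read_csv_as_dict and the temporary example.csv file are made up for illustration.

```python
import csv
import os
import tempfile


def read_csv_as_dict(path, encoding="utf-8"):
    """Return (headers, rows); rows maps a 0-based index to each data row."""
    # newline="" lets the csv module handle line endings itself.
    with open(path, "r", encoding=encoding, newline="") as f:
        reader = csv.reader(f)
        headers = next(reader)  # the first line holds the column names
        rows = {i: row for i, row in enumerate(reader)}
    return headers, rows


# Write a small sample file (hypothetical name), then read it back.
path = os.path.join(tempfile.mkdtemp(), "example.csv")
with open(path, "w", encoding="utf-8", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["heading0", "heading1"])
    writer.writerow(["é", "ñ"])
    writer.writerow(["ŋ", "ŧ"])

headers, data = read_csv_as_dict(path)
print(headers)    # the header row
print(len(data))  # number of data rows read
```

If the file is not actually UTF-8, this raises the same kind of UnicodeDecodeError as in the question, so the encoding argument still has to match the file.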

If the encoding of the file is unknown, it's best to use something like chardet to determine the encoding before processing.
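
chardet is a third-party package; its chardet.detect function takes raw bytes and returns a dictionary that includes an encoding guess. As a dependency-free alternative (not chardet itself), one rough approach is to try a short list of candidate encodings in order and keep the first that decodes without error. The helper name guess_encoding, the candidate list, and the sample bytes below are all assumptions made for illustration.

```python
def guess_encoding(raw, candidates=("utf-8", "cp1252", "latin-1")):
    """Return the first candidate encoding that decodes raw without error."""
    for name in candidates:
        try:
            raw.decode(name)
        except UnicodeDecodeError:
            continue  # this candidate cannot represent the bytes; try the next
        return name
    return None  # none of the candidates worked


# Hypothetical bytes containing the 0x90 byte reported in the traceback.
sample = b"UTAS \x90 Offline"
print(guess_encoding(sample))                 # latin-1
print(guess_encoding("é,ñ".encode("utf-8")))  # utf-8
```

Note that latin-1 accepts every possible byte, so a match there only means the file can be read without an exception, not that the characters are the intended ones. The result can then be passed as the encoding argument when creating the reader.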
