Reading UTF-8 file with codecs in IronPython


Question

I have a .csv file encoded in UTF-8 that contains both Latin and Cyrillic characters.

;F1;F2;abcdefg3;F200
;ABSOLUTE;NOMINAL;NOMINAL;NOMINAL
o1;1;USA;Новосибирск;1223

I'm trying to execute the following script in IronPython 2.7.1:

import codecs

f = codecs.open(r"file.csv", "rb", "utf-8")
f.next()

An exception occurs during the call to f.next():

Traceback (most recent call last):
  File "c:\Program Files\Microsoft Visual Studio 10.0\Common7\IDE\Extensions\Microsoft\Python Tools for Visual Studio\1.1\visualstudio_py_repl.py", line 492, in run_file_as_main
    code.Execute(self.exec_mod)
  File "<string>", line 4, in <module>
  File "C:\Program Files\IronPython 2.7.1\Lib\codecs.py", line 684, in next
    return self.reader.next()
  File "C:\Program Files\IronPython 2.7.1\Lib\codecs.py", line 615, in next
    line = self.readline()
  File "C:\Program Files\IronPython 2.7.1\Lib\codecs.py", line 530, in readline
    data = self.read(readsize, firstline=True)
  File "C:\Program Files\IronPython 2.7.1\Lib\codecs.py", line 477, in read
    newchars, decodedbytes = self.decode(data, self.errors)
UnicodeEncodeError: ('unknown', '\x00', 0, 1, '')

The same script works correctly in CPython 2.7. In IronPython 2.7.1, the following variant, which calls readlines() instead of next(), also works fine:

import codecs

f = codecs.open(r"file.csv", "rb", "utf-8")
f.readlines()



Does anybody know what may cause such strange behaviour?

Recommended answer

This looks like it could be a bug in how next() handles codecs. Could you please open an issue and attach the files needed to reproduce it?
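Until that is confirmed, one possible workaround follows from the question's own observation that readlines() succeeds where next() fails. The sketch below is an assumption rather than a verified fix: it has not been tested on IronPython 2.7.1, and the helper name iter_decoded_lines is hypothetical. It reads all lines through readlines() and then yields them one at a time, so calling code can still iterate line by line.

```python
import codecs


def iter_decoded_lines(path, encoding="utf-8"):
    """Yield decoded lines from `path` without calling next() on the reader.

    Hypothetical helper: it relies on readlines(), which the question
    reports as working in IronPython 2.7.1, instead of the failing next().
    """
    f = codecs.open(path, "rb", encoding)
    try:
        # readlines() decodes the whole file at once and keeps line endings.
        for line in f.readlines():
            yield line
    finally:
        f.close()
```

It could be used as `for line in iter_decoded_lines(r"file.csv"): ...`. The trade-off is that the whole file is held in memory at once, which is fine for small .csv files but may not suit very large ones.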
