Dealing with UTF-8 numbers in Python

Views: 137 Published: 2016/11/19 14:50:21 Tags: python utf-8 character-encoding byte-order-mark

Question

Suppose I am reading a file containing 3 comma-separated numbers. The file was saved with an unknown encoding; so far I have been dealing with ANSI and UTF-8. If the file is in UTF-8 and has 1 row with the values 115,113,12, then:

with open(file) as f:
    a,b,c=map(int,f.readline().split(','))

will throw:

invalid literal for int() with base 10: '\xef\xbb\xbf115'

The first number is always prefixed with the characters '\xef\xbb\xbf'. For the remaining 2 numbers the conversion works fine. If I manually replace '\xef\xbb\xbf' with '' and then do the int conversion, it works.

Is there a better way of doing this for a file in any encoding?

Recommended answer

import codecs

with codecs.open(file, "r", "utf-8-sig") as f:
    a, b, c = map(int, f.readline().split(","))

This works in Python 2.6.4. The codecs.open call opens the file and returns the data as unicode, decoding from UTF-8 and skipping the initial BOM if one is present.
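For readers on Python 3, the same idea can be sketched with the built-in open, which accepts an encoding argument directly. This is only an illustration under stated assumptions: the helper name read_three_ints and the temporary demo file are hypothetical and not part of the original answer.

```python
import codecs
import os
import tempfile


def read_three_ints(path, encoding="utf-8-sig"):
    """Parse three comma-separated ints from the first line of `path`.

    Hypothetical helper for illustration. The "utf-8-sig" codec decodes
    UTF-8 and drops a leading BOM if one is present, so it accepts UTF-8
    files saved with or without a BOM.
    """
    with open(path, "r", encoding=encoding) as f:
        a, b, c = map(int, f.readline().split(","))
    return a, b, c


# Build a throwaway file that starts with the UTF-8 BOM (b"\xef\xbb\xbf"),
# mimicking the file described in the question.
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
    tmp.write(codecs.BOM_UTF8 + b"115,113,12\n")
    demo_path = tmp.name

result = read_three_ints(demo_path)
os.remove(demo_path)
print(result)  # (115, 113, 12)
```

A file that holds only ASCII digits and commas is byte-for-byte the same in an ANSI code page and in UTF-8 without a BOM, so the same call should cover both cases from the question. Files in other encodings (UTF-16, for example) would still need their encoding identified separately.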
