Python 3 open() default encoding
Question
I am trying to count the lines in a JSON file.
I used the code below to count the lines.
input = open("json/world_bank.json")
i = 0
for l in input:
    i += 1
print(i)
But the code above raises a UnicodeDecodeError, as shown below.
---------------------------------------------------------------------------
UnicodeDecodeError Traceback (most recent call last)
<ipython-input-17-edc88ade7225> in <module>()
2
3 i=0
----> 4 for l in input:
5 i+=1
6
C:\Users\Subbi Reddy\AppData\Local\Continuum\Anaconda3\lib\encodings\cp1252.py in decode(self, input, final)
21 class IncrementalDecoder(codecs.IncrementalDecoder):
22 def decode(self, input, final=False):
---> 23 return codecs.charmap_decode(input,self.errors,decoding_table)[0]
24
25 class StreamWriter(Codec,codecs.StreamWriter):
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 3979: character maps to <undefined>
I then added the encoding parameter to the open() call, as shown below.
input = open("json/world_bank.json",encoding="utf8")
Then it worked and printed 500.
As far as I know, Python's open() should use "utf8" as its default encoding.
Where am I going wrong here?
Recommended answer
Python 3's UTF-8 default applies only to bytes<->str conversions such as bytes.decode() and str.encode(). open() instead uses your environment to choose an appropriate encoding:
From the Python 3 docs for open():
encoding is the name of the encoding used to decode or encode the file. This should only be used in text mode. The default encoding is platform dependent (whatever locale.getpreferredencoding() returns), but any text encoding supported by Python can be used. See the codecs module for the list of supported encodings.
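To see what the quoted passage means on a given machine, something like the following rough sketch may help. It compares the locale's preferred encoding with the encoding open() actually picks when no encoding argument is passed. The helper names default_text_encoding and encoding_used_by_open are made up for illustration, and the printed values will differ between systems.

```python
import codecs
import locale
import os
import tempfile


def default_text_encoding() -> str:
    """Return the normalized name of the locale's preferred encoding."""
    return codecs.lookup(locale.getpreferredencoding(False)).name


def encoding_used_by_open() -> str:
    """Open a throwaway file without `encoding=` and report what was picked."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        with open(path) as f:  # no encoding argument, on purpose
            return codecs.lookup(f.encoding).name
    finally:
        os.remove(path)


print("locale preferred encoding:", default_text_encoding())
print("encoding chosen by open():", encoding_used_by_open())
```

On a Western European Windows setup like the one in the question, both lines would be expected to report cp1252, while most Linux and macOS systems report utf-8.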
In your case, since you are on Windows with a Western European/North American locale, open() defaults to the 8-bit Windows-1252 (cp1252) character set, which is why cp1252.py appears in the traceback. Byte 0x81 has no character assigned in cp1252, so decoding fails. Setting encoding to utf-8 overrides this default.
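As a minimal sketch of the same failure and fix, independent of the original world_bank.json file: the snippet below writes a small UTF-8 sample file (hypothetical data, chosen because "Á" encodes to the bytes C3 81), then counts its lines once with cp1252 and once with utf-8. The count_lines helper is an illustrative name, not something from the question.

```python
import os
import tempfile


def count_lines(path: str, encoding: str = "utf-8") -> int:
    """Count lines in a text file, decoding with an explicit encoding."""
    with open(path, encoding=encoding) as f:
        return sum(1 for _ in f)


# Hypothetical sample data: "Á" (U+00C1) is stored as bytes C3 81 in UTF-8.
sample = '{"name": "Á"}\n{"name": "B"}\n'
fd, sample_path = tempfile.mkstemp(suffix=".json")
with os.fdopen(fd, "w", encoding="utf-8", newline="") as f:
    f.write(sample)

try:
    # Byte 0x81 is unassigned in cp1252, so this decode is expected to fail.
    count_lines(sample_path, encoding="cp1252")
    cp1252_failed = False
except UnicodeDecodeError:
    cp1252_failed = True

utf8_count = count_lines(sample_path, encoding="utf-8")
os.remove(sample_path)

print("cp1252 failed:", cp1252_failed)
print("lines counted as UTF-8:", utf8_count)
```

Passing the encoding explicitly, and using a with block so the file is closed, keeps the result the same regardless of which platform the script runs on.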