Why am I getting a UnicodeDecodeError in Python's JSON encoding?

Question

I am using Solr 3.3 to index content from my database. I compose the JSON content in Python. I can upload 2126 records, which add up to 523246 chars (approx. 511kb). But when I try 2127 records, Python gives me this error:

Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "D:\Technovia\db_indexer\solr_update.py", line 69, in upload_service_details
    request_string.append(param_list)
  File "C:\Python27\lib\json\__init__.py", line 238, in dumps
    **kw).encode(obj)
  File "C:\Python27\lib\json\encoder.py", line 203, in encode
    chunks = list(chunks)
  File "C:\Python27\lib\json\encoder.py", line 425, in _iterencode
    for chunk in _iterencode_list(o, _current_indent_level):
  File "C:\Python27\lib\json\encoder.py", line 326, in _iterencode_list
    for chunk in chunks:
  File "C:\Python27\lib\json\encoder.py", line 384, in _iterencode_dict
    yield _encoder(value)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x96 in position 68: invalid start byte 

Ouch. Is 512kb worth of bytes a fundamental limit? Is there any high-volume alternative to the existing JSON module?

Update: it's a fault in some of the data, as trying to encode biz_list[2126:] causes an immediate error. Here is the offending piece:

'2nd Floor, Gurumadhavendra Towers,\nKadavanthra Road, Kaloor,\nCochin \x96 682 017'

How can I configure it so that it can be encodable into JSON?

Update 2: The answer worked as expected: the data came from a MySQL table encoded in "latin-1-swedish-ci". I saw significance in a random number. Sorry for spontaneously channeling the spirit of a headline writer when diagnosing the fault.
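The manual check in the first update (slicing the list until the error appears) can also be done in one pass. The following is only a sketch in Python 3 syntax: `find_non_utf8` is a hypothetical helper, and it assumes the records are available as raw byte strings, which the original post does not show.

```python
def find_non_utf8(records):
    """Return (index, byte_offset) for every record that is not valid UTF-8."""
    bad = []
    for index, raw in enumerate(records):
        try:
            raw.decode("utf-8")
        except UnicodeDecodeError as exc:
            # exc.start is the offset of the first undecodable byte
            bad.append((index, exc.start))
    return bad


# Hypothetical sample data: only the middle record contains the stray 0x96 byte.
records = [b"Kaloor", b"Cochin \x96 682 017", b"caf\xc3\xa9"]
problems = find_non_utf8(records)
print(problems)
```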

Recommended answer

Simple: just don't use utf-8 encoding if your data is not in utf-8.

>>> json.loads('["\x96"]')
....
UnicodeDecodeError: 'utf8' codec can't decode byte 0x96 in position 0: invalid start byte

>>> json.loads('["\x96"]', encoding="latin-1")
[u'\x96']

From the json.loads documentation:

If s is a str instance and is encoded with an ASCII based encoding other than utf-8 (e.g. latin-1) then an appropriate encoding name must be specified. Encodings that are not ASCII based (such as UCS-2) are not allowed and should be decoded to unicode first.
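As an illustration of the "decode to unicode first" advice for encodings that are not ASCII based, here is a minimal sketch in Python 3 syntax. The sample payload is made up, and UTF-16 stands in for UCS-2.

```python
import json

# Made-up JSON document stored as UTF-16 bytes (the codec adds a byte order mark).
utf16_bytes = '["Cochin \u2013 682 017"]'.encode("utf-16")

# Decode the bytes to text first, then hand the text to the JSON parser.
parsed = json.loads(utf16_bytes.decode("utf-16"))
print(parsed)
```

On Python 3.6 and later, json.loads can also detect UTF-8, UTF-16 and UTF-32 input in bytes by itself; the explicit decode mirrors what the Python 2 documentation quoted above asks for.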

Edit: As Eli Collins mentioned, to get the proper Unicode value for "\x96", use "cp1252":

>>> json.loads('["\x96"]', encoding="cp1252")
[u'\u2013']
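The question was about encoding rather than decoding, and the same idea applies there: convert the byte string to text with the right codec before calling json.dumps. The snippet below is a sketch in Python 3 syntax (the `encoding` argument used in the Python 2 sessions above is not available in current Python 3 releases); it assumes the offending value arrives as raw bytes.

```python
import json

# The offending value from the question, written as a Python 3 bytes literal.
raw = b"2nd Floor, Gurumadhavendra Towers,\nKadavanthra Road, Kaloor,\nCochin \x96 682 017"

# Decode with the codec the data was actually written in, then encode as JSON.
text = raw.decode("cp1252")      # byte 0x96 becomes U+2013 (en dash)
payload = json.dumps([text])     # ensure_ascii defaults to True, so the output is ASCII

# latin-1 also decodes without error, but maps 0x96 to the control character U+0096.
as_latin1 = raw.decode("latin-1")

print(payload)
```

Decoding once, at the point where the data leaves the database, keeps every later step (including json.dumps) working on text instead of bytes.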
