'utf-16-le' codec can't decode bytes while reading Excel in Python

Problem description

I am trying to read a number of xls files in different languages (Arabic, Greek, Italian, Hebrew, etc.), and I get the error shown below when I call the open_workbook function. Any idea how I can set the format so that it works for any language?

Code:

import xlrd
book = xlrd.open_workbook(workbook_url)

Error:

return codecs.utf_16_le_decode(input, errors, True)
UnicodeDecodeError: 'utf-16-le' codec can't decode bytes in position 372-373: unexpected end of data

Recommended answer

It's unlikely that language is the issue. More likely, xlrd is having trouble determining the encoding of the .xls file.

As xlrd notes in its documentation on the handling of Unicode:

This package presents all text strings as Python unicode objects. From Excel 97 onwards, text in Excel spreadsheets has been stored as UTF-16LE (a 16-bit Unicode Transformation Format). Older files (Excel 95 and earlier) don’t keep strings in Unicode; a CODEPAGE record provides a codepage number (for example, 1252) which is used by xlrd to derive the encoding (for same example: "cp1252") which is used to translate to Unicode.
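
To make the quoted mapping concrete, here is a rough sketch of how a codepage number could be turned into a Python codec name. The helper name codec_name_for_codepage is hypothetical and this is not xlrd's own code; it assumes only that codepage 1200 denotes UTF-16LE and that other numbers correspond to Python's cpNNNN codecs, which covers the common Windows codepages but not every case.

```python
import codecs


def codec_name_for_codepage(codepage):
    """Map a codepage number to a Python codec name (simplified assumption)."""
    # Windows codepage 1200 is UTF-16LE; other numbers are assumed to map to "cpNNNN".
    name = "utf_16_le" if codepage == 1200 else "cp%d" % codepage
    codecs.lookup(name)  # raises LookupError if Python has no such codec
    return name


# The example from the quoted documentation: codepage 1252 -> "cp1252".
print(codec_name_for_codepage(1252))

# Bytes written with one codepage only decode correctly with the matching codec.
print(b"caf\xe9".decode(codec_name_for_codepage(1252)))
```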

My first step would be to determine the actual encoding. How old is the file, and how was it created (by Excel itself, or by a third-party tool)?

You could look for the CODEPAGE record by opening the file in a text/hex editor and then try to force that encoding.
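
The same search can be done in a few lines of Python instead of a hex editor. The sketch below is only a heuristic: it assumes the CODEPAGE record appears in the raw file bytes as record id 0x0042 with a 2-byte payload, and it does not parse the file's container structure, so unrelated bytes can occasionally match. The function name find_codepage_candidates and the sample bytes are made up for illustration.

```python
import struct

# BIFF CODEPAGE record header: record id 0x0042, payload length 2 (both little-endian).
CODEPAGE_HEADER = b"\x42\x00\x02\x00"


def find_codepage_candidates(data):
    """Return every 16-bit value that follows a CODEPAGE-like header in data."""
    candidates = []
    pos = data.find(CODEPAGE_HEADER)
    while pos != -1:
        payload = data[pos + 4:pos + 6]
        if len(payload) == 2:
            (codepage,) = struct.unpack("<H", payload)
            candidates.append(codepage)
        pos = data.find(CODEPAGE_HEADER, pos + 1)
    return candidates


# Synthetic bytes standing in for a real file, with a CODEPAGE record for 1252.
sample = b"\x00" * 10 + CODEPAGE_HEADER + struct.pack("<H", 1252) + b"\x00" * 4
print(find_codepage_candidates(sample))
```

For a real file, read it with open(path, "rb").read() and pass the bytes in. A value such as 1252 suggests trying "cp1252"; several different values mean the scan hit false positives and the file needs a closer look.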

Based on the error, it sounds to me as though the text isn't UTF-16LE (xlrd's default assumption), so you will have to determine the encoding somehow, or else start trying encodings by trial and error, e.g.:

book = xlrd.open_workbook(..., encoding_override="cp1252")
book = xlrd.open_workbook(..., encoding_override="utf-8")
book = xlrd.open_workbook(..., encoding_override="latin-1")
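
If there are many files, the trial-and-error approach can be wrapped in a small loop. This is a minimal sketch, not a definitive recipe: open_with_fallback and its opener parameter are hypothetical names, the candidate list is just the three encodings shown above, and it assumes that a wrong encoding surfaces as a UnicodeDecodeError or LookupError when the workbook is opened.

```python
def open_with_fallback(path, encodings=("cp1252", "utf-8", "latin-1"), opener=None):
    """Try each encoding in turn; return (encoding, workbook) for the first that opens."""
    if opener is None:
        import xlrd  # third-party package, imported lazily so the sketch loads without it
        opener = xlrd.open_workbook
    errors = {}
    for enc in encodings:
        try:
            return enc, opener(path, encoding_override=enc)
        except (UnicodeDecodeError, LookupError) as exc:
            errors[enc] = exc  # remember why this candidate failed
    raise ValueError("none of %r worked for %r: %r" % (list(encodings), path, errors))


# Usage (hypothetical file name):
# encoding, book = open_with_fallback("report.xls")
```

Opening without an exception does not prove the encoding is right, since latin-1 in particular can decode any byte sequence. It is worth inspecting a few cell values that contain non-ASCII text before trusting the result.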
