Python: converting a corrupt .xls file

Question

I downloaded a few sales datasets from an SAP application. SAP automatically saved the data as .XLS files. Whenever I open one with the Pandas library, I get the following error:

XLRDError: Unsupported format, or corrupt file: Expected BOF record; found '\xff\xfe\r\x00\n\x00\r\x00'

When I open the .XLS file in MS Excel, a popup says that the file is corrupt or has an unsupported extension and asks whether I want to continue. When I click 'Yes', it shows the correct data. If I then save the file again as .xls from MS Excel, I can read it with Pandas.

So I tried renaming the file with os.rename(), but that did not work. I also tried opening the file and removing \xff\xfe\r\x00\n\x00\r\x00, but that did not work either.
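
A note on the bytes in the error message: \xff\xfe is the UTF-16 little-endian byte order mark, and the bytes that follow are \r\n encoded as UTF-16. That suggests the export is UTF-16 text carrying an .xls extension rather than a binary workbook, which would explain why renaming it changes nothing. A minimal sketch of that check follows; looks_like_utf16_text is a hypothetical helper name, not part of any library.

```python
import codecs


def looks_like_utf16_text(path):
    """Return True if the file starts with a UTF-16 byte order mark.

    Hypothetical helper for illustration only. A byte order mark at the
    start suggests text content rather than a binary .xls workbook.
    """
    with open(path, "rb") as fh:
        head = fh.read(2)
    return head in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)
```

If this returns True for the SAP export, the file can probably be handled as delimited text instead of as an Excel workbook.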

The workaround is to open MS Excel and manually save the file again as .xls. Is there any way to automate this? Kindly help.

Recommended answer

Finally, I converted the corrupt .xls into a valid .xls file. The code is as follows:

# Treat every string literal in this module as unicode (only needed on Python 2)
from __future__ import unicode_literals
# Used to save the file as an Excel workbook
# xlwt is a third-party library and must be installed separately
from xlwt import Workbook
# Used to open the corrupt Excel file
import io

filename = r'SALEJAN17.xls'
# Opening the file using 'utf-16' encoding; the with block closes it afterwards
with io.open(filename, "r", encoding="utf-16") as file1:
    data = file1.readlines()

# Creating a workbook object
xldoc = Workbook()
# Adding a sheet to the workbook object
sheet = xldoc.add_sheet("Sheet1", cell_overwrite_ok=True)
# Iterating and saving the data to sheet
for i, row in enumerate(data):
    # Two things are done here:
    # removing the '\n' that readlines() leaves at the end of each row,
    # and splitting the row into cell values on '\t'
    for j, val in enumerate(row.replace('\n', '').split('\t')):
        sheet.write(i, j, val)

# Saving the file as an excel file
xldoc.save('myexcel.xls')

import pandas as pd
df = pd.ExcelFile('myexcel.xls').parse('Sheet1')

No errors.
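
As a possible shortcut, and not the method used above: if the export really is tab-separated UTF-16 text, as the readlines() and split('\t') calls assume, pandas may be able to read it directly without creating an intermediate workbook. This is a sketch under that assumption; read_sap_export is a hypothetical helper name.

```python
import pandas as pd


def read_sap_export(path):
    """Read a tab-separated UTF-16 text export into a DataFrame.

    Hypothetical helper. Assumes the file is delimited text that only
    carries an .xls extension, as the error message suggests.
    """
    return pd.read_csv(path, sep="\t", encoding="utf-16")


# Example usage with the file name from the answer above:
# df = read_sap_export(r'SALEJAN17.xls')
```

One difference from the xlwt route is that read_csv infers column types, whereas sheet.write() in the loop above stores every cell as a string.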
