Pandas: save to excel encoding issue


Problem description

I have a similar problem to the one mentioned here but none of the suggested methods work for me.

I have a medium size utf-8 .csv file with a lot of non-ascii characters. I am splitting the file by a particular value from one of the columns, and then I'd like to save each of the obtained dataframes as an .xlsx file with the characters preserved.
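The splitting step described above could be sketched roughly as follows. This is only an illustration under assumptions: the helper names `split_by_column` and `save_groups`, the output file naming, and the sample rows are made up for the example, and writing `.xlsx` files requires an Excel engine such as openpyxl or xlsxwriter to be installed.

```python
import pandas as pd


def split_by_column(df, column):
    """Return a dict mapping each distinct value in `column` to its sub-frame."""
    return {value: group for value, group in df.groupby(column)}


def save_groups(groups, out_dir="."):
    """Write every sub-frame to '<out_dir>/<value>.xlsx' (hypothetical naming)."""
    for value, group in groups.items():
        group.to_excel(f"{out_dir}/{value}.xlsx", index=False)


# Illustrative sample data with non-ASCII characters (not the real file).
sample = pd.DataFrame(
    {
        "miasto": ["Wrocław", "Kraków", "Wrocław"],
        "your-name": ["Zózia", "Justyna", "Ewa"],
    }
)
groups = split_by_column(sample, "miasto")
```

Calling `save_groups(groups)` would then write one workbook per city into the current directory.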

This doesn't work, as I am getting an error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 7: ordinal not in range(128)
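For context, this kind of error is raised whenever bytes outside the 7-bit range are decoded with the ASCII codec. The snippet below is only a minimal illustration using a value from the sample data (the helper name `try_ascii` is made up for this sketch); it does not reproduce the exact failure inside pandas.

```python
# UTF-8 bytes for a value from the sample file; "ł" becomes the two bytes c5 82.
raw = "Wrocław".encode("utf-8")


def try_ascii(data: bytes) -> str:
    """Decode bytes as ASCII, reporting where decoding fails instead of raising."""
    try:
        return data.decode("ascii")
    except UnicodeDecodeError as exc:
        return f"failed: {exc.reason} at position {exc.start}"


result = try_ascii(raw)

# Decoding with the codec the bytes were written in works without loss.
restored = raw.decode("utf-8")
```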

Here is what I tried:


  1. Using xlsxwriter engine explicitly. This doesn't seem to change anything.
  2. Defining a function (below) to change encoding and throw away bad characters. This also doesn't change anything.

def changeencode(data):
    cols = data.columns
    for col in cols:
        # Only object (string) columns are re-encoded.
        if data[col].dtype == 'O':
            data[col] = data[col].str.decode('utf-8').str.encode('ascii', 'ignore')
    return data


  3. Changing all the offending characters by hand to some others. Still no effect (the quoted error was obtained after this change).
  4. Encoding the file as utf-16 (which, I believe, is the correct encoding, since I want to be able to manipulate the file from within Excel afterwards) doesn't help either.
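The function from attempt 2 calls `.str.decode`, which assumes the cells hold byte strings. On Python 3, where text cells are already `str`, a rough sketch of the same idea (dropping every non-ASCII character) might look like the following. The name `drop_non_ascii` and the sample frame are illustrative only, and note that this approach discards data rather than fixing the encoding.

```python
import pandas as pd


def drop_non_ascii(data: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of `data` with non-ASCII characters removed from text cells."""

    def clean(value):
        # Leave numbers, NaN and other non-text values untouched.
        if isinstance(value, str):
            return value.encode("ascii", "ignore").decode("ascii")
        return value

    out = data.copy()
    for col in out.columns:
        out[col] = out[col].map(clean)
    return out


# Values borrowed from the sample row in the question.
sample = pd.DataFrame(
    {"your-name": ["Zózia kryś"], "miasto": ["Wrocław"], "wiek": [44]}
)
cleaned = drop_non_ascii(sample)
```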

    I believe that the problem is in the file itself (because of 2 and 3) but I have no idea how to get around it. I'd appreciate any help. The beginning of the file is pasted below.

    "Submitted","your-name","youremail","phone","miasto","cityCF","innemiasto","languagesCF","morelanguages","wiek","partnerCF","messageCF","acceptance-795","Submitted Login","Submitted From",
    "2015-12-25 14:07:58 +00:00","Zózia kryś","test@tes.pl","4444444","Wrocław","","testujemy polskie znaki","Polski","testujemy polskie znaki","44","test","test","1","Justyna","99.111.155.132",
    

    EDIT

    Some code (one of the versions, without the splitting part):

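    import pandas as pd

    df = pd.read_csv('path-to-file.csv')

    # The engine is chosen when the writer is created; passing engine= to
    # to_excel has no effect once an ExcelWriter object is supplied.
    with pd.ExcelWriter('test.xlsx', engine='xlsxwriter') as writer:
        df.to_excel(writer, sheet_name='sheet1')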
    


    Recommended answer

    Supposedly this was a bug in the version of pandas I was using back then. Right now, in pandas ver. 0.19.2, the code below saves the csv from the question without any trouble (and with the correct encoding).
    NB: the openpyxl module has to be installed on your system.

    import pandas as pd
    df = pd.read_csv('Desktop/test.csv')
    df.to_excel('Desktop/test.xlsx', encoding='utf8')
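A note for newer installations: as far as I am aware, the `encoding` keyword of `to_excel` was deprecated in pandas 1.5 and removed in 2.0, so the call above may fail there. A minimal sketch for recent versions simply omits it. The temporary output path and the one-row sample frame are assumptions made for this example, and openpyxl still has to be installed.

```python
import os
import tempfile

import pandas as pd

# One-row sample built from values in the question's CSV excerpt.
df = pd.DataFrame({"your-name": ["Zózia kryś"], "miasto": ["Wrocław"]})

# Hypothetical output location; any writable path works.
out_path = os.path.join(tempfile.mkdtemp(), "test.xlsx")

# No encoding argument is passed; the text is written as Unicode.
df.to_excel(out_path, index=False, engine="openpyxl")

# Read the workbook back to check that the characters survived.
round_trip = pd.read_excel(out_path, engine="openpyxl")
```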
    
