Converting Excel to Feather format with Python


Question

I have a list of around 100 large Excel files (growing daily) that I analyse in Python. Because I have to run several loops over all the files, my analysis is getting slower and slower. I would therefore like to convert all the Excel files to Feather format (for example, once a week). Is there a clever way to do that? Here is what I have tried so far:

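import glob

import pandas as pd

path = r"filepath\*_name*.xlsx"
file_list = glob.glob(path)
for f in file_list:
    # encoding= is left out: recent pandas versions reject it in read_excel.
    df = pd.read_excel(f)
    # Store the two boolean columns as integers.
    df[['boola', 'boolb']] = df[['boola', 'boolb']].astype(int)
    pathname = f[:-5] + ".ftr"  # "name.xlsx" -> "name.ftr"
    df.to_feather(pathname)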

But I get the following error message:

ArrowInvalid: ('Could not convert stringa with type str: tried to convert to boolean', "Conversion failed for column stringb with type object")
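
The message suggests that a column holds values of more than one Python type (here, a string in a column that Arrow tried to convert to boolean). A minimal sketch for listing such columns before writing, assuming pandas is installed; `mixed_type_columns` and the sample frame are hypothetical and not part of the original post:

```python
import pandas as pd


def mixed_type_columns(df):
    """Return the names of columns whose non-null values have more than one Python type."""
    mixed = []
    for col in df.columns:
        # Collect the distinct Python types that occur in this column.
        types = set(df[col].dropna().map(type))
        if len(types) > 1:
            mixed.append(col)
    return mixed


# Hypothetical data: "stringb" mixes booleans with a string.
sample = pd.DataFrame({
    "boola": [True, False, True],
    "stringb": [True, "stringa", False],
})
print(mixed_type_columns(sample))
```

Running this on each file before calling to_feather shows which columns need an explicit cast.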

Recommended answer

Here is what solved my problem:

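import glob

import pandas as pd

path = r"pathname\*_somename*.xlsx"
file_list = glob.glob(path)
for f in file_list:
    # encoding= is left out: recent pandas versions reject it in read_excel.
    df = pd.read_excel(f, decimal=',', thousands='.')
    for col in df.columns:
        # Flag rows whose Python type differs from the type of the first row.
        w = df[col].map(type) != type(df[col].iloc[0])
        if w.any():
            # A column with mixed types is stored as text.
            df[col] = df[col].astype(str)
        # Second check kept unchanged from the original answer.
        if df[col].dtype == list:
            df[col] = df[col].astype(str)
    pathname = f[:-4] + "ftr"  # "name.xlsx" -> "name.ftr"
    df.to_feather(pathname)
df.head()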

The decimal=',', thousands='.' part was necessary because my input files use the European number format, i.e. a comma as the decimal separator and a dot as the thousands separator.
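
To illustrate what these two parameters do, here is a small sketch. It swaps in read_csv with an in-memory string so that no Excel file is needed; read_csv accepts the same decimal and thousands arguments. The sample data and column names are made up for the example.

```python
import io

import pandas as pd

# Hypothetical sample data in European number format (not from the original post).
raw = "item;amount\nwidget;1.234,56\ngadget;7,5\n"

# read_csv is used here only so the example needs no Excel file;
# decimal= and thousands= have the same meaning as in the answer's call.
df = pd.read_csv(io.StringIO(raw), sep=";", decimal=",", thousands=".")

print(df["amount"].dtype)     # a float dtype instead of object
print(df["amount"].tolist())  # numeric values, not strings
```

Without the two arguments the amount column would be read as text, which is the kind of column that then needs a cast before it can be written to Feather.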
