How to preserve dtypes of dataframes when using to_csv?

Question

To reduce memory usage, I specified the dtypes of my pandas DataFrame using astype(), for example:

df['A'] = df['A'].astype('int8')

I then store it with to_csv(), but when I read it back with read_csv() and check the dtypes, the column is int64 again. How can I preserve the dtypes when saving to local storage?
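The behaviour described in the question can be reproduced with a minimal sketch like the one below. The frame and its column `A` are made up for illustration, and the CSV is kept in memory with `io.StringIO` rather than written to disk. It shows that a CSV file carries no type information, so `read_csv()` infers the type again unless a `dtype` is passed explicitly.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical example frame; column 'A' is downcast to int8 to save memory.
df = pd.DataFrame({'A': [1, 2, 3]})
df['A'] = df['A'].astype(np.int8)

# Round-trip through CSV text held in memory instead of a file on disk.
csv_text = df.to_csv(index=False)

# Without a dtype hint, read_csv infers the integer type from the text.
inferred = pd.read_csv(io.StringIO(csv_text))

# Passing the dtype explicitly restores the narrow type.
restored = pd.read_csv(io.StringIO(csv_text), dtype={'A': 'int8'})

print('inferred:', inferred['A'].dtype)
print('restored:', restored['A'].dtype)
```

If the dtypes are known in advance, passing them to `read_csv()` this way may already be enough; the answer below covers the case where they have to travel with the file.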

Recommended answer

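One way to do this is to store the dtypes as the first data row of the CSV and read that row back before parsing the rest of the file: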

import pandas as pd

# Create Example data with types
df = pd.DataFrame({
    'words': ['foo', 'bar', 'spam', 'eggs'],
    'nums': [1, 2, 3, 4]
}).astype(dtype={
    'words': 'object',
    'nums': 'int8'
})

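def to_csv(df, path):
    # Work on an object-typed copy so the caller's DataFrame is not modified
    out = df.astype(object)
    # Prepend dtypes to the top of the copy (from https://stackoverflow.com/a/43408736/7607701)
    # Note: the index shift below assumes an integer index such as the default RangeIndex
    out.loc[-1] = df.dtypes.astype(str)
    out.index = out.index + 1
    out = out.sort_index()
    # Then save it to a csv
    out.to_csv(path, index=False)

def read_csv(path):
    # Read the dtypes from the first data row of the csv
    dtypes = pd.read_csv(path, nrows=1).iloc[0].to_dict()
    # Read the remaining rows with those dtypes, skipping the dtype row (file line 1)
    return pd.read_csv(path, dtype=dtypes, skiprows=[1])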


print('Before: \n{}\n'.format(df.dtypes))

to_csv(df, 'tmp.csv')
df = read_csv('tmp.csv')

print('After: \n{}\n'.format(df.dtypes))

Output:

Before: 
nums       int8
words    object
dtype: object

After: 
nums       int8 # still int8
words    object
dtype: object
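A variation on the same idea, sketched here as an alternative and not part of the original answer, keeps the CSV itself unchanged and writes the dtypes to a separate JSON file next to it. The helper names `to_csv_with_dtypes` and `read_csv_with_dtypes`, the file names, and the `ratio` column are all hypothetical.

```python
import json
import os
import tempfile

import pandas as pd


def to_csv_with_dtypes(df, csv_path, dtypes_path):
    # Hypothetical helper: write the data as a plain CSV ...
    df.to_csv(csv_path, index=False)
    # ... and the dtype names (e.g. 'int8') to a separate JSON file.
    with open(dtypes_path, 'w') as f:
        json.dump(df.dtypes.astype(str).to_dict(), f)


def read_csv_with_dtypes(csv_path, dtypes_path):
    # Hypothetical helper: load the saved dtype names and hand them to read_csv.
    with open(dtypes_path) as f:
        dtypes = json.load(f)
    return pd.read_csv(csv_path, dtype=dtypes)


# Example data with narrow numeric types (made up for illustration).
df = pd.DataFrame({
    'nums': [1, 2, 3, 4],
    'ratio': [0.5, 1.5, 2.5, 3.5],
}).astype({'nums': 'int8', 'ratio': 'float32'})

with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, 'tmp.csv')
    dtypes_path = os.path.join(tmp, 'tmp.dtypes.json')
    to_csv_with_dtypes(df, csv_path, dtypes_path)
    restored = read_csv_with_dtypes(csv_path, dtypes_path)

print(restored.dtypes)
```

The trade-off is two files instead of one, in exchange for a CSV that other tools can read without special handling. With either approach, datetime columns need extra care: `read_csv()` does not accept a `datetime64` dtype through `dtype=`, so such columns would have to be passed through `parse_dates` instead.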
