Pandas memoization


Question

I have lengthy computations that I repeat many times, so I would like to use memoization (with a package such as jug or joblib) together with Pandas. My concern is whether these packages memoize correctly when Pandas DataFrames are passed as function arguments.

Has anyone tried this? Is there another recommended package or approach?

Recommended answer

Author of jug here: jug works fine. I just tried the following and it works:

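from jug import TaskGenerator
import pandas as pd
import numpy as np


# Each decorated function becomes a jug Task; its result is stored and reused.
@TaskGenerator
def gendata():
    # 343,440 integers arranged as a 10 x 34,344 DataFrame
    return pd.DataFrame(np.arange(343440).reshape((10, -1)))

@TaskGenerator
def compute(x):
    # Column-wise mean; the DataFrame argument is the output of another task
    return x.mean()

# This only builds the task graph; run it with `jug execute <this file>`
y = compute(gendata())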

It is not as efficient as it could be, because jug simply pickles the DataFrame internally. The pickle is compressed on the fly, so memory use is not horrible; it is just slower than it could be.
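To illustrate what "pickle plus on-the-fly compression" means for a DataFrame, here is a minimal sketch. It is not jug's actual storage code: the helper names `dump_compressed` and `load_compressed` are hypothetical, and the use of `zlib` is an assumption made for illustration.

```python
import pickle
import zlib

import numpy as np
import pandas as pd


def dump_compressed(obj):
    """Pickle an object and compress the resulting bytes (hypothetical helper)."""
    return zlib.compress(pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL))


def load_compressed(blob):
    """Reverse of dump_compressed: decompress, then unpickle."""
    return pickle.loads(zlib.decompress(blob))


# Same shape of data as in the answer's example
df = pd.DataFrame(np.arange(343440).reshape((10, -1)))

blob = dump_compressed(df)
restored = load_compressed(blob)

# Size of the uncompressed pickle, for comparison with len(blob)
raw_size = len(pickle.dumps(df, protocol=pickle.HIGHEST_PROTOCOL))
print(f"raw pickle: {raw_size} bytes, compressed: {len(blob)} bytes")
```

The round trip goes through a generic serializer rather than a format specialized for tabular data, which is the source of the overhead the answer describes.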

I would be open to a change that saves DataFrames as a special case, as jug currently does for numpy arrays: https://github.com/luispedro/jug/blob/master/jug/backends/file_store.py#L102
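The question also mentions joblib, which this answer did not test. The following is a minimal sketch of that route, assuming joblib is installed and that its argument hashing handles a plain numeric DataFrame consistently. The function `column_means` and the `call_count` counter are hypothetical names used only to show whether the cached function body actually ran.

```python
import tempfile

import numpy as np
import pandas as pd
from joblib import Memory

# Cache results on disk in a throwaway directory (assumption: any writable path works)
cache_dir = tempfile.mkdtemp()
memory = Memory(location=cache_dir, verbose=0)

# Counts how many times the function body really executes
call_count = {"n": 0}


@memory.cache
def column_means(df):
    """Hypothetical stand-in for a lengthy computation on a DataFrame."""
    call_count["n"] += 1
    return df.mean()


df = pd.DataFrame(np.arange(20, dtype=float).reshape((4, 5)))

first = column_means(df)   # computed and written to the cache
second = column_means(df)  # expected to be loaded from the cache

print("function body executed", call_count["n"], "time(s)")
```

If the counter stays at 1 after the second call, the DataFrame argument was hashed to the same key both times. Frames with object-dtype columns or unusual indexes may hash less predictably, so it is worth checking this on your own data.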
