Alternatives to pandas apply due to MemoryError


Problem description

I have a function that I wish to apply to a dataframe:

def DetermineMid(data, ts):

    if data['U'] == 0 and data['D'] > 0:
        mid = data['C'] + ts / 2

    elif data['U'] > 0 and data['D'] == 0:
        mid = data['C'] - ts / 2

    else:
        diff = data['A'] - data['B']

        if diff == 0:
            mid = data['C'] + 1

        else:
            mid = data['C']

    return mid

My df columns are A, B, C, D, U.

My call is as follows:

df = df.apply(DetermineMid, args=(5,), axis=1)

On smaller dataframes this works just fine, but for this dataframe:

DatetimeIndex: 2561527 entries, 2016-11-30 17:00:01 to 2017-11-29 16:00:00
Data columns (total 6 columns):
Z float64
A float64
B float64
C float64
U int64
D int64
dtypes: float64(5), int64(2)
memory usage: 156.3 MB
None

I receive a MemoryError. Am I using apply incorrectly? I would have thought apply is just iterating through the rows and creating a value mid based on row values, then dropping all the old values as I do not care about them anymore.

Is there a better way to do that?

Recommended answer

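Use np.select, which checks the conditions in order and takes the first match for each row:

import numpy as np

ts = 5  # the value passed via args=(5,) in the question

m1 = (df['U'] == 0) & (df['D'] > 0)
m2 = (df['U'] > 0) & (df['D'] == 0)
m3 = (df['A'] - df['B']) == 0

# Rows that match none of the conditions fall back to the default, df['C'].
mid = np.select([m1, m2, m3], [df['C'] + ts / 2, df['C'] - ts / 2, df['C'] + 1], df['C'])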
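
As a quick sanity check, the sketch below compares the np.select approach with the row-wise function from the question on a tiny frame. The sample values and the helper names `determine_mid_rowwise` and `determine_mid_vectorized` are made up for illustration and are not from the original post. It assumes the columns A, B, C, D, U described in the question and wraps the result in a Series so it keeps the frame's index.

```python
import numpy as np
import pandas as pd


def determine_mid_rowwise(row, ts):
    """Row-by-row logic from the question, kept only for comparison."""
    if row['U'] == 0 and row['D'] > 0:
        return row['C'] + ts / 2
    if row['U'] > 0 and row['D'] == 0:
        return row['C'] - ts / 2
    if row['A'] - row['B'] == 0:
        return row['C'] + 1
    return row['C']


def determine_mid_vectorized(df, ts):
    """Same logic with np.select; the first matching condition wins."""
    conditions = [
        (df['U'] == 0) & (df['D'] > 0),
        (df['U'] > 0) & (df['D'] == 0),
        (df['A'] - df['B']) == 0,
    ]
    choices = [df['C'] + ts / 2, df['C'] - ts / 2, df['C'] + 1]
    result = np.select(conditions, choices, default=df['C'])
    # np.select returns a plain ndarray, so reattach the original index.
    return pd.Series(result, index=df.index, name='mid')


# Hypothetical sample data: one row per branch of the original function.
sample = pd.DataFrame({
    'A': [1.0, 2.0, 3.0, 4.0],
    'B': [1.0, 2.0, 3.0, 5.0],
    'C': [10.0, 20.0, 30.0, 40.0],
    'D': [2, 0, 0, 0],
    'U': [0, 3, 0, 0],
})

rowwise = sample.apply(determine_mid_rowwise, args=(5,), axis=1)
vectorized = determine_mid_vectorized(sample, 5)
```

Because the conditions are evaluated on whole columns at once, this version does not build a separate Series object for every row the way `apply(..., axis=1)` does, which is likely where much of the overhead comes from on a frame with millions of rows.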
