Subclassing a Pandas DataFrame, updates?


Question

To inherit or not to inherit?

What is the latest on the subclassing issue for Pandas? (Most of the other threads are 3-4 years old.)

I am hoping to do something like this:

import pandas as pd

class SomeData(pd.DataFrame):
    # Methods
    pass

ClsInstance = SomeData()

# Create a new column on ClsInstance?

Recommended answer

This is how I've done it. I've followed advice found:

The example below only shows how to construct a new subclass of pandas.DataFrame. If you follow the advice in my first link, you may also want to subclass pandas.Series, so that one-dimensional slices of your pandas.DataFrame subclass keep a custom type as well.
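A minimal sketch of that Series pairing, assuming the _constructor_sliced and _constructor_expanddim hooks described in the pandas documentation on subclassing. The class names SomeSeries and SomeFrame are hypothetical and exist only for this illustration; they are not part of the answer's code.

```python
import pandas as pd


class SomeSeries(pd.Series):
    """Hypothetical one-dimensional companion to SomeFrame."""

    @property
    def _constructor(self):
        return SomeSeries

    @property
    def _constructor_expanddim(self):
        # Used when a Series grows into a frame, e.g. to_frame()
        return SomeFrame


class SomeFrame(pd.DataFrame):
    """Hypothetical DataFrame subclass paired with SomeSeries."""

    @property
    def _constructor(self):
        return SomeFrame

    @property
    def _constructor_sliced(self):
        # Used when a frame is reduced to one dimension, e.g. frame['A']
        return SomeSeries


frame = SomeFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
column = frame['A']             # single-column slice of the subclass
round_trip = column.to_frame()  # back up to two dimensions
```

With both hooks in place, column should come back as a SomeSeries and round_trip as a SomeFrame, rather than the plain pandas types.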

import pandas as pd
import numpy as np

class SomeData(pd.DataFrame):
    # This class variable tells Pandas the names of the attributes
    # that are to be ported over to derivative DataFrames.  There
    # is a method named `__finalize__` that grabs these attributes
    # and assigns them to newly created `SomeData` objects.
    _metadata = ['my_attr']

    @property
    def _constructor(self):
        """This is the key to letting Pandas know how to keep
        derivative `SomeData` the same type as yours.  It should
        be enough to return the class itself.  However, in
        some cases, `__finalize__` is not called and `my_attr` is
        not carried over.  We can fix that by constructing a callable
        that makes sure to call `__finalize__` every time."""
        def _c(*args, **kwargs):
            return SomeData(*args, **kwargs).__finalize__(self)
        return _c

    def __init__(self, *args, **kwargs):
        # grab the keyword argument that is supposed to be my_attr
        self.my_attr = kwargs.pop('my_attr', None)
        super().__init__(*args, **kwargs)

    def my_method(self, other):
        return self * np.sign(self - other)


Demonstration

mydata = SomeData(dict(A=[1, 2, 3], B=[4, 5, 6]), my_attr='an attr')

print(mydata, type(mydata), mydata.my_attr, sep='\n' * 2)

   A  B
0  1  4
1  2  5
2  3  6

<class '__main__.SomeData'>

an attr

newdata = mydata.mul(2)

print(newdata, type(newdata), newdata.my_attr, sep='\n' * 2)

   A   B
0  2   8
1  4  10
2  6  12

<class '__main__.SomeData'>

an attr

newerdata = mydata.my_method(newdata)

print(newerdata, type(newerdata), newerdata.my_attr, sep='\n' * 2)

   A  B
0 -1 -4
1 -2 -5
2 -3 -6

<class '__main__.SomeData'>

an attr
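The question also asked about creating a new column on an instance. Here is a small sketch of that, using a pared-down SomeData whose _constructor simply returns the class (the simpler form, not the callable used above) and with my_attr set directly on the instance. Both simplifications are assumptions made to keep the example short.

```python
import pandas as pd


class SomeData(pd.DataFrame):
    # Pared-down version for illustration only
    _metadata = ['my_attr']

    @property
    def _constructor(self):
        return SomeData


mydata = SomeData({'A': [1, 2, 3], 'B': [4, 5, 6]})
mydata.my_attr = 'an attr'  # stored as an attribute, not as a column

# Column assignment modifies the existing object in place
mydata['C'] = mydata['A'] + mydata['B']
```

Because the assignment happens in place, mydata is still the same SomeData object afterwards and my_attr is untouched.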


Gotchas

This breaks the method pd.DataFrame.equals:

newerdata.equals(newdata)  # Should be `False`

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-304-866170ab179e> in <module>()
----> 1 newerdata.equals(newdata)

~/anaconda3/envs/3.6.ml/lib/python3.6/site-packages/pandas/core/generic.py in equals(self, other)
   1034         the same location are considered equal.
   1035         """
-> 1036         if not isinstance(other, self._constructor):
   1037             return False
   1038         return self._data.equals(other._data)

TypeError: isinstance() arg 2 must be a type or tuple of types

What happens is that this method expects the _constructor attribute to hold a type (a class), because it passes it as the second argument to isinstance. Instead, it finds the callable that I placed there to fix the __finalize__ issue I came across.
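The failure can be reproduced without pandas at all. The sketch below uses hypothetical stand-ins (Frame and make_frame) to show that isinstance accepts a class but raises TypeError when handed a plain function.

```python
class Frame:
    """Hypothetical stand-in for a DataFrame subclass."""


def make_frame(*args, **kwargs):
    # Stand-in for the callable returned by `_constructor`
    return Frame()


obj = make_frame()

# A class works as the second argument to isinstance ...
is_frame = isinstance(obj, Frame)

# ... but a function does not: isinstance raises TypeError
raised = False
try:
    isinstance(obj, make_frame)
except TypeError:
    raised = True
```

This is the same situation as in the traceback: the callable builds the right kind of object, but it is not itself a type.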

Workaround

Override the equals method with the following in your class definition.

    def equals(self, other):
        try:
            pd.testing.assert_frame_equal(self, other)
            return True
        except AssertionError:
            return False

newerdata.equals(newdata)  # Should be `False`

False
