Subclassing a Pandas DataFrame, updates?
Question
To inherit or not to inherit?
What is the latest on the subclassing issue for Pandas? (Most of the other threads are 3-4 years old).
I am hoping to do something like ...
import pandas as pd

class SomeData(pd.DataFrame):
    # Methods
    pass

ClsInstance = SomeData()
# Create a new column on ClsInstance?
Recommended answer
This is how I've done it. I've followed advice found:
The example below only shows how to construct a new subclass of pandas.DataFrame. If you follow the advice in my first link, you may also want to subclass pandas.Series to account for taking one-dimensional slices of your pandas.DataFrame subclass.
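As a side note before the answer's own example, which follows: here is a minimal sketch of what pairing a Series subclass with a DataFrame subclass might look like. The class names TaggedSeries and TaggedFrame are hypothetical and do not come from the answer. The sketch relies on the `_constructor_sliced` and `_constructor_expanddim` hooks described in the pandas subclassing documentation, and it carries no custom metadata.

```python
import pandas as pd


class TaggedSeries(pd.Series):  # hypothetical name, for illustration only
    @property
    def _constructor(self):
        # Results that stay one-dimensional keep this subclass.
        return TaggedSeries

    @property
    def _constructor_expanddim(self):
        # Used when a Series grows into a frame, e.g. `to_frame()`.
        return TaggedFrame


class TaggedFrame(pd.DataFrame):  # hypothetical name, for illustration only
    @property
    def _constructor(self):
        # Results that stay two-dimensional keep this subclass.
        return TaggedFrame

    @property
    def _constructor_sliced(self):
        # Used when a frame is reduced to one dimension, e.g. `frame["A"]`.
        return TaggedSeries


frame = TaggedFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
column = frame["A"]             # single column -> TaggedSeries
row = frame.iloc[0]             # single row -> TaggedSeries
round_trip = column.to_frame()  # back to two dimensions -> TaggedFrame
```

Without `_constructor_sliced`, selecting a single column from the subclass would hand back a plain pandas.Series, and any methods defined on the Series subclass would not be available on it.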
import pandas as pd
import numpy as np

class SomeData(pd.DataFrame):
    # This class variable tells Pandas the names of the attributes
    # that are to be ported over to derivative DataFrames. There
    # is a method named `__finalize__` that grabs these attributes
    # and assigns them to newly created `SomeData`
    _metadata = ['my_attr']

    @property
    def _constructor(self):
        """This is the key to letting Pandas know how to keep
        derivative `SomeData` the same type as yours. It should
        be enough to return the class itself. However, in
        some cases, `__finalize__` is not called and `my_attr` is
        not carried over. We can fix that by constructing a callable
        that makes sure to call `__finalize__` every time."""
        def _c(*args, **kwargs):
            return SomeData(*args, **kwargs).__finalize__(self)
        return _c

    def __init__(self, *args, **kwargs):
        # grab the keyword argument that is supposed to be my_attr
        self.my_attr = kwargs.pop('my_attr', None)
        super().__init__(*args, **kwargs)

    def my_method(self, other):
        return self * np.sign(self - other)
Demonstration
mydata = SomeData(dict(A=[1, 2, 3], B=[4, 5, 6]), my_attr='an attr')
print(mydata, type(mydata), mydata.my_attr, sep='\n' * 2)

   A  B
0  1  4
1  2  5
2  3  6

<class '__main__.SomeData'>

an attr

newdata = mydata.mul(2)
print(newdata, type(newdata), newdata.my_attr, sep='\n' * 2)

   A   B
0  2   8
1  4  10
2  6  12

<class '__main__.SomeData'>

an attr

newerdata = mydata.my_method(newdata)
print(newerdata, type(newerdata), newerdata.my_attr, sep='\n' * 2)

   A  B
0 -1 -4
1 -2 -5
2 -3 -6

<class '__main__.SomeData'>

an attr
Gotchas
This borks on the method pd.DataFrame.equals
newerdata.equals(newdata) # Should be `False`
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-304-866170ab179e> in <module>()
----> 1 newerdata.equals(newdata)
~/anaconda3/envs/3.6.ml/lib/python3.6/site-packages/pandas/core/generic.py in equals(self, other)
1034 the same location are considered equal.
1035 """
-> 1036 if not isinstance(other, self._constructor):
1037 return False
1038 return self._data.equals(other._data)
TypeError: isinstance() arg 2 must be a type or tuple of types
What happens is that this method expects to find an actual class (an object of type type) in the _constructor attribute. Instead, it finds the callable that I placed there to fix the __finalize__ issue I came across.
Workaround
Override the equals method with the following in your class definition.
    def equals(self, other):
        try:
            pd.testing.assert_frame_equal(self, other)
            return True
        except AssertionError:
            return False
newerdata.equals(newdata) # Should be `False`
False
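To show where the override sits relative to the callable constructor, here is a minimal self-contained sketch. The class name CheckedData is hypothetical, and the class is a cut-down variant of the answer's SomeData rather than the author's exact code. Whether the original TypeError still occurs depends on the installed pandas version, so the sketch only demonstrates that the override returns a plain boolean in both cases.

```python
import pandas as pd


class CheckedData(pd.DataFrame):  # hypothetical name, for illustration only
    _metadata = ["my_attr"]

    @property
    def _constructor(self):
        # Callable constructor, as in the answer, so that `__finalize__`
        # always copies `my_attr` onto derived frames.
        def _c(*args, **kwargs):
            return CheckedData(*args, **kwargs).__finalize__(self)
        return _c

    def __init__(self, *args, **kwargs):
        self.my_attr = kwargs.pop("my_attr", None)
        super().__init__(*args, **kwargs)

    def equals(self, other):
        # Compare via the testing helper instead of relying on
        # `_constructor` being a class.
        try:
            pd.testing.assert_frame_equal(self, other)
            return True
        except AssertionError:
            return False


original = CheckedData({"A": [1, 2, 3], "B": [4, 5, 6]}, my_attr="an attr")
doubled = original.mul(2)

same = original.equals(original.copy())  # identical contents
different = original.equals(doubled)     # values differ
```

One design consequence worth noting: pd.testing.assert_frame_equal also checks dtypes and index types by default, so this override can be stricter than a purely value-based comparison.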