Why does subclassing a DataFrame mutate the original object?
Question
I am ignoring the warnings and trying to subclass a pandas DataFrame. My reasons for doing so are as follows:
- I want to retain all the existing methods of DataFrame.
- I want to set a few additional attributes at class instantiation, which will later be used to define additional methods that I can call on the subclass.
Here is a snippet:
import pandas as pd

class SubFrame(pd.DataFrame):
    def __init__(self, *args, **kwargs):
        # Pull out the subclass-only options before calling the parent
        freq = kwargs.pop('freq', None)
        ddof = kwargs.pop('ddof', None)
        super(SubFrame, self).__init__(*args, **kwargs)
        self.freq = freq
        self.ddof = ddof
        self.index.freq = pd.tseries.frequencies.to_offset(self.freq)

    @property
    def _constructor(self):
        return SubFrame
Here's a usage example. Say I have the DataFrame:
print(df)
col0 col1 col2
2014-07-31 0.28393 1.84587 -1.37899
2014-08-31 5.71914 2.19755 3.97959
2014-09-30 -3.16015 -7.47063 -1.40869
2014-10-31 5.08850 1.14998 2.43273
2014-11-30 1.89474 -1.08953 2.67830
The index has no frequency:
print(df.index)
DatetimeIndex(['2014-07-31', '2014-08-31', '2014-09-30', '2014-10-31',
'2014-11-30'],
dtype='datetime64[ns]', freq=None)
Using SubFrame allows me to specify that frequency in one step:
sf = SubFrame(df, freq='M')
print(sf.index)
DatetimeIndex(['2014-07-31', '2014-08-31', '2014-09-30', '2014-10-31',
'2014-11-30'],
dtype='datetime64[ns]', freq='M')
The problem is that this modifies df:
print(df.index.freq)
<MonthEnd>
What's going on here, and how can I avoid this?
Moreover, I admit to using copied code that I don't understand all that well. What is happening within __init__ above? Is it necessary to use args/kwargs with pop here? (Why can't I just specify params as usual?)
Recommended answer
I'll add to the warnings. Not that I want to discourage you; I actually applaud your efforts.
However, this won't be the last of your questions as to what is going on.
That said, once you run:
super(SubFrame, self).__init__(*args, **kwargs)
self is a bona fide DataFrame. You created it by passing another DataFrame to the constructor.
Try this as an experiment:
d1 = pd.DataFrame(1, list('AB'), list('XY'))
d2 = pd.DataFrame(d1)
d2.index.name = 'IDX'
d1
X Y
IDX
A 1 1
B 1 1
So the observed behavior is consistent: when you construct one DataFrame by passing another DataFrame to the constructor, you end up pointing to the same objects.
To answer your question, subclassing isn't what allows the original object to be mutated. It's the way pandas constructs a DataFrame from a passed DataFrame.
Avoid this by instantiating with a copy:
d2 = pd.DataFrame(d1.copy())
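The experiment above can be extended into a small check to run on your own installation. This is only a sketch: whether pd.DataFrame(d1) reuses the same index object has depended on the pandas version, so the code reports that result instead of assuming it. The variable names d3 and shares_index are illustrative.

```python
import pandas as pd

d1 = pd.DataFrame(1, index=list("AB"), columns=list("XY"))

# Construct from an existing frame without copying, then report
# whether both frames hold the very same index object.
d2 = pd.DataFrame(d1)
shares_index = d2.index is d1.index  # version dependent, so only reported
print("index object shared without copy:", shares_index)

# Construct from an explicit copy, then rename the new frame's index.
d3 = pd.DataFrame(d1.copy())
d3.index.name = "IDX"

print("d3 index name:", d3.index.name)
print("d1 index name:", d1.index.name)
```

If the first line prints True, attribute changes made through d2.index are visible through d1.index as well, which is the situation described in the question.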
__init__
You want to pass on all the args and kwargs to pd.DataFrame.__init__, with the exception of the specific kwargs that are intended for your subclass: in this case, freq and ddof. pd.DataFrame.__init__ does not accept those keywords and would raise a TypeError if it received them. pop is a convenient way to grab the values and delete the keys from kwargs before passing it on to pd.DataFrame.__init__.
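The pattern can be sketched without pandas. Below, Base is a hypothetical stand-in for pd.DataFrame that accepts only data and copy; PopChild and KeywordChild are also illustrative names. The second subclass shows that, in Python 3, the extra options can be declared as ordinary keyword-only parameters instead of being popped from kwargs.

```python
class Base:
    """Hypothetical stand-in for pd.DataFrame: accepts only `data` and `copy`."""

    def __init__(self, data=None, copy=False):
        self.data = list(data) if copy else data


class PopChild(Base):
    """Removes subclass-only options from kwargs before delegating."""

    def __init__(self, *args, **kwargs):
        freq = kwargs.pop("freq", None)
        ddof = kwargs.pop("ddof", None)
        # kwargs no longer contains freq or ddof, so Base accepts it
        super().__init__(*args, **kwargs)
        self.freq = freq
        self.ddof = ddof


class KeywordChild(Base):
    """Same behavior, with the options declared as keyword-only parameters."""

    def __init__(self, *args, freq=None, ddof=None, **kwargs):
        super().__init__(*args, **kwargs)
        self.freq = freq
        self.ddof = ddof


a = PopChild([1, 2, 3], copy=True, freq="M", ddof=1)
b = KeywordChild([1, 2, 3], copy=True, freq="M", ddof=1)
print(a.freq, a.ddof, a.data)
print(b.freq, b.ddof, b.data)
```

Either form keeps the parent's signature open-ended, so any argument the parent accepts still works on the subclass.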
How I would implement this with pipe:
def add_freq(df, freq):
    df = df.copy()
    df.index.freq = pd.tseries.frequencies.to_offset(freq)
    return df

df = pd.DataFrame(dict(A=[1, 2]), pd.to_datetime(['2017-03-31', '2017-04-30']))
df.pipe(add_freq, 'M')
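A variant of the same idea, as a sketch only: instead of assigning to index.freq on the copy, it builds a new DatetimeIndex and assigns that, so it does not depend on how much state a copied index shares with the original. The function name with_freq is hypothetical, and pd.offsets.MonthEnd() is passed as an offset object because the string aliases for month-end frequency have changed between pandas releases.

```python
import pandas as pd


def with_freq(df, freq):
    """Return a copy of df whose DatetimeIndex carries the given frequency."""
    out = df.copy()
    offset = pd.tseries.frequencies.to_offset(freq)
    # Build a separate index object; the constructor checks that the
    # existing dates conform to the requested frequency.
    out.index = pd.DatetimeIndex(out.index, freq=offset)
    return out


df = pd.DataFrame(
    {"A": [1, 2]},
    index=pd.to_datetime(["2017-03-31", "2017-04-30"]),
)
result = df.pipe(with_freq, pd.offsets.MonthEnd())

print("result freq:", result.index.freq)
print("original freq:", df.index.freq)
```

Because the frequency lives on the returned frame's own index, the original df keeps freq=None.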