Why does subclassing a DataFrame mutate the original object?

Question

I am ignoring the warnings and trying to subclass a pandas DataFrame. My reasons for doing so are as follows:

  • I want to retain all the existing methods of DataFrame.
  • I want to set a few additional attributes at class instantiation, which will later be used to define additional methods that I can call on the subclass.

Here's a snippet:

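import pandas as pd

class SubFrame(pd.DataFrame):

    def __init__(self, *args, **kwargs):
        # Take the subclass-only options out of kwargs before DataFrame sees them
        freq = kwargs.pop('freq', None)
        ddof = kwargs.pop('ddof', None)
        # Build the DataFrame part from the remaining arguments
        super(SubFrame, self).__init__(*args, **kwargs)
        self.freq = freq
        self.ddof = ddof
        # Attach the frequency to the index (modifies the index object in place)
        self.index.freq = pd.tseries.frequencies.to_offset(self.freq)

    @property
    def _constructor(self):
        # Makes pandas operations return SubFrame instead of plain DataFrame
        return SubFrame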

Here's a usage example. Say I have the DataFrame:

print(df)
               col0     col1     col2
2014-07-31  0.28393  1.84587 -1.37899
2014-08-31  5.71914  2.19755  3.97959
2014-09-30 -3.16015 -7.47063 -1.40869
2014-10-31  5.08850  1.14998  2.43273
2014-11-30  1.89474 -1.08953  2.67830

The index has no frequency:

print(df.index)
DatetimeIndex(['2014-07-31', '2014-08-31', '2014-09-30', '2014-10-31',
               '2014-11-30'],
              dtype='datetime64[ns]', freq=None)
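
To reproduce this, here is a minimal sketch that rebuilds the frame from the values printed above. Only the construction code is new; the dates and numbers are copied from the printout.

```python
import pandas as pd

# Dates and values copied from the printed frame above
dates = pd.to_datetime(
    ["2014-07-31", "2014-08-31", "2014-09-30", "2014-10-31", "2014-11-30"]
)
df = pd.DataFrame(
    {
        "col0": [0.28393, 5.71914, -3.16015, 5.08850, 1.89474],
        "col1": [1.84587, 2.19755, -7.47063, 1.14998, -1.08953],
        "col2": [-1.37899, 3.97959, -1.40869, 2.43273, 2.67830],
    },
    index=dates,
)

# pd.to_datetime on a plain list of strings does not attach a frequency
print(df.index.freq)  # None
```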

Using SubFrame allows me to specify that frequency in one step:

sf = SubFrame(df, freq='M')
print(sf.index)
DatetimeIndex(['2014-07-31', '2014-08-31', '2014-09-30', '2014-10-31',
               '2014-11-30'],
              dtype='datetime64[ns]', freq='M')

The problem is that this modifies df:

print(df.index.freq)
<MonthEnd>

What's going on here, and how can I avoid this?

Moreover, I confess to using copied code that I don't understand all that well. What is happening within __init__ above? Is it necessary to use args/kwargs with pop here? (Why can't I just specify params as usual?)

Answer

I'll add to the warnings. Not that I want to discourage you; I actually applaud your efforts.

However, this won't be the last of your questions about what is going on.

That said, once you run:

super(SubFrame, self).__init__(*args, **kwargs)

self is a bona fide dataframe. You created it by passing another dataframe to the constructor.

Try this as an experiment:

d1 = pd.DataFrame(1, list('AB'), list('XY'))
d2 = pd.DataFrame(d1)

d2.index.name = 'IDX'

d1

     X  Y
IDX      
A    1  1
B    1  1

So the observed behavior is consistent: when you construct one dataframe by passing another dataframe to the constructor, the new dataframe ends up pointing to the same underlying objects as the one you passed in (here, the same index).

To answer your question, subclassing isn't what allows the original object to be mutated... it's the way pandas constructs a dataframe from a passed dataframe.

Avoid this by instantiating with a copy

d2 = pd.DataFrame(d1.copy())
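
Applied to the experiment above, a short sketch: renaming the index of a frame built from a copy leaves d1 alone. The `d3` line is my addition; it uses `rename_axis`, which returns a new frame, so when all you want is a differently named index the sharing question does not come up at all. The `is` checks show object identity; I've left the direct `pd.DataFrame(d1)` case without an expected value, since how much pandas shares there may vary between versions.

```python
import pandas as pd

d1 = pd.DataFrame(1, list("AB"), list("XY"))

# Build d2 from a copy: d2 gets its own index object to mutate
d2 = pd.DataFrame(d1.copy())
d2.index.name = "IDX"

# Alternative: rename_axis returns a new frame and never touches d1
d3 = d1.rename_axis("IDX")

print(d1.index.name)  # None
print(d2.index.name)  # IDX
print(d3.index.name)  # IDX

# `is` tests identity: the copy-based d2 holds a different index object
print(d2.index is d1.index)  # False
# Compare with the direct construction used in the experiment
print(pd.DataFrame(d1).index is d1.index)
```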


__init__

You want to pass all the args and kwargs on to pd.DataFrame.__init__, except for the specific kwargs intended for your subclass, in this case freq and ddof. pop is a convenient way to grab each value and delete its key from kwargs before passing kwargs on to pd.DataFrame.__init__.
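
To see the mechanics without pandas in the way, here is a minimal plain-Python sketch. `Base` is a hypothetical stand-in for `pd.DataFrame` that, like it, rejects keywords it doesn't know; `PopSub` and `KwOnlySub` are likewise illustrative names. The second class addresses "why can't I just specify params as usual?": in Python 3 you can, by declaring the extra parameters as keyword-only arguments after `*args`, which keeps them out of `**kwargs` without any `pop`. One reason the `pop` idiom is common is that it also works on Python 2, which has no keyword-only arguments.

```python
class Base:
    """Hypothetical stand-in for pd.DataFrame: accepts only its own keywords."""

    def __init__(self, data=None, index=None):
        self.data = data
        self.index = index


class PopSub(Base):
    def __init__(self, *args, **kwargs):
        # Remove the subclass-only keywords (None if they were not passed)...
        freq = kwargs.pop("freq", None)
        ddof = kwargs.pop("ddof", None)
        # ...so Base never sees them; everything else is forwarded unchanged
        super().__init__(*args, **kwargs)
        self.freq = freq
        self.ddof = ddof


class KwOnlySub(Base):
    # Keyword-only parameters (declared after *args) are bound by name and
    # never land in **kwargs, so no pop() is needed
    def __init__(self, *args, freq=None, ddof=None, **kwargs):
        super().__init__(*args, **kwargs)
        self.freq = freq
        self.ddof = ddof


a = PopSub([1, 2], index=["x", "y"], freq="M", ddof=1)
b = KwOnlySub([1, 2], index=["x", "y"], freq="M", ddof=1)
print(a.index, a.freq, a.ddof)  # ['x', 'y'] M 1
print(b.index, b.freq, b.ddof)  # ['x', 'y'] M 1

# Passing freq straight to Base fails, which is why it must be removed first
try:
    Base([1, 2], freq="M")
except TypeError as exc:
    print("TypeError:", exc)
```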

How I'd implement it: pipe

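import pandas as pd

def add_freq(df, freq):
    # Work on a copy so the caller's frame is never touched
    df = df.copy()
    # Attach the frequency by assigning a freshly built index rather than
    # editing the existing one; raises ValueError if the dates don't conform
    df.index = pd.DatetimeIndex(df.index, freq=freq)
    return df

df = pd.DataFrame(dict(A=[1, 2]), pd.to_datetime(['2017-03-31', '2017-04-30']))

# 'M' is month-end; pandas 2.2+ deprecates it in favour of 'ME'
df.pipe(add_freq, 'M')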
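
If you still want the subclass, here is a revised sketch of SubFrame that does not write back into the frame it was built from. It is my own variation, not the answer's code, and rests on these assumptions: it copies the input only when a `freq` is supplied; it attaches the frequency by assigning a new `DatetimeIndex` instead of setting `index.freq` in place; it takes `freq`/`ddof` as keyword-only arguments; and it lists both in `_metadata`, the class attribute the pandas subclassing guide uses to carry custom attributes over to derived frames. The usage passes a `pd.offsets.MonthEnd()` object rather than the string `'M'`, because the month-end alias is spelled `'ME'` in newer pandas.

```python
import pandas as pd


class SubFrame(pd.DataFrame):
    # Custom attributes that pandas should propagate to derived frames
    _metadata = ["freq", "ddof"]

    def __init__(self, data=None, *args, freq=None, ddof=None, **kwargs):
        if freq is not None and isinstance(data, pd.DataFrame):
            # Copy so that replacing the index below cannot reach back into
            # the caller's frame, whatever pandas would otherwise share
            data = data.copy()
        super().__init__(data, *args, **kwargs)
        self.freq = freq
        self.ddof = ddof
        if freq is not None:
            # Build a new index carrying the frequency instead of mutating the
            # existing one; raises ValueError if the dates don't match freq
            self.index = pd.DatetimeIndex(self.index, freq=freq)

    @property
    def _constructor(self):
        # Makes pandas operations return SubFrame instead of plain DataFrame
        return SubFrame


df = pd.DataFrame(
    {"col0": [0.28393, 5.71914, -3.16015, 5.08850, 1.89474]},
    index=pd.to_datetime(
        ["2014-07-31", "2014-08-31", "2014-09-30", "2014-10-31", "2014-11-30"]
    ),
)
sf = SubFrame(df, freq=pd.offsets.MonthEnd())
print(sf.index.freq)  # <MonthEnd>
print(df.index.freq)  # None: the original frame is untouched
```

Copying only when `freq` is given keeps the extra cost off the many internal calls pandas makes through `_constructor`, which never pass a `freq`.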
