Do subclasses of pandas objects work differently from subclasses of other objects?

Question

I am trying to create a subclass of a Pandas data structure so that, in my code, I can replace a subclass of dict with a subclass of Series. I don't understand why this example code doesn't work:

from pandas import Series

class Support(Series):
    def supportMethod1(self):
        print('I am support method 1')
    def supportMethod2(self):
        print('I am support method 2')

class Compute(object):
    supp=None        
    def test(self):
        self.supp()  

class Config(object):
    supp=None        
    @classmethod
    def initializeConfig(cls):
        cls.supp=Support()
    @classmethod
    def setConfig1(cls):
        Compute.supp=cls.supp.supportMethod1
    @classmethod
    def setConfig2(cls):
        Compute.supp=cls.supp.supportMethod2            

Config.initializeConfig()

Config.setConfig1()    
c1=Compute()
c1.test()

Config.setConfig2()    
c1.test()

This is probably not the best way to change the configuration of some objects. Still, I find it useful in my code, and most of all I want to understand why it works as I expect with dict instead of Series.

Thank you very much!

Recommended answer

Current Answer (Pandas >= 0.13)

An internal refactor in Pandas 0.13 drastically simplified subclassing. A Pandas Series can now be subclassed like any other Python object:

import pandas as pd

class MySeries(pd.Series):
    def my_method(self):
        return "my_method"
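As a rough illustration, here is the pattern from the question reduced to its core and written for Python 3. It assumes a pandas version of 0.13 or later. The snake_case method names and the module-level `support` instance are adaptations for this sketch, not part of the original code, and the methods return strings instead of printing so the result is easy to check.

```python
import pandas as pd


class Support(pd.Series):
    """Series subclass carrying two extra helper methods."""

    def support_method_1(self):
        return "I am support method 1"

    def support_method_2(self):
        return "I am support method 2"


class Compute:
    # Will hold a bound method of a Support instance.
    supp = None

    def test(self):
        return self.supp()


# An explicit dtype avoids version-specific behavior for empty Series.
support = Support(dtype="float64")

# Swap the configured behavior by assigning a different bound method.
Compute.supp = support.support_method_1
print(Compute().test())

Compute.supp = support.support_method_2
print(Compute().test())
```

Because `Support()` now really yields a `Support` instance, both helper methods are found through ordinary attribute lookup.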

Legacy Answer (Pandas <= 0.12)

The problem is that Series uses __new__, which ensures that a Series object is instantiated, so the instance does not end up being your subclass.

You can modify your class like so:

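from pandas import Series

class Support(Series):
    def __new__(cls, *args, **kwargs):
        # Legacy pandas (<= 0.12) only: Series.__new__ hands back a plain
        # Series, so re-view the result as the subclass.
        arr = Series.__new__(cls, *args, **kwargs)
        return arr.view(Support)

    def supportMethod1(self):
        print('I am support method 1')
    def supportMethod2(self):
        print('I am support method 2')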
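The mechanism can be shown without pandas at all. The following is a plain-Python analogy, not the actual pandas 0.12 source: `Base`, `Child`, and `FixedChild` are hypothetical names, and re-tagging `__class__` stands in for the `arr.view(Support)` call above.

```python
class Base:
    def __new__(cls, *args, **kwargs):
        # Ignores cls and always builds a Base, mimicking a constructor
        # whose return type is fixed.
        return object.__new__(Base)


class Child(Base):
    def extra(self):
        return "extra"


class FixedChild(Base):
    def __new__(cls, *args, **kwargs):
        obj = Base.__new__(cls, *args, **kwargs)
        # Re-tag the instance as the subclass, in the same spirit as
        # viewing the legacy Series as the subclass.
        obj.__class__ = cls
        return obj

    def extra(self):
        return "extra"


lost = Child()
kept = FixedChild()
print(type(lost).__name__, hasattr(lost, "extra"))
print(type(kept).__name__, hasattr(kept, "extra"))
```

Calling `Child()` yields a `Base` instance, so `extra` is missing, which is the same kind of failure the question ran into with `supportMethod1`.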

However, it's probably best to use a has-a relationship instead of an is-a relationship, or to monkey patch the Series object. The reason is that you will often lose your subclass while using pandas, due to the nature of its data storage. Something as simple as

s.ix[:5] 
s.cumsum()

will return a Series object instead of your subclass. Internally, the data is stored in contiguous arrays and optimized for speed. The data is only boxed with a class when needed, and those classes are hardcoded. Plus, it's not immediately obvious whether something like s.ix[:5] should return the same subclass. That would depend on the semantics of your subclass and on what metadata is attached to it.
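The two ways around this can be sketched as follows. `SupportWrapper` is a hypothetical has-a wrapper that decides for itself what its operations return. `SupportSeries` relies on the `_constructor` property, which newer pandas versions document as the hook for keeping a subclass through operations. That hook is not part of the original answer, so treat it as an assumption about the pandas version in use. All class and method names here are illustrative.

```python
import pandas as pd


class SupportWrapper:
    """has-a: holds a Series instead of inheriting from it."""

    def __init__(self, data):
        self.series = pd.Series(data)

    def support_method_1(self):
        return "I am support method 1"

    def running_total(self):
        # The wrapper chooses to re-wrap the result of the operation.
        return SupportWrapper(self.series.cumsum())


class SupportSeries(pd.Series):
    """is-a: asks pandas to rebuild results as this subclass."""

    @property
    def _constructor(self):
        # Assumption: pandas consults this hook when boxing results.
        return SupportSeries

    def support_method_1(self):
        return "I am support method 1"


wrapped = SupportWrapper([1, 2, 3])
sub = SupportSeries([1, 2, 3])

print(type(wrapped.running_total()).__name__)
print(type(sub.cumsum()).__name__)
print(type(sub.iloc[:2]).__name__)
```

The wrapper keeps the semantics question explicit, since each method states what it returns. The subclass is shorter, but it depends on pandas honoring `_constructor` for every operation you use.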

http://nbviewer.ipython.org/3366583/subclassing%20pandas%20objects.ipynb has some notes.
