Pandas DataFrame Object Inheritance or Object Use?


Question

I am building a library for working with very specific structured data, and I am building my infrastructure on top of Pandas. Currently I am writing a number of different data containers for different use cases, such as CTMatrix for Country x Time data, to house methods appropriate for all Country x Time structured data.

I am currently debating between:

Option 1: Object Inheritance

import pandas as pd

class CTMatrix(pd.DataFrame):
    pass  # methods etc. here

Option 2: Object Use

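import pandas as pd

class CTMatrix(object):
    def __init__(self, *args, **kwargs):
        # Hold a DataFrame instance (not the DataFrame class itself)
        self._data = pd.DataFrame(*args, **kwargs)

    # then use getter, setter methods to control access to _data etc.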


From a software engineering perspective is there an obvious choice here?


My thoughts so far are:

Option 1:

  1. Can use DataFrame methods directly on the CTMatrix class (like CTMatrix.sort()) without having to support them via methods on the encapsulated _data object in Option #2
  2. Updates and new methods in Pandas are inherited, except for methods that may be overwritten by local class methods


But:

  1. Complications with some methods such as __init__(), and having to pass the arguments up to the superclass: super(MyDF, self).__init__(*args, **kw)
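A minimal runnable sketch of Option 1, assuming a recent pandas release. The sample data and the `country_totals()` method are hypothetical. The `__init__` override does nothing but forward its arguments, which is the bookkeeping the question mentions; inside the class, Python 3's zero-argument `super().__init__(*args, **kwargs)` is equivalent to the explicit two-argument form. `DataFrame.sort()` no longer exists in current pandas, so the sketch calls `sort_index()` instead.

```python
import pandas as pd


class CTMatrix(pd.DataFrame):
    """Country x Time data: one row per country, one column per period."""

    def __init__(self, *args, **kwargs):
        # A custom constructor has to forward its arguments to pd.DataFrame.
        super().__init__(*args, **kwargs)

    def country_totals(self):
        # Hypothetical domain method: sum over all periods for each country.
        return self.sum(axis=1)


m = CTMatrix({"2000": [3, 1], "2001": [4, 2]}, index=["US", "FR"])

# Inherited DataFrame methods are available directly on the subclass.
by_country = m.sort_index()
totals = m.country_totals()
```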

Option 2:


  1. More control over the class and its behavior
  2. Possibly more resilient to updates in Pandas?

But:


  1. Having to use a getter() or a non-hidden attribute to use the object like a DataFrame, e.g. CTMatrix.data.sort()
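For comparison, a minimal sketch of Option 2 using the same hypothetical data and `country_totals()` method. The DataFrame is stored as an instance attribute and exposed read-only through a `data` property. The `sort_index()` wrapper is a hypothetical example of a method the class chooses to expose directly, which means it must delegate and re-wrap the result by hand.

```python
import pandas as pd


class CTMatrix:
    """Country x Time data wrapped around (not derived from) a DataFrame."""

    def __init__(self, *args, **kwargs):
        # The DataFrame is an instance attribute, not a base class.
        self._data = pd.DataFrame(*args, **kwargs)

    @property
    def data(self):
        # Read-only access: there is no setter, so `m.data = ...` fails.
        return self._data

    def sort_index(self, **kwargs):
        # Hand-written wrapper: delegate to the DataFrame, then re-wrap
        # the result so callers get a CTMatrix back.
        return CTMatrix(self._data.sort_index(**kwargs))

    def country_totals(self):
        # Hypothetical domain method: sum over all periods for each country.
        return self._data.sum(axis=1)


m = CTMatrix({"2000": [3, 1], "2001": [4, 2]}, index=["US", "FR"])

sorted_m = m.sort_index()          # wrapped method: still a CTMatrix
raw = m.data.sort_values("2000")   # anything else goes through .data
```

Everything not wrapped explicitly is reached through `m.data`, which is the downside listed under Option 2; in exchange, the class decides exactly which operations return a `CTMatrix`.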


Are there any additional downsides for taking the approach in Option #1?

Answer


I would avoid subclassing DataFrame, because many of the DataFrame methods will return a new DataFrame and not another instance of your CTMatrix object.

There are a few open issues on GitHub about this, e.g.:

https://github.com/pydata/pandas/issues/2485
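The return-type problem described above is easy to reproduce. A minimal sketch, assuming a recent pandas release and a subclass that overrides none of pandas' internal construction hooks (`country_totals()` is again a hypothetical method): ordinary operations such as `sort_values()` or boolean row selection hand back a plain `DataFrame`, so the subclass's own methods are missing from the result.

```python
import pandas as pd


class CTMatrix(pd.DataFrame):
    """A bare DataFrame subclass with no extra pandas hooks defined."""

    def country_totals(self):
        # Hypothetical domain method that only CTMatrix objects have.
        return self.sum(axis=1)


m = CTMatrix({"2000": [3, 1], "2001": [4, 2]}, index=["US", "FR"])

sorted_m = m.sort_values("2000")   # an ordinary DataFrame method
subset = m[m["2000"] > 1]          # boolean-mask row selection

print(type(m).__name__)         # CTMatrix
print(type(sorted_m).__name__)  # DataFrame
print(type(subset).__name__)    # DataFrame
# sorted_m.country_totals() would raise AttributeError here.
```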

More generally, this is a question of composition vs. inheritance. I would be especially wary of benefit #2 of Option 1. It might seem great now, but unless you are keeping a close eye on updates to Pandas (and it is a fast-moving target), you can easily end up with unexpected consequences, and your code will end up intertwined with Pandas.
