Pickling Pandas DataFrame subclasses which include metadata

Question

Attaching metadata to Pandas objects, and getting that metadata to survive a pickle/unpickle round trip, is a perennial question. I see some very old answers, which basically say that you can't. Hopefully a more current answer to this question will be yes. I'm using Pandas 0.23.3.

I've made some Pandas DataFrame subclasses. I think I know how to do this correctly. I have a _constructor method, and my __init__ method can handle BlockManager objects. When I create metadata attributes, I suppress the UserWarning which cautions that I'm not creating a column in the DataFrame itself, which in my case is fine.

When I want to save the DataFrame to disk, I call my_fancy_df.to_pickle(file_path). When I want to reload it, I use my_fancy_df = pandas.read_pickle(file_path). My own metadata gets removed. Pandas itself has metadata which pickles and unpickles fine, such as the Series.name attribute. I would like to copy this behavior for my attributes.

I could intercept the .to_pickle call in my subclass, and arrange to write the metadata separately into the same file object. But I don't see an equivalent approach for changing the way that data is reloaded. The read_pickle function is general-purpose and lives in the Pandas namespace; it doesn't belong to the DataFrame class.

I could possibly write a custom unpickling function, external to my class, and use that, but it seems clumsy. If there's an elegant way to get this job done, I haven't found it.

I'm also not dead-set on using pickle. If HDF5 is more suitable, for example, I can switch. I do need to pickle arbitrary Python data types in the DataFrame, though. The content in the cells is not just strings and numbers: I have tuples as well, and in one subclass I've built I even placed DataFrames inside DataFrames.

Thanks for any suggestions.

Recommended answer

The comment from user "root" was helpful. I have confirmed that you can define a class attribute called _metadata inside your custom DataFrame subclass. It is a list of the names of the instance attributes you want to retain through slicing, pickling, and unpickling operations.
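As a minimal sketch of the approach described in this answer, the snippet below defines a subclass with a _metadata list and round-trips it through to_pickle and read_pickle. The class name FancyFrame, the attribute names source and notes, and the helper roundtrip are hypothetical names chosen for illustration; they do not come from the original question. Exact behavior may vary between Pandas versions, so treat this as a starting point rather than a definitive implementation.

```python
import os
import tempfile

import pandas as pd


class FancyFrame(pd.DataFrame):
    # Hypothetical subclass for illustration.
    # _metadata lists the names of instance attributes that pandas
    # should carry through slicing, pickling, and unpickling.
    _metadata = ["source", "notes"]

    @property
    def _constructor(self):
        # Make pandas operations return FancyFrame instead of DataFrame.
        return FancyFrame


def roundtrip(frame):
    """Pickle a frame to a temporary file and load it back (hypothetical helper)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "fancy.pkl")
        frame.to_pickle(path)
        return pd.read_pickle(path)


# Cells may hold arbitrary Python objects, such as tuples.
df = FancyFrame({"a": [1, 2, 3], "b": [(1, 2), (3, 4), (5, 6)]})

# Names listed in _metadata are stored as plain attributes, not columns.
df.source = "sensor-7"
df.notes = {"units": "mm"}

restored = roundtrip(df)
print(type(restored).__name__, restored.source, restored.notes)

# The same attributes should also follow the frame through slicing.
print(df.head(2).source)
```

Because the attribute names are declared in _metadata, assigning them should not need the UserWarning suppression mentioned in the question, and no custom unpickling function is required: the generic pandas.read_pickle call restores both the subclass and its attributes. Note that pickle locates the class by its import path, so the subclass has to be importable under the same module name when the file is loaded.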
