Pickling Pandas DataFrame subclasses which include metadata

Views: 91  Posted: 2020/5/24 3:44:41  Tags: python pandas dataframe pickle

Problem description

The question about attaching metadata to Pandas objects, and getting that data to survive a pickle/unpickle process is a perennial one. I see some very old answers, which basically say that you can't. Hopefully, a more current answer to this question will be yes. I'm using Pandas 0.23.3.

I've made some Pandas DataFrame subclasses. I think I know how to do this correctly. I have a _constructor method, and my __init__ method can handle BlockManager objects. When I create meta-data attributes, I suppress the UserWarning which cautions that I'm not creating a column in the DataFrame itself, which in my case is fine.
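The setup described above might look roughly like the following. This is only a minimal sketch: the class name `FancyFrame`, the helper `set_meta`, and the attribute name `units` are made up for illustration, and the custom `__init__` that handles BlockManager objects is left out because its details are not shown in the question.

```python
import warnings

import pandas as pd


class FancyFrame(pd.DataFrame):
    """Hypothetical DataFrame subclass that carries extra attributes."""

    @property
    def _constructor(self):
        # Lets pandas operations (head, copy, slicing, ...) return FancyFrame
        # instead of falling back to a plain DataFrame.
        return FancyFrame

    def set_meta(self, name, value):
        # Hypothetical helper: pandas emits a UserWarning when a list-like
        # value is assigned to a new attribute name, because no column is
        # created. Here that is intended, so the warning is silenced locally.
        with warnings.catch_warnings():
            warnings.simplefilter("ignore", UserWarning)
            setattr(self, name, value)


df = FancyFrame({"a": [1, 2, 3]})
df.set_meta("units", ["m", "s"])
```

The warning filter is scoped to the `with` block, so other UserWarnings raised elsewhere in the program are unaffected.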

When I want to save the DataFrame to disk, I call my_fancy_df.to_pickle(file_path). When I want to reload it, I use my_fancy_df = pandas.read_pickle(file_path). MY meta-data gets removed. Pandas itself has meta-data which pickles and unpickles fine, such as the DataFrame.name attribute. I would like to copy this behavior for my attributes.
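One mechanism that may give this behavior is the `_metadata` class attribute described in the pandas documentation on subclassing. In recent pandas releases, attributes named there are included in the pickled state; whether the same holds on 0.23.3 is an assumption that would need checking. The sketch below uses a made-up subclass `TaggedFrame`, a made-up attribute `source`, and a hypothetical helper `roundtrip`.

```python
import os
import tempfile

import pandas as pd


class TaggedFrame(pd.DataFrame):
    """Hypothetical subclass; 'source' is an illustrative attribute name."""

    # Attribute names listed here are treated by pandas as subclass metadata.
    _metadata = ["source"]

    @property
    def _constructor(self):
        return TaggedFrame


def roundtrip(frame):
    """Save with to_pickle and reload with pandas.read_pickle."""
    with tempfile.TemporaryDirectory() as tmp:
        file_path = os.path.join(tmp, "frame.pkl")
        frame.to_pickle(file_path)
        return pd.read_pickle(file_path)


my_fancy_df = TaggedFrame({"a": [1, 2, 3]})
my_fancy_df.source = "sensor-7"  # no warning: the name is listed in _metadata

restored = roundtrip(my_fancy_df)
print(type(restored).__name__, restored.source)
```

Because the class itself is pickled by reference, `TaggedFrame` has to be importable under the same module path when the file is read back.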

I could intercept the .to_pickle call in my subclass, and arrange to write the meta-data separately into the same file object. But I don't see an equivalent approach for changing the way that data is reloaded. The read_pickle function is general-purpose and lives in the Pandas namespace; it doesn't belong to the DataFrame class.

I could possibly write a custom unpickling function, external to my class, and use that, but it seems clumsy. If there's an elegant way to get this job done, I haven't found it.
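For reference, the "clumsy" route could be sketched as a pair of module-level helpers that write the data and the metadata as two consecutive pickles in one file. The names `save_with_meta` and `load_with_meta`, and the `FancyFrame` class, are hypothetical; this is one possible arrangement, not an established pandas pattern.

```python
import pickle

import pandas as pd


class FancyFrame(pd.DataFrame):
    """Hypothetical subclass used only for this illustration."""

    @property
    def _constructor(self):
        return FancyFrame


def save_with_meta(frame, meta_names, file_path):
    """Write the data, then the named attributes, into the same file."""
    meta = {name: getattr(frame, name) for name in meta_names}
    with open(file_path, "wb") as fh:
        pickle.dump(pd.DataFrame(frame), fh)  # data as a plain DataFrame
        pickle.dump(meta, fh)                 # metadata as a second pickle


def load_with_meta(file_path, cls):
    """Read both pickles back and rebuild an instance of cls."""
    with open(file_path, "rb") as fh:
        data = pickle.load(fh)
        meta = pickle.load(fh)
    frame = cls(data)
    for name, value in meta.items():
        # object.__setattr__ bypasses pandas' attribute handling, so no
        # column-creation warning is triggered.
        object.__setattr__(frame, name, value)
    return frame
```

A call such as `save_with_meta(df, ["units"], path)` followed by `load_with_meta(path, FancyFrame)` would restore both the data and the listed attributes. The cost is that callers must remember to use these helpers instead of `to_pickle` and `pandas.read_pickle`.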

I'm also not dead-set on using pickle. If HDF5 is more suitable, for example, I can switch. I do need to pickle arbitrary Python data types in the DataFrame, though. The content in the cells is not just strings and numbers: I have tuples as well, and in one subclass I've built I even placed DataFrames inside DataFrames.

Thanks for any suggestions.
