Writing of pandas dataframe object to hdf5 via to_hdf is creating axis0, axis1, block0_items and block0_values, but why?

Views: 29. Posted: 2022/4/2 12:44:17. Tags: python, pandas, dataframe, octave, hdf5

Question

I have a CSV file named test.csv with the following content:

d,t,s,A,B
2021293,010000,.189545,-9.3868122,46.152637
2021293,010000,.388550,-9.3991013,46.22963
2021293,010000,.588547,-9.350419,46.189907
2021293,010000,.788544,-9.3768988,46.166893
2021293,010000,.988541,-9.3335829,46.134583
2021293,010001,.188538,-9.3287783,46.233955
2021293,010001,.388550,-9.3323059,46.203461
2021293,010001,.588547,-9.2911615,46.19883
2021293,010001,.788544,-9.322463,46.135742
2021293,010001,.988541,-9.2798738,46.236137

When I run the following code:

import numpy as np
import pandas as pd

csv_filename = 'test.csv'
hdf_filename = 'test.h5'

csv_data = pd.read_csv(csv_filename )
data     = pd.DataFrame.transpose(csv_data)

data.to_hdf(hdf_filename, key='foobar/data', mode='w', format='fixed')

and then inspect the hdf5 file in Octave or MATLAB via load test.h5, I see the following under foobar.data:

ans =

  1x1 struct array containing the fields:

    axis0
    axis1
    block0_items
    block0_values

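However, the people who will use the hdf5 file want the contents of foobar.data.block0_values to be available directly in foobar.data, without having to go through foobar.data.block0_values. How can I change this?

The content of foobar.data.block0_values is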

foobar.data.block0_values
ans =

                 2021293                   10000                0.189545              -9.3868122               46.152637
                 2021293                   10000                 0.38855              -9.3991013                46.22963
                 2021293                   10000                0.588547               -9.350419               46.189907
                 2021293                   10000                0.788544      -9.376898799999999               46.166893
                 2021293                   10000                0.988541              -9.3335829               46.134583
                 2021293                   10001                0.188538              -9.3287783               46.233955
                 2021293                   10001                 0.38855              -9.3323059               46.203461
                 2021293                   10001                0.588547      -9.291161499999999                46.19883
                 2021293                   10001                0.788544      -9.322463000000001               46.135742
                 2021293                   10001                0.988541      -9.279873800000001               46.236137

and I would like that content to sit directly in foobar.data.

Recommended answer

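HDF5 is a container, not a fixed format. Every software package is free to implement its own HDF5 schema, so you have to understand the schema each package expects. In my limited experience with pandas, it always writes HDF5 data with the schema you see (datasets named axis0, axis1, block0_items, block0_values, and sometimes block1_items, block1_values). If the file needs to work in MATLAB and/or Octave, you need to determine the schema they expect when reading HDF5 data.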
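
To see the pandas schema without going through Octave, the group can be listed from Python. This is a minimal sketch, assuming pandas, PyTables and h5py are all installed; the small frame and the file name layout_demo.h5 are made up for illustration. It only prints the names that pandas created under foobar/data.

```python
import os
import tempfile

import h5py
import pandas as pd

# Hypothetical mixed-dtype frame: two int columns and one float column.
df = pd.DataFrame(
    {"d": [2021293, 2021293], "t": [10000, 10001], "s": [0.189545, 0.188538]}
)

# Write it the same way as in the question (to_hdf needs PyTables).
path = os.path.join(tempfile.mkdtemp(), "layout_demo.h5")
df.to_hdf(path, key="foobar/data", mode="w", format="fixed")

# List what pandas stored under the key.
with h5py.File(path, "r") as h5f:
    names = sorted(h5f["foobar/data"].keys())

for name in names:
    print(name)
```

The listing should contain the axis and block entries described above, with one pair of block entries per dtype group that pandas stores.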

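HDF5 has two basic dataset types:

Homogeneous: datasets in which all values have the same type, i.e. all ints, all floats, or all strings. This appears to be the approach pandas uses.
Heterogeneous (compound): datasets in which values are held in columns of different types.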

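The challenge with your data is the mix of ints and floats. That leaves two possible approaches for your HDF5 schema:

Create homogeneous datasets, with the ints in dataset 1 and the floats in dataset 2 (plus some information for reassembling them). This is what pandas does.
Create a heterogeneous dataset. The result will look just like the pandas dataframe when viewed in HDFView. You can do this with either the PyTables or the h5py package. The key is to create a NumPy recarray from the dataframe dtypes, then load the dataframe values into the recarray. Based on previous HDF5 experience, I am fairly confident that Octave and MATLAB can read this format as you expect.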
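
The recarray step can also be sketched with the built-in DataFrame.to_records, as an alternative to building the dtype by hand. This is only an illustration: it inlines the first two rows of test.csv through io.StringIO so that it runs on its own, and it writes nothing to disk.

```python
import io

import pandas as pd

# First two rows of test.csv from the question, inlined for a standalone run.
csv_text = """d,t,s,A,B
2021293,010000,.189545,-9.3868122,46.152637
2021293,010000,.388550,-9.3991013,46.22963
"""
csv_data = pd.read_csv(io.StringIO(csv_text))

# index=False keeps the row index out of the record fields,
# so the fields are exactly the CSV columns with their own dtypes.
arr = csv_data.to_records(index=False)

print(arr.dtype.names)
print(arr["A"])
```

Each field keeps the dtype of its column, so the int columns d and t stay integers while s, A and B stay floats.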

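The example below shows how to create a heterogeneous dataset with each of the two packages. The only real difference is the function call that creates the dataset. (Note: pandas uses PyTables to access HDF5, so PyTables may already be installed alongside pandas, but you should verify that.)

Add the following lines to your example to see how it works: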

# extract column names and dtypes to create the recarray dtype
arr_dt = [(col, csv_data[col].dtype) for col in csv_data.columns]
nrows = len(csv_data)
# create an empty structured array based on the dataframe row count and dtype
arr = np.empty((nrows,), dtype=arr_dt)

# load dataframe column values into the array fields
for col in csv_data.columns:
    arr[col] = csv_data[col].values

print(arr)

# use PyTables to write the array to the h5 file as a table at /tb/csv_data
import tables as tb
with tb.open_file(hdf_filename, mode='a') as h5f:
    h5f.create_table('/tb', 'csv_data', obj=arr, createparents=True)

# use h5py to write the array to the h5 file as a compound dataset at /h5py/csv_data
import h5py
with h5py.File(hdf_filename, mode='a') as h5f:
    h5f.create_dataset('h5py/csv_data', data=arr)
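
If a compound dataset is not required, a plain 2-D dataset stored directly at foobar/data may be closer to what the question asks for. The sketch below is an assumption-laden illustration, not part of the original answer: it upcasts every column to float64 (the int values here fit exactly), uses a made-up file name flat_demo.h5, and keeps the column names in an attribute called columns, which is a hypothetical convention. How Octave or MATLAB present this dataset has not been verified here. The question's Octave output shows the transposed frame as 10 rows by 5 columns, so a transpose before writing may be needed.

```python
import io
import os
import tempfile

import h5py
import numpy as np
import pandas as pd

# First two rows of test.csv from the question, inlined for a standalone run.
csv_text = """d,t,s,A,B
2021293,010000,.189545,-9.3868122,46.152637
2021293,010000,.388550,-9.3991013,46.22963
"""
csv_data = pd.read_csv(io.StringIO(csv_text))

# Upcast everything to float64 so the table fits one homogeneous dataset.
values = csv_data.to_numpy(dtype="float64")

path = os.path.join(tempfile.mkdtemp(), "flat_demo.h5")
with h5py.File(path, "w") as h5f:
    # h5py creates the intermediate group "foobar" automatically.
    dset = h5f.create_dataset("foobar/data", data=values)
    # A plain dataset has no field names, so store them as one string attribute
    # (hypothetical convention, not something pandas or Octave expects).
    dset.attrs["columns"] = ",".join(csv_data.columns)

# Read the dataset back to confirm that foobar/data is the matrix itself.
with h5py.File(path, "r") as h5f:
    restored = h5f["foobar/data"][()]
    columns = h5f["foobar/data"].attrs["columns"].split(",")

print(columns)
print(restored)
```

The trade-off is that the integer columns lose their integer type, which the compound dataset above preserves.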
