How can I extract data from an .h5 file and save it to .txt or .csv properly?


Question

After a lot of searching, I couldn't find a simple way to extract data from an .h5 file and pass it to a DataFrame with NumPy or pandas so that I can save it to a .txt or .csv file.

import h5py
import numpy as np
import pandas as pd

filename = 'D:\data.h5'
f = h5py.File(filename, 'r')

# List all groups
print("Keys: %s" % f.keys())
a_group_key = list(f.keys())[0]

# Get the data
data = list(f[a_group_key])
pd.DataFrame(data).to_csv("hi.csv")

Keys: <KeysViewHDF5 ['dd48']>

When I print data, I see the following result:

print(data)

['axis0',
 'axis1',
 'block0_items',
 'block0_values',
 'block1_items',
 'block1_values']

I would appreciate it if someone could explain what these are and how I can extract the data completely and save it to a .csv file. There doesn't seem to be a routine way to do this, and it is still fairly challenging. So far I have only been able to see part of the data via:

import numpy as np 
dfm = np.fromfile('D:\data.h5', dtype=float)
print (dfm.shape)
print(dfm[5:])

dfm=pd.to_csv('train.csv')
#dfm.to_csv('hi.csv', sep=',', header=None, index=None)

My expectation is to extract the time stamps and measurements from the .h5 file.

Recommended answer

h5py exposes HDF5 datasets as NumPy arrays. Iterating over the group, as your list(f[a_group_key]) call does, returns a list of the dataset names. Once you have them, it is straightforward to read each one as a NumPy array and write it out. You need each dataset's dtype to know what it contains and to format it correctly.
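
Before writing anything out, it can help to look at the name, shape, and dtype of each dataset. The following is a minimal sketch of that inspection step; the helper name `describe_group` is hypothetical and not part of h5py, and it only looks at datasets directly inside the given group.

```python
import h5py


def describe_group(h5_path, group_name):
    """Return (name, shape, dtype) for every dataset directly inside a group.

    `describe_group` is an illustrative helper, not an h5py function.
    """
    rows = []
    with h5py.File(h5_path, "r") as h5f:
        for name, item in h5f[group_name].items():
            # Skip nested groups; only datasets have a shape and dtype.
            if isinstance(item, h5py.Dataset):
                rows.append((name, item.shape, str(item.dtype)))
    return rows


# Example call (path and group name taken from the question):
# for name, shape, dtype in describe_group(r"D:\data.h5", "dd48"):
#     print(name, shape, dtype)
```

The printed dtypes show which datasets hold numbers and which hold byte strings, and that determines the format string needed when saving.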

Updated 5/22/2019 to reflect the content of the data.h5 file posted at the link in the comments. The default format in np.savetxt() is '%.18e'. Very simple (crude) logic is provided to choose the format based on the dtype of these datasets; general use requires more robust dtype checking and formatting. You will also need to add logic to decode Unicode strings.

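import h5py
import numpy as np

filename = r'D:\data.h5'  # raw string so the backslash is taken literally

with h5py.File(filename, 'r') as h5f:
    group = h5f['dd48']
    # Get a list of the datasets in group 'dd48'
    a_dset_keys = list(group.keys())

    # Write each dataset to its own CSV file
    for dset in a_dset_keys:
        ds_data = group[dset][()]  # read the whole dataset into a numpy array
        print('dataset=', dset)
        print(ds_data.dtype)
        # Crude format selection based on dtype
        if ds_data.dtype == 'float64':
            csvfmt = '%.18e'
        elif ds_data.dtype == 'int64':
            csvfmt = '%.10d'  # integers, zero-padded to 10 digits
        else:
            csvfmt = '%s'     # byte strings are written undecoded
        np.savetxt('output_' + dset + '.csv', ds_data, fmt=csvfmt, delimiter=',')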
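
The answer notes that general use needs more robust dtype checking and string decoding. One possible direction, sketched below, is to branch on the NumPy dtype kind instead of exact dtype names and to decode byte strings before writing. The helper names `decode_bytes`, `choose_fmt`, and `save_array_csv` are hypothetical, UTF-8 is an assumed encoding, and plain '%d' is used for integers rather than the zero-padded format above.

```python
import numpy as np


def decode_bytes(arr, encoding="utf-8"):
    """Return arr with byte strings decoded to str; other dtypes pass through."""
    arr = np.asarray(arr)
    if arr.dtype.kind == "S":  # fixed-length byte strings
        return np.char.decode(arr, encoding)
    if arr.dtype.kind == "O":  # object arrays may hold variable-length bytes
        items = [v.decode(encoding) if isinstance(v, bytes) else v
                 for v in arr.ravel()]
        return np.array(items, dtype=object).reshape(arr.shape)
    return arr


def choose_fmt(arr):
    """Pick an np.savetxt format string from the dtype kind."""
    kind = np.asarray(arr).dtype.kind
    if kind == "f":      # any float width
        return "%.18e"
    if kind in "iu":     # signed or unsigned integers
        return "%d"
    return "%s"          # strings and anything else


def save_array_csv(arr, csv_path):
    """Decode if needed, then write a 1-D or 2-D array as CSV."""
    arr = decode_bytes(arr)
    np.savetxt(csv_path, arr, fmt=choose_fmt(arr), delimiter=",")
```

Inside the loop above, `save_array_csv(ds_data, 'output_' + dset + '.csv')` could stand in for the if/elif block and the final np.savetxt call. np.savetxt only accepts 1-D and 2-D arrays, so scalar or higher-dimensional datasets would still need separate handling.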

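As for what the datasets are: the names axis0, axis1, block0_items, and block0_values match the layout pandas uses when it stores a DataFrame in its "fixed" HDF5 format, where the axis datasets hold the column and row labels and each block holds the values for columns of one dtype. Assuming the file was written by pandas (the thread does not confirm this), the whole table might be read back in one call and saved as a single CSV. This is a sketch under that assumption; `hdf_key_to_csv` is a hypothetical helper name, and `pd.read_hdf` needs the optional PyTables package installed.

```python
import pandas as pd


def hdf_key_to_csv(h5_path, key, csv_path):
    """Read the pandas object stored under `key` and write it as CSV.

    Assumes the HDF5 file was written by pandas; requires PyTables.
    """
    df = pd.read_hdf(h5_path, key=key)
    df.to_csv(csv_path)  # the index is written as the first column
    return df


# Example call (path and key taken from the question):
# df = hdf_key_to_csv(r"D:\data.h5", "dd48", "hi.csv")
# print(df.head())
```

If the file was not produced by pandas, this call is likely to fail, and the per-dataset h5py approach above remains the fallback.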