How to deal with hdf5 files in R?


Question

I have a file in HDF5 format. I know that it is supposed to contain a matrix, and I want to read that matrix into R so that I can study it. I see that there is an h5r package that is supposed to help with this, but I cannot find any tutorial that is simple to read and understand. Is such a tutorial available online? Specifically, how do you read an HDF5 object with this package, and how do you actually extract the matrix?

Update

I found the rhdf5 package, which is not part of CRAN but is part of Bioconductor. Its interface is relatively easy to understand, and the documentation and example code are quite clear. I could use it without problems. My problem, it seems, was the input file. The matrix that I wanted to read was actually stored in the HDF5 file as a Python pickle, so every time I tried to open it and access it through R I got a segmentation fault. I did figure out how to save the matrix from within Python as a TSV file, and now that problem is solved.
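As a minimal sketch of the workaround described above, a matrix exported from Python as a tab-separated file can be loaded with base R alone. The file below is a temporary stand-in written by the example itself, and the assumption that the export has no header row is mine, not the original poster's.

```r
# Stand-in for a TSV exported from Python (hypothetical contents, no header row).
tsv_path <- tempfile(fileext = ".tsv")
writeLines(c("1\t2\t3", "4\t5\t6"), tsv_path)

# Read the tab-separated values, then convert the data frame to a matrix.
df <- read.delim(tsv_path, header = FALSE)
m <- as.matrix(df)
dimnames(m) <- NULL  # drop the auto-generated V1, V2, ... column names

dim(m)
```

If the real export does carry a header row or row labels, `header = TRUE` and `row.names = 1` would be the arguments to adjust.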

Recommended answer

The rhdf5 package works really well, although it is not on CRAN. Install it from Bioconductor:

# as of 2020-09-08, these are the updated instructions per
# https://bioconductor.org/install/

if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")
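# Install rhdf5 itself; version = "3.11" on its own only selects a
# Bioconductor release and does not install the package.
BiocManager::install("rhdf5")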

Then load it:

library(rhdf5)

List the objects within the file to find the data group you want to read:

h5ls("path/to/file.h5")

Read the HDF5 data:

mydata <- h5read("path/to/file.h5", "/mygroup/mydata")

Inspect the structure:

str(mydata)
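To try these calls without an existing file, the following sketch first writes a small matrix to a temporary HDF5 file and then lists, reads, and inspects it. The temporary path and the matrix contents are illustrative assumptions; the group and dataset names simply mirror the placeholders used above.

```r
library(rhdf5)

# Temporary file so the sketch does not touch real data.
h5_path <- tempfile(fileext = ".h5")
h5createFile(h5_path)
h5createGroup(h5_path, "mygroup")

# Write a small 2 x 3 matrix into the group.
m <- matrix(as.numeric(1:6), nrow = 2)
h5write(m, h5_path, "mygroup/mydata")

# List the contents, then read the dataset back by its full path.
contents <- h5ls(h5_path)
print(contents)
mydata <- h5read(h5_path, "/mygroup/mydata")
str(mydata)

h5closeAll()  # release any HDF5 handles that are still open
```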

(Note that multidimensional arrays may appear transposed, because HDF5 stores arrays in row-major order while R uses column-major order.) You can also read groups, which are returned as named lists in R.
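Both points can be sketched on a temporary file. Reading the group path returns a list, and `reverse_dims` is a hypothetical helper (not part of rhdf5) that reverses the dimension order with base R's `aperm`, which is one way to undo the apparent transposition of data written by a row-major tool. The dataset names `a` and `b` are made up for illustration.

```r
library(rhdf5)

# Temporary file with one group holding a matrix and a vector.
h5_path <- tempfile(fileext = ".h5")
h5createFile(h5_path)
h5createGroup(h5_path, "mygroup")
h5write(matrix(as.numeric(1:6), nrow = 2), h5_path, "mygroup/a")
h5write(c(10, 20, 30), h5_path, "mygroup/b")

# Reading a group returns a named list with one element per dataset.
grp <- h5read(h5_path, "/mygroup")
names(grp)

# Hypothetical helper: reverse the dimension order of an array.
# Objects with fewer than two dimensions are returned unchanged.
reverse_dims <- function(x) {
  if (is.null(dim(x)) || length(dim(x)) < 2) return(x)
  aperm(x, rev(seq_along(dim(x))))
}

dim(reverse_dims(grp$a))
h5closeAll()  # release any HDF5 handles that are still open
```

For a plain two-dimensional matrix, `t()` does the same job; `aperm` is used here so the helper also covers arrays with three or more dimensions.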
