How to deal with hdf5 files in R?
Question
我有一个hdf5
格式的文件.我知道它应该是一个矩阵,但是我想在R
中读取该矩阵,以便我可以对其进行研究.我看到有一个h5r
软件包可以帮助解决这个问题,但是我看不到任何易于阅读/理解的教程.这样的教程可以在线获得吗?具体来说,如何使用此程序包读取hdf5
对象,以及如何实际提取矩阵?
I have a file in hdf5 format. I know that it is supposed to be a matrix, and I want to read that matrix into R so that I can study it. I see that there is an h5r package that is supposed to help with this, but I cannot find any tutorial that is simple to read and understand. Is such a tutorial available online? Specifically, how do you read an hdf5 object with this package, and how do you actually extract the matrix?
Update
I found a package, rhdf5, which is not part of CRAN but is part of Bioconductor. Its interface is relatively easy to understand, and the documentation and example code are quite clear. I could use it without problems. My problem, it seems, was the input file: the matrix I wanted to read was actually stored in the hdf5 file as a Python pickle, so every time I tried to open it and access it through R, I got a segmentation fault. I did figure out how to save the matrix from within Python as a tsv file, and now that problem is solved.
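Once the matrix has been exported as a tsv file, getting it back into R needs only base R. Below is a minimal sketch, assuming the file has no header row and no row names (for instance one written with NumPy's `numpy.savetxt(path, arr, delimiter="\t")`); `read_tsv_matrix()` is a hypothetical helper name, and the temporary file only stands in for the real export.

```r
# Hypothetical helper: load a numeric matrix exported from Python as TSV.
# Assumption: no header row and no row names; change `header` if yours has one.
read_tsv_matrix <- function(path) {
  df <- read.table(path, sep = "\t", header = FALSE)
  m <- as.matrix(df)
  dimnames(m) <- NULL  # drop the V1, V2, ... column names added by read.table
  m
}

# Usage with a small throwaway file standing in for the real export
tmp <- tempfile(fileext = ".tsv")
writeLines(c("1\t2\t3", "4\t5\t6"), tmp)
m <- read_tsv_matrix(tmp)
print(dim(m))
```

A tsv file loses the binary precision and metadata that HDF5 keeps, but it sidesteps the pickle problem entirely because R never has to interpret Python objects.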
Recommended answer
The rhdf5 package works really well, although it is not on CRAN. Install it from Bioconductor:
# As of 2020-09-08, these are the updated instructions per
# https://bioconductor.org/install/
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install(version = "3.11")
# Install rhdf5 itself
BiocManager::install("rhdf5")
Then load it:
library(rhdf5)
List the objects within the file to find the data group you want to read:
h5ls("path/to/file.h5")
Read the HDF5 data:
mydata <- h5read("path/to/file.h5", "/mygroup/mydata")
Check its structure:
str(mydata)
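The steps above can be tried end to end without a real data file. This is a sketch that first creates a small throwaway HDF5 file with rhdf5's own writers (`h5createFile`, `h5createGroup`, `h5write`) and then lists and reads it back; the group and dataset names simply mirror the `/mygroup/mydata` placeholder, and the matrix contents are made up for illustration. Because the file is written from R itself, the matrix comes back in its original orientation.

```r
library(rhdf5)

# Build a throwaway HDF5 file so the example is self-contained
path <- tempfile(fileext = ".h5")
h5createFile(path)
h5createGroup(path, "mygroup")
h5write(matrix(1:6, nrow = 2), path, "mygroup/mydata")

# List the objects in the file (returns a data.frame of groups and datasets)
print(h5ls(path))

# Read the dataset back as an ordinary R matrix and inspect it
mydata <- h5read(path, "/mygroup/mydata")
str(mydata)

# Reading a whole group returns a named list with one element per member
grp <- h5read(path, "/mygroup")
print(names(grp))

# Release any HDF5 handles still held open
h5closeAll()
```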
(Note that multidimensional arrays may appear transposed.) You can also read whole groups, which are returned as named lists in R.
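The transposition typically shows up when the file was written by a row-major tool such as NumPy: HDF5 stores data in C (row-major) order while R arrays are column-major, so the dimensions can come back reversed. A minimal base-R sketch for undoing that, assuming the reversal has happened; `reverse_dims()` is a hypothetical helper, not part of rhdf5, and for a plain matrix it is equivalent to `t()`.

```r
# Hypothetical helper: reverse all dimensions of an array.
# For a 2-d matrix this is the same as t(); for N-d arrays it generalizes it.
reverse_dims <- function(x) {
  d <- dim(x)
  if (is.null(d)) return(x)  # plain vectors need no change
  aperm(x, rev(seq_along(d)))
}

# Illustration: pretend this array came back from h5read() with dims reversed
mydata <- array(1:24, dim = c(4, 3, 2))
fixed <- reverse_dims(mydata)
print(dim(fixed))
```

Check the result against what you expect from the producing program (e.g. the array's shape in Python) before applying it, since a file written from R will not need this step.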