How can I load a data frame saved in pandas as an HDF5 file in R?


Question

I saved a data frame in pandas in an HDF5 file:

import numpy as np
import pandas as pd
np.random.seed(1)
frame = pd.DataFrame(np.random.randn(4, 3), columns=list('bde'), 
                     index=['Utah', 'Ohio', 'Texas', 'Oregon'])
print('frame: {0}'.format(frame))
store = pd.HDFStore('file.h5')
store['df'] =  frame
store.close()

The frame looks as follows:

frame:                b         d         e
Utah              1.624345 -0.611756 -0.528172
Ohio             -1.072969  0.865408 -2.301539
Texas             1.744812 -0.761207  0.319039
Oregon           -0.249370  1.462108 -2.060141

I am trying to load it in R:

# if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
# BiocManager::install("rhdf5")
library(rhdf5)
frame = h5ls("file.h5")    
frame

However, once loaded in R it looks as follows:

> frame
  group          name       otype dclass   dim
0     /            df   H5I_GROUP             
1   /df         axis0 H5I_DATASET STRING     3
2   /df         axis1 H5I_DATASET STRING     4
3   /df  block0_items H5I_DATASET STRING     3
4   /df block0_values H5I_DATASET  FLOAT 3 x 4
> 

I also tried:

frame2 = h5read("file.h5", '/df')
frame2

However, it returns a list of separate components rather than a data frame:

> frame2
$axis0
[1] "b" "d" "e"

$axis1
[1] "Utah"   "Ohio"   "Texas"  "Oregon"

$block0_items
[1] "b" "d" "e"

$block0_values
           [,1]       [,2]       [,3]       [,4]
[1,]  1.6243454 -1.0729686  1.7448118 -0.2493704
[2,] -0.6117564  0.8654076 -0.7612069  1.4621079
[3,] -0.5281718 -2.3015387  0.3190391 -2.0601407

How can I load a data frame saved in pandas as an HDF5 file, in R?

Recommended answer

From https://github.com/pandas-dev/pandas/issues/9636 (thanks John Galt for pointing me to this resource):

Example of HDF5 export for R

import numpy as np
import pandas as pd

np.random.seed(1)
df = pd.DataFrame({"first": np.random.rand(100),
                   "second": np.random.rand(100),
                   "class": np.random.randint(0, 2, (100,))},
                   index=range(100))

print(df.head())

store = pd.HDFStore("transfer.hdf5", "w", complib=str("zlib"), complevel=5)
store.put("dataframe", df, data_columns=df.columns)
store.close()

Output:

   class     first    second
0      0  0.417022  0.326645
1      0  0.720324  0.527058
2      1  0.000114  0.885942
3      1  0.302333  0.357270
4      1  0.146756  0.908535

In R:

# Load values and column names for all datasets from corresponding nodes and
# insert them into one data.frame object.

library(rhdf5)

loadhdf5data <- function(h5File) {
  listing <- h5ls(h5File)

  # Find all data nodes: values are stored in *_values and the corresponding
  # column titles in *_items.
  data_nodes <- grep("_values", listing$name)
  name_nodes <- grep("_items", listing$name)

  data_paths <- paste(listing$group[data_nodes], listing$name[data_nodes], sep = "/")
  name_paths <- paste(listing$group[name_nodes], listing$name[name_nodes], sep = "/")

  columns <- list()
  for (idx in seq_along(data_paths)) {
    # h5read returns each block transposed, so flip it back to rows x columns.
    entry <- data.frame(t(h5read(h5File, data_paths[idx])))
    colnames(entry) <- as.character(h5read(h5File, name_paths[idx]))
    columns <- append(columns, entry)
  }

  data <- data.frame(columns)

  return(data)
}

Now you can import the DataFrame:

> data = loadhdf5data("transfer.hdf5")
> head(data)
         first    second class
1 0.4170220047 0.3266449     0
2 0.7203244934 0.5270581     0
3 0.0001143748 0.8859421     1
4 0.3023325726 0.3572698     1
5 0.1467558908 0.9085352     1
6 0.0923385948 0.6233601     1
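
The R function above reads only the *_values and *_items nodes, so row labels stored in axis1 (such as 'Utah' and 'Ohio' in the question) are not carried over. As a rough illustration of how the pieces fit together, here is a minimal pure-Python sketch that rebuilds a labelled table from the components h5read returned in the question. It assumes the layout shown there; the `node` dictionary and the `rebuild_table` helper are hypothetical stand-ins, and no HDF5 library is involved.

```python
# Hypothetical in-memory stand-in for what h5read("file.h5", "/df")
# returned in the question; no HDF5 library is used here.
node = {
    "axis0": ["b", "d", "e"],                      # column labels
    "axis1": ["Utah", "Ohio", "Texas", "Oregon"],  # row labels (the index)
    "block0_items": ["b", "d", "e"],               # columns stored in block 0
    # As printed by R: one inner list per column, one entry per row.
    "block0_values": [
        [1.6243454, -1.0729686, 1.7448118, -0.2493704],
        [-0.6117564, 0.8654076, -0.7612069, 1.4621079],
        [-0.5281718, -2.3015387, 0.3190391, -2.0601407],
    ],
}


def rebuild_table(node):
    """Return {row_label: {column_label: value}} from the node's components."""
    columns = {}
    # Pair every blockN_items entry with the matching blockN_values entry.
    for key in sorted(node):
        if key.endswith("_items"):
            values = node[key[: -len("_items")] + "_values"]
            for name, column in zip(node[key], values):
                columns[name] = column
    # axis0 gives the original column order, axis1 the row labels.
    return {
        row: {name: columns[name][i] for name in node["axis0"]}
        for i, row in enumerate(node["axis1"])
    }


table = rebuild_table(node)
print(table["Utah"])
```

The same idea should carry over to R by assigning the axis1 values as row names of the assembled data frame, although that step is not part of the original answer.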
