Python pandas: Reading specific values from HDF5 files using read_hdf and HDFStore.select


Question

I created an HDF5 file containing a simple dataset that looks like this:

>>> pd.read_hdf('STORAGE2.h5', 'table')
   A  B
0  0  0
1  1  1
2  2  2
3  3  3
4  4  4

using this script:

import pandas as pd

# Build a small frame and write it in appendable "table" format
df_tl = pd.DataFrame(dict(A=list(range(5)), B=list(range(5))))
df_tl.to_hdf('STORAGE2.h5', key='table', append=True)

# Open the store afterwards so it can be used with store.select()
store = pd.HDFStore('STORAGE2.h5')

I know I can select columns using

x = pd.read_hdf('STORAGE2.h5', 'table',  columns=['A'])

x = store.select('table', columns=['A'])

How would I select all rows where column 'A' equals 3 or some other specific value, or the rows where column 'A' holds a string such as 'foo'? In a pandas DataFrame I would use df[df["A"]==3] or df[df["A"]=='foo'].

Also, does it make a difference in efficiency whether I use read_hdf() or store.select()?

Recommended answer

You need to specify data_columns= when writing the table (you can also pass True to make all columns searchable).

(FYI, mode='w' starts the file over; it is used here only for my example.)

In [50]: df_tl.to_hdf('STORAGE2.h5','table',append=True,mode='w',data_columns=['A'])

In [51]: pd.read_hdf('STORAGE2.h5','table',where='A>2')
Out[51]: 
   A  B
3  3  3
4  4  4
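
The same approach should cover the equality and string cases from the question. The following is a minimal sketch, not the answerer's own code: the sample data, the temporary file name example.h5, and the variable names are assumptions made for illustration, and it assumes PyTables is installed. Column A holds strings here so that a text match can be shown next to a numeric one on column B.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data: A holds strings, B holds integers.
df = pd.DataFrame({"A": ["foo", "bar", "foo", "baz"], "B": [0, 1, 2, 3]})

# Write to a throwaway file so the example does not touch STORAGE2.h5.
path = os.path.join(tempfile.mkdtemp(), "example.h5")

# format="table" with data_columns=True makes every column queryable on disk.
df.to_hdf(path, key="table", mode="w", format="table", data_columns=True)

# Numeric equality through read_hdf.
rows_b3 = pd.read_hdf(path, key="table", where="B == 3")

# String equality through HDFStore.select; the context manager closes the file.
with pd.HDFStore(path, mode="r") as store:
    rows_foo = store.select("table", where='A == "foo"')

print(rows_b3)
print(rows_foo)
```

On the efficiency question: as far as I know, read_hdf opens an HDFStore and calls select on it internally, so a single query should cost about the same either way. Keeping one store open mainly avoids reopening the file when running several queries in a row.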
