Python pandas: Reading specific values from HDF5 files using read_hdf and HDFStore.select
Question
I created an HDF5 file with a simple dataset that looks like this:
>>> pd.read_hdf('STORAGE2.h5', 'table')
   A  B
0  0  0
1  1  1
2  2  2
3  3  3
4  4  4
using this script:
import pandas as pd
import scipy as sp
from pandas.io.pytables import Term
store = pd.HDFStore('STORAGE2.h5')
df_tl = pd.DataFrame(dict(A=list(range(5)), B=list(range(5))))
df_tl.to_hdf('STORAGE2.h5','table',append=True)
I know I can select columns using
x = pd.read_hdf('STORAGE2.h5', 'table', columns=['A'])
or
x = store.select('table', where = 'columns=A')
How would I select all rows where column 'A' equals 3 or some other specific value, or rows where column 'A' holds a string such as 'foo'? In a pandas DataFrame I would use df[df["A"]==3] or df[df["A"]=='foo'].
Also, does it make a difference in efficiency whether I use read_hdf() or store.select()?
Recommended answer
You need to specify data_columns= when writing the table (you can also pass True to make all columns searchable).
(FYI, mode='w' starts the file over; it is there only for my example.)
In [50]: df_tl.to_hdf('STORAGE2.h5','table',append=True,mode='w',data_columns=['A'])
In [51]: pd.read_hdf('STORAGE2.h5','table',where='A>2')
Out[51]:
   A  B
3  3  3
4  4  4
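The question also asks about exact matches and string values, which the answer's A>2 query does not show. The sketch below is one way that might look, assuming PyTables is installed; the file name example.h5 and the sample data are hypothetical and not taken from the original post.

```python
import os
import tempfile

import pandas as pd

# Hypothetical file name and data, chosen only for illustration.
path = os.path.join(tempfile.mkdtemp(), "example.h5")
df = pd.DataFrame({"A": ["foo", "bar", "foo", "baz", "qux"], "B": list(range(5))})

# format="table" plus data_columns makes the listed columns queryable on disk.
df.to_hdf(path, key="table", mode="w", format="table", data_columns=["A", "B"])

# String equality on a data column; the value is quoted inside the expression.
foo_rows = pd.read_hdf(path, key="table", where='A == "foo"')

# Numeric equality, the on-disk counterpart of df[df["B"] == 3].
three_rows = pd.read_hdf(path, key="table", where="B == 3")

# The same kind of query through an explicitly opened HDFStore,
# combined with a column selection.
with pd.HDFStore(path, mode="r") as store:
    via_store = store.select("table", where="B == 3", columns=["A"])

print(foo_rows)
print(three_rows)
print(via_store)
```

On the efficiency question: as far as I know, read_hdf is a convenience wrapper that opens the file, runs the same select, and closes it again, so an explicitly opened HDFStore mostly helps when you run several queries against the same file.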