PyTables: add a repetitive subclass as a column
Question
I am creating an HDF5 file with strict parameters. It has one table consisting of variable columns. At one point the columns become repetitive, with different data being appended. Apparently, I can't add a loop inside an IsDescription class. Currently the class Segments has been added under class Summary_data twice. I need to repeat segments_k 70 times. What is the best approach? Thank you.
from tables import IsDescription, Int16Col, Float32Col, open_file

class Header(IsDescription):
    _v_pos = 1
    id = Int16Col(dflt=1, pos=0)
    timestamp = Int16Col(dflt=1, pos=1)

class Segments(IsDescription):
    segment_id = Int16Col(dflt=1, pos=0)
    segment_quality = Float32Col(dflt=1, pos=1)
    segment_length = Float32Col(dflt=1, pos=2)

class Summary_data(IsDescription):
    latency = Float32Col(dflt=1, pos=2)
    segments_k = Int16Col(dflt=1, pos=4)
    # The nested group that has to be repeated (70 times in the real file)
    segments_k0 = Segments()
    segments_k1 = Segments()

class Everything(IsDescription):
    header = Header()
    summary_data = Summary_data()

def write_new_file():
    h5file = "results.hdf5"
    with open_file(h5file, mode="w") as f:
        root = f.root
        table1 = f.create_table(root, "Table1", Everything)
        row = table1.row
        # Sample data, indexed as [segment][row]
        length = [[23.5, 16.3], [8, 6]]
        quality = [[0.9, 0.7], [0.6, 0.4]]
        for i in range(2):
            row['header/id'] = i
            row['header/timestamp'] = i * 2.
            row['summary_data/latency'] = 0.0
            row['summary_data/segments_k'] = 0
            for data in range(2):
                row['summary_data/segments_k'+str(data)+'/segment_id'] = data
                row['summary_data/segments_k'+str(data)+'/segment_quality'] = quality[data][i]
                row['summary_data/segments_k'+str(data)+'/segment_length'] = length[data][i]
            row.append()
Recommended answer
Ok, I think I understand, and will attempt to explain how I did this (and how to extend to handle all 70 segments). As an aside, your nested fields are exceedingly complex, far more complicated than anything I've seen. Are you sure you need this many levels of nested fields?
The key is using a np.dtype() to define the table description. I always use them to define my tables, not the IsDescription method. (I use NumPy to process my HDF5 data, so I'm comfortable with the module.) In your case, you need a dtype because it is the only way I know to create your complex table structure with code. Otherwise you will be creating IsDescription entries for hours. :-)
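As a minimal NumPy-only sketch of what a nested dtype looks like: the field names below mirror the question's Segments class, while seg_fields, outer_dt and rec are illustrative names that do not appear in the original code.

```python
import numpy as np

# One nested group: mirrors the three columns of the Segments class.
seg_fields = [('segment_id', '<i2'),
              ('segment_quality', '<f4'),
              ('segment_length', '<f4')]

# A field whose "type" is itself a list of (name, type) tuples
# becomes a nested field of the outer dtype.
outer_dt = np.dtype([('latency', '<f4'), ('segments_k0', seg_fields)])

rec = np.zeros(1, dtype=outer_dt)      # one zero-filled record
rec['segments_k0']['segment_id'] = 7   # nested fields are addressed level by level

print(outer_dt.names)                  # top-level field names
print(outer_dt['segments_k0'].names)   # names inside the nested field
```

Because the description is ordinary Python data (lists and tuples), it can be assembled with loops, which is what an IsDescription class body does not easily allow.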
The code below uses 3 different methods to create 3 tables (schema and data in each table should be identical). An explanation for each:
- Table 1: created with your code. It uses the IsDescription method to create 3 summary_data/segments_k# entries. (I added segments_k2 = Segments() to class Summary_data().) Note this line of code: print(tb.description.dtype_from_descr(Everything)). It prints the equivalent np.dtype for the Everything description used by Table1. I referenced this for Tables 2 and 3 below.
- Table 2: the description references np.dtype tb2_dt. I copied/pasted this from the previous output. I could have referenced it as a variable, but I want you to see it to understand what I did for Table 3. Code to populate the table is the same as Table 1.
- Table 3: the description references np.dtype tb3_dt. This is where things get tricky. The np.dtype structure is COMPLICATED: it is a list of tuples and tuples of lists. The dtype is built from seg_kn_tlist and tb3_dt_list. Code to populate the table is the same as Tables 1 and 2.
To get this to work for 70 segments, "all" you have to do is change the 2 range(3) arguments that create seg_kn_tlist and populate the data rows. (Of course, you also need to provide the data.)
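One way to avoid editing several range() calls is to wrap the dtype construction in a function that takes the segment count. The sketch below uses only NumPy; build_table_dtype, N_SEGMENTS, n_rows and rows are hypothetical names, and the zero-filled array is a placeholder for real measurements.

```python
import numpy as np

N_SEGMENTS = 70  # target count taken from the question


def build_table_dtype(n_segments):
    """Return the nested dtype with n_segments 'segments_k#' groups."""
    seg_fields = [('segment_id', '<i2'),
                  ('segment_quality', '<f4'),
                  ('segment_length', '<f4')]
    summary = [('latency', '<f4'), ('segments_k', '<i2')]
    summary += [(f'segments_k{n}', seg_fields) for n in range(n_segments)]
    return np.dtype([('header', [('id', '<i2'), ('timestamp', '<i2')]),
                     ('summary_data', summary)])


table_dt = build_table_dtype(N_SEGMENTS)

# Placeholder rows: zero-filled, then segment ids set in a loop.
n_rows = 2
rows = np.zeros(n_rows, dtype=table_dt)
rows['header']['id'] = np.arange(n_rows)
for n in range(N_SEGMENTS):
    seg = rows['summary_data'][f'segments_k{n}']  # view into the nested group
    seg['segment_id'] = n
    # seg['segment_quality'] and seg['segment_length'] would be filled
    # from the real data here.

print(len(table_dt['summary_data'].names))  # number of fields under summary_data
```

The resulting table_dt can be passed to create_table() in place of tb3_dt, exactly as the answer's code does for Table3. PyTables' Table.append() is documented to accept NumPy structured arrays, so appending rows in one call may be an alternative to the row-by-row loop, assuming the array's dtype matches the table description.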
Code below:
import tables as tb
import numpy as np

# Assumes the IsDescription classes from the question are already defined,
# with segments_k2 = Segments() added to class Summary_data.

h5file = "SO_64449277np.h5"
with tb.open_file(h5file, mode="w") as h5f:
    # Sample data, indexed as [segment][row]
    length = [[23.5, 16.3], [8, 6], [11.0, 7.7]]
    quality = [[0.9, 0.7], [0.6, 0.4], [0.8, 0.5]]
    root = h5f.root

    # Table 1: description from the IsDescription classes
    table1 = h5f.create_table(root, "Table1", Everything)
    # Print the equivalent np.dtype; used as the template for tb2_dt below
    print(tb.description.dtype_from_descr(Everything))
    row = table1.row
    for i in range(2):
        row['header/id'] = i
        row['header/timestamp'] = i * 2.
        row['summary_data/latency'] = 0.0
        row['summary_data/segments_k'] = 0
        for data in range(3):
            row['summary_data/segments_k'+str(data)+'/segment_id'] = data
            row['summary_data/segments_k'+str(data)+'/segment_quality'] = quality[data][i]
            row['summary_data/segments_k'+str(data)+'/segment_length'] = length[data][i]
        row.append()

    # Table 2: description from a np.dtype written out by hand
    tb2_dt = np.dtype([('header', [('id', '<i2'), ('timestamp', '<i2')]),
                       ('summary_data', [('latency', '<f4'), ('segments_k', '<i2'),
                           ('segments_k0', [('segment_id', '<i2'), ('segment_quality', '<f4'), ('segment_length', '<f4')]),
                           ('segments_k1', [('segment_id', '<i2'), ('segment_quality', '<f4'), ('segment_length', '<f4')]),
                           ('segments_k2', [('segment_id', '<i2'), ('segment_quality', '<f4'), ('segment_length', '<f4')]),
                       ])])
    table2 = h5f.create_table(root, "Table2", tb2_dt)
    row = table2.row
    for i in range(2):
        row['header/id'] = i
        row['header/timestamp'] = i * 2.
        row['summary_data/latency'] = 0.0
        row['summary_data/segments_k'] = 0
        for data in range(3):
            row['summary_data/segments_k'+str(data)+'/segment_id'] = data
            row['summary_data/segments_k'+str(data)+'/segment_quality'] = quality[data][i]
            row['summary_data/segments_k'+str(data)+'/segment_length'] = length[data][i]
        row.append()

    # Table 3: create np.dtype() iteratively
    # Start with latency and segments_k, and use a loop to add segments_k# id, quality and length
    seg_kn_tlist = [('latency', '<f4'), ('segments_k', '<i2')]
    for cnt in range(3):
        seg_kn_tlist.append(('segments_k'+str(cnt),
            [('segment_id', '<i2'), ('segment_quality', '<f4'), ('segment_length', '<f4')]))
    # Finish np.dtype() definition with fields for header (id, timestamp) and summary_data, using the list built above
    tb3_dt_list = [('header', [('id', '<i2'), ('timestamp', '<i2')]), ('summary_data', seg_kn_tlist)]
    tb3_dt = np.dtype(tb3_dt_list)
    table3 = h5f.create_table(root, "Table3", tb3_dt)
    row = table3.row
    for i in range(2):
        row['header/id'] = i
        row['header/timestamp'] = i * 2.
        row['summary_data/latency'] = 0.0
        row['summary_data/segments_k'] = 0
        for data in range(3):
            row['summary_data/segments_k'+str(data)+'/segment_id'] = data
            row['summary_data/segments_k'+str(data)+'/segment_quality'] = quality[data][i]
            row['summary_data/segments_k'+str(data)+'/segment_length'] = length[data][i]
        row.append()