PyTables: adding a repeated subclass as a column


Question

I am creating an HDF5 file with strict parameters. It has 1 table made up of variable columns. At some point the columns become repetitive, with different data being appended each time. Apparently I can't put a loop inside an IsDescription class. Currently the class Segments is added under class Summary_data twice. I need to repeat segments_k 70 times. What is the best approach? Thank you.

from tables import IsDescription, Int16Col, Float32Col, open_file

class Header(IsDescription):
    _v_pos    = 1
    id        = Int16Col(dflt=1, pos = 0)
    timestamp = Int16Col(dflt=1, pos = 1)

class Segments(IsDescription):
    segment_id      = Int16Col(dflt=1, pos = 0)
    segment_quality = Float32Col(dflt=1, pos = 1)
    segment_length  = Float32Col(dflt=1, pos = 2)

class Summary_data(IsDescription):
    latency     = Float32Col(dflt=1, pos = 2)
    segments_k  = Int16Col(dflt=1, pos = 4)
    segments_k0 = Segments()
    segments_k1 = Segments()

class Everything(IsDescription):
    header       = Header()
    summary_data = Summary_data()
    
def write_new_file():
    h5file = "results.hdf5"
    with open_file(h5file, mode = "w") as f:
        root    = f.root
        table1  = f.create_table(root, "Table1", Everything)
        row     = table1.row
        length  = [[23.5, 16.3], [8, 6]]
        quality = [[0.9, 0.7], [0.6, 0.4]]
        for i in range(2):
            row['header/id'] = i
            row['header/timestamp'] = i * 2.
            row['summary_data/latency'] = 0.0
            row['summary_data/segments_k'] = 0

            for data in range(2):
                row['summary_data/segments_k'+str(data)+'/segment_id'] = data
                row['summary_data/segments_k'+str(data)+'/segment_quality'] = quality[data][i]
                row['summary_data/segments_k'+str(data)+'/segment_length'] = length[data][i]
            row.append()

Recommended answer

Ok, I think I understand, and will attempt to explain how I did this (and how to extend to handle all 70 segments). As an aside, your nested fields are exceedingly complex, far more complicated than anything I've seen. Are you sure you need this many levels of nested fields?

The key is using a np.dtype() to define the table description. I always use them to define my tables, not the IsDescription method. (I use NumPy to process my HDF5 data, so I'm comfortable with the module.) In your case, you need a dtype because it is the only way I know to create your complex table structure with code. Otherwise you will be creating IsDescription entries for hours. :-)

The code below uses 3 different methods to create 3 tables (schema and data in each table should be identical). An explanation for each:

  1. Table 1 is created with your code. It uses the IsDescription method to create 3 summary_data/segments_k# entries. (I added segments_k2 = Segments() to class Summary_data().) Note this line of code: print(tb.description.dtype_from_descr(Everything)). It prints the equivalent np.dtype for the Everything description used by Table1. I referenced this for Tables 2 and 3 below.
  2. Table 2's description references np.dtype tb2_dt. I copied/pasted this from the previous output. I could have referenced it as a variable, but I want you to see it to understand what I did for Table 3. The code to populate the table is the same as for Table 1.
  3. Table 3's description references np.dtype tb3_dt. This is where things get tricky. The np.dtype structure is COMPLICATED: it is a list of tuples and tuples of lists. The dtype is built from seg_kn_tlist and tb3_dt_list. The code to populate the table is the same as for Tables 1 and 2.
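
To make the "list of tuples and tuples of lists" shape concrete, here is a minimal NumPy-only sketch with a single segment. The helper name field_paths is hypothetical (it is not part of NumPy or PyTables); it walks a nested dtype and returns the slash-separated paths that the row assignments in this answer use.

```python
import numpy as np

# One segment: a plain list of (name, type) tuples.
segment = [('segment_id', '<i2'), ('segment_quality', '<f4'), ('segment_length', '<f4')]

# A nested field is a (name, list-of-tuples) tuple, so nesting is lists inside tuples.
dt = np.dtype([
    ('header', [('id', '<i2'), ('timestamp', '<i2')]),
    ('summary_data', [('latency', '<f4'), ('segments_k', '<i2'),
                      ('segments_k0', segment)]),
])

def field_paths(dtype, prefix=''):
    """Hypothetical helper: list the leaf fields of a nested dtype as 'a/b/c' paths."""
    paths = []
    for name in dtype.names:
        sub = dtype[name]      # dtype of this field
        if sub.names:          # structured (nested) field: recurse into it
            paths.extend(field_paths(sub, prefix + name + '/'))
        else:                  # leaf column
            paths.append(prefix + name)
    return paths

paths = field_paths(dt)
for p in paths:
    print(p)
```

Each printed path, such as summary_data/segments_k0/segment_id, has the same form as the keys used when filling a row.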

To get this to work for 70 segments, "all" you have to do is change the 2 range(3) arguments that create seg_kn_tlist and populate the data rows. (Of course, you also need to provide the data.)
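
As a rough sketch of that scaling step, the dtype can be built by a function that takes the segment count as a parameter. The function name build_everything_dtype and the constant N_SEGMENTS are assumptions made for illustration; the field names and types are the ones used in this answer.

```python
import numpy as np

N_SEGMENTS = 70  # assumption: the segment count mentioned in the question

def build_everything_dtype(n_segments):
    """Hypothetical helper: nested dtype with n_segments segments_k# groups."""
    segment_fields = [('segment_id', '<i2'),
                      ('segment_quality', '<f4'),
                      ('segment_length', '<f4')]
    # Fixed summary fields first, then one nested group per segment.
    summary_fields = [('latency', '<f4'), ('segments_k', '<i2')]
    for n in range(n_segments):
        summary_fields.append(('segments_k' + str(n), segment_fields))
    return np.dtype([
        ('header', [('id', '<i2'), ('timestamp', '<i2')]),
        ('summary_data', summary_fields),
    ])

everything_dt = build_everything_dtype(N_SEGMENTS)
summary_names = everything_dt['summary_data'].names
print(len(summary_names), summary_names[-1])
```

The resulting everything_dt would then be passed to create_table in the same way as tb3_dt, and the data lists would need one entry per segment.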

Code below:

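    import tables as tb
    import numpy as np

    # Assumes the IsDescription classes from the question are already defined,
    # with segments_k2 = Segments() added to Summary_data (needed for Table1).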

    h5file = "SO_64449277np.h5"
    with tb.open_file(h5file, mode = "w") as h5f:
        length  = [[23.5, 16.3], [8, 6], [11.0, 7.7]]
        quality = [[0.9, 0.7], [0.6, 0.4], [0.8, 0.5]]

        root    = h5f.root
        table1  = h5f.create_table(root, "Table1", Everything)
        print (tb.description.dtype_from_descr(Everything) )

        row     = table1.row
        for i in range(2):
            row['header/id'] = i
            row['header/timestamp'] = i * 2.
            row['summary_data/latency'] = 0.0
            row['summary_data/segments_k'] = 0

            for data in range(3):
                row['summary_data/segments_k'+str(data)+'/segment_id'] = data
                row['summary_data/segments_k'+str(data)+'/segment_quality'] = quality[data][i]
                row['summary_data/segments_k'+str(data)+'/segment_length'] = length[data][i]
            row.append()

        tb2_dt = np.dtype([('header', [('id', '<i2'), ('timestamp', '<i2')]), 
                           ('summary_data', [('latency', '<f4'), ('segments_k', '<i2'), 
                           ('segments_k0', [('segment_id', '<i2'), ('segment_quality', '<f4'), ('segment_length', '<f4')]), 
                           ('segments_k1', [('segment_id', '<i2'), ('segment_quality', '<f4'), ('segment_length', '<f4')]),
                           ('segments_k2', [('segment_id', '<i2'), ('segment_quality', '<f4'), ('segment_length', '<f4')]),
                           ])] )

        table2  = h5f.create_table(root, "Table2", tb2_dt)
        row     = table2.row
        for i in range(2):
            row['header/id'] = i
            row['header/timestamp'] = i * 2.
            row['summary_data/latency'] = 0.0
            row['summary_data/segments_k'] = 0

            for data in range(3):
                row['summary_data/segments_k'+str(data)+'/segment_id'] = data
                row['summary_data/segments_k'+str(data)+'/segment_quality'] = quality[data][i]
                row['summary_data/segments_k'+str(data)+'/segment_length'] = length[data][i]
            row.append()

        # Create the np.dtype() iteratively.
        # Start with latency and segments_k, then use a loop to add the segments_k# id, quality and length fields.
        seg_kn_tlist = [('latency', '<f4'), ('segments_k', '<i2') ]
        for cnt in range(3) :            
            seg_kn_tlist.append( ('segments_k'+str(cnt), 
                                [('segment_id', '<i2'), ('segment_quality', '<f4'), ('segment_length', '<f4')] ) ) 
 
        # Finish the np.dtype() definition with fields for header (id, timestamp) and summary_data, the latter a tuple holding the list built above.
        tb3_dt_list = [ ('header', [('id', '<i2'), ('timestamp', '<i2')]), ('summary_data', seg_kn_tlist) ]
        
        tb3_dt = np.dtype( tb3_dt_list ) 

        table3  = h5f.create_table(root, "Table3", tb3_dt)
        row     = table3.row
        for i in range(2):
            row['header/id'] = i
            row['header/timestamp'] = i * 2.
            row['summary_data/latency'] = 0.0
            row['summary_data/segments_k'] = 0

            for data in range(3):
                row['summary_data/segments_k'+str(data)+'/segment_id'] = data
                row['summary_data/segments_k'+str(data)+'/segment_quality'] = quality[data][i]
                row['summary_data/segments_k'+str(data)+'/segment_length'] = length[data][i]
            row.append()
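
The same row-filling loop appears three times above. One possible way to avoid the repetition is a small helper like the sketch below. The name fill_table and its parameters are hypothetical, and the final flush() call is an addition that is not in the code above.

```python
def fill_table(table, length, quality, n_rows):
    """Hypothetical helper: write n_rows rows using the row-filling pattern above.

    length[k][i] and quality[k][i] hold the values for segment k in row i,
    which is the layout of the lists used in this answer.
    """
    row = table.row
    for i in range(n_rows):
        row['header/id'] = i
        row['header/timestamp'] = i * 2
        row['summary_data/latency'] = 0.0
        row['summary_data/segments_k'] = 0
        # One nested group per segment; the count follows the data provided.
        for k in range(len(length)):
            prefix = 'summary_data/segments_k' + str(k) + '/'
            row[prefix + 'segment_id'] = k
            row[prefix + 'segment_quality'] = quality[k][i]
            row[prefix + 'segment_length'] = length[k][i]
        row.append()
    table.flush()  # assumption: push buffered rows to the file before moving on
```

Inside the with block, each of the three loops could then be replaced by a call such as fill_table(table3, length, quality, 2).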
