Writing to numpy array from dictionary

Question

I have a dictionary of file header values (time, number of frames, year, month, etc) that I would like to write into a numpy array. The code I have currently is as follows:

    arr=np.array([(k,)+v for k,v in fileheader.iteritems()],dtype=["a3,a,i4,i4,i4,i4,f8,i4,i4,i4,i4,i4,i4,a10,a26,a33,a235,i4,i4,i4,i4,i4,i4"])

But I get an error: "can only concatenate tuple (not "int") to tuple".

Basically, the end result needs to be arrays storing the overall file header info (which is 512 bytes) and each frame's data (header and data, 49408 bytes for each frame). Is there an easier way to do this?

Edit: To clarify (for myself as well), I need to write the data from each frame of the file into an array. I was given MATLAB code as a base. Here's a rough idea of the code given to me:

data.frame=zeros([512 96])
frame=uint8(fread(fid,[data.numbeams,512],'uint8'))
data.frame=frame

How do I translate the "frame" into python?

Recommended answer

You're probably better off just keeping the header data in a dict. Do you really need it as an array? (If so, why? There are some advantages to having the header in a numpy array, but it's more complex than a simple dict, and isn't as flexible.)

One drawback to a dict is that, before Python 3.7, there was no predictable order to its keys (newer versions preserve insertion order). If you need to write your header back to disk in a regular order (similar to a C struct), then you need to store the order of the fields as well as their values. If that's the case, you might consider an ordered dict (collections.OrderedDict) or just putting together a simple class to hold your header data and storing the order there.
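
As a minimal sketch of the ordered-dict idea, the snippet below keeps the field order and the values together in a `collections.OrderedDict` and packs them with the standard `struct` module. The field names, the values, and the little-endian `<3sid` layout are made up for illustration; they are not the real file format.

```python
import struct
from collections import OrderedDict

# Hypothetical header fields, stored in the order they appear on disk.
header = OrderedDict([
    ('name', b'abc'),      # 3-byte string
    ('num_frames', 456),   # 4-byte signed integer
    ('scale', 3.45),       # 8-byte float
])

# Assumed layout: little-endian with no padding ("<"), matching the order above.
layout = '<3sid'

# Pack the values in their stored order, the way a packed C struct is laid out.
packed = struct.pack(layout, *header.values())

# Unpack the raw bytes and rebuild the ordered header.
restored = OrderedDict(zip(header.keys(), struct.unpack(layout, packed)))
```

Because the order lives in the container itself, the same `layout` string can be used for both reading and writing without a separate list of field names.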

Unless there's a good reason to put it into a numpy array, you may not want to.

However, a structured array will preserve the order of your header and will make it easier to write a binary representation of it to disk, but it's inflexible in other ways.

If you did want to make the header an array, you'd do something like this:

import numpy as np

# Lists can be modified, but preserve order. That's important in this case.
names = ['Name1', 'Name2', 'Name3']
# It's "S3" instead of "a3" for a string field in numpy, by the way
formats = ['S3', 'i4', 'f8'] 

# It's often cleaner to specify the dtype this way instead of as a giant string
dtype = dict(names=names, formats=formats)

# Don't rely on the dict's iteration order here: it's only guaranteed
# to match insertion order on Python 3.7 and later.
header = dict(Name1='abc', Name2=456, Name3=3.45)

# Therefore, we'll be sure to pass things in in order...
# Also, np.array will expect a tuple instead of a list for a structured array...
values = tuple(header[name] for name in names)
header_array = np.array(values, dtype=dtype)

# We can access field in the array like this...
print(header_array['Name2'])

# And dump it to disk (similar to a C struct) with
header_array.tofile('test.dat')
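
To check that the dump really behaves like a C struct, here is a small round-trip sketch. It builds the same three-field record (as a 1-element array), writes it to a scratch file whose path is arbitrary, and reads it back with `np.fromfile`. Note that numpy packs the fields without padding by default, so the record is 3 + 4 + 8 = 15 bytes; a C compiler would normally pad such a struct unless told not to.

```python
import os
import tempfile

import numpy as np

names = ['Name1', 'Name2', 'Name3']
formats = ['S3', 'i4', 'f8']
dtype = np.dtype(dict(names=names, formats=formats))

# One record, stored as a 1-element structured array.
header_array = np.array([(b'abc', 456, 3.45)], dtype=dtype)

# Write the raw bytes to a scratch file...
path = os.path.join(tempfile.mkdtemp(), 'test.dat')
header_array.tofile(path)

# ...and read one record back using the same dtype.
restored = np.fromfile(path, dtype=dtype, count=1)[0]

record_size = dtype.itemsize        # packed size of one record, in bytes
file_size = os.path.getsize(path)   # size of what was actually written
```

Comparing `dtype.itemsize` against the documented header size is a quick sanity check that the field list matches the file format.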

On the other hand, if you just want access to the values in the header, just keep it as a dict. It's simpler that way.

Based on what it sounds like you're doing, I'd do something like this. I'm using numpy arrays to read in the header, but the header values are actually being stored as class attributes (as well as the header array).

This looks more complicated than it actually is.

I'm just defining two new classes, one for the parent file and one for a frame. You could do the same thing with a bit less code, but this gives you a foundation for more complex things.

import numpy as np

class SonarFile(object):
    # These define the format of the file header
    header_fields = ('num_frames', 'name1', 'name2', 'name3')
    header_formats = ('i4', 'f4', 'S10', '>u4')

    def __init__(self, filename):
        self.infile = open(filename, 'rb')  # binary mode; required on Python 3
        dtype = dict(names=self.header_fields, formats=self.header_formats)

        # Read in the header as a numpy array (count=1 is important here!)
        self.header = np.fromfile(self.infile, dtype=dtype, count=1)

        # Store the position so we can "rewind" to the end of the header
        self.header_length = self.infile.tell()

        # You may or may not want to do this (if the field names can have
        # spaces, it's a bad idea). It will allow you to access things with
        # sonar_file.name1 instead of sonar_file.header['name1'], though.
        # np.fromfile returns a 1-element array, so take [0] to get scalars.
        for field in self.header_fields:
            setattr(self, field, self.header[field][0])

    # __iter__ is a special function that defines what should happen when we  
    # try to iterate through an instance of this class.
    def __iter__(self):
        """Iterate through each frame in the dataset."""
        # Rewind to the end of the file header
        self.infile.seek(self.header_length)

        # Iterate through frames...
        for _ in range(self.num_frames):
            yield Frame(self.infile)

    def close(self):
        self.infile.close()

class Frame(object):
    header_fields = ('width', 'height', 'name')
    header_formats = ('i4', 'i4', 'S20')
    data_format = 'f4'

    def __init__(self, infile):
        dtype = dict(names=self.header_fields, formats=self.header_formats)
        self.header = np.fromfile(infile, dtype=dtype, count=1)

        # See discussion above... ([0] pulls the scalar out of the 1-element array)
        for field in self.header_fields:
            setattr(self, field, self.header[field][0])

        # I'm assuming that the size of the frame is in the frame header...
        ncols, nrows = self.width, self.height

        # Read the data in
        self.data = np.fromfile(infile, self.data_format, count=ncols * nrows)

        # And reshape it into a 2d array.
        # I'm assuming C-order, instead of Fortran order.
        # If it's fortran order, just do "data.reshape((ncols, nrows)).T"
        self.data = self.data.reshape((nrows, ncols))

You'd use it similar to this:

dataset = SonarFile('input.dat')

for frame in dataset:
    im = frame.data
    # Do something...
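
For the MATLAB snippet in the question, a rough Python counterpart might look like the sketch below. MATLAB's `fread(fid, [m, n], 'uint8')` fills the matrix column by column, so the equivalent numpy read reshapes in Fortran order. The helper name `read_uint8_frame` and the 96 x 512 default shape are assumptions based on the question's rough code, not on the real file format. It reads through the file object's own `read` method so that it also works with in-memory buffers.

```python
import io

import numpy as np


def read_uint8_frame(infile, num_beams=96, samples=512):
    """Read one uint8 frame, mimicking MATLAB's column-major fread."""
    raw = infile.read(num_beams * samples)
    if len(raw) != num_beams * samples:
        raise EOFError('file ended before a full frame was read')
    flat = np.frombuffer(raw, dtype=np.uint8)
    # order='F' fills columns first, as fread(fid, [num_beams, samples]) does.
    return flat.reshape((num_beams, samples), order='F')


# Tiny demonstration with an in-memory "file" holding the bytes 0..5.
demo = read_uint8_frame(io.BytesIO(bytes(range(6))), num_beams=2, samples=3)
```

As a side note, 96 x 512 bytes is 49152, which is 256 bytes short of the 49408 bytes per frame mentioned in the question. That would be consistent with a 256-byte frame header in front of the data, but that is only a guess from the arithmetic.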
