Convert large numpy arrays of BCD to decimal


Question

I have binary data files in the multiple-GB range that I am memory-mapping with numpy. The start of each data packet contains a BCD timestamp, in which each hex digit encodes one decimal digit of the time format 0DDD:HH:MM:SS.ssssss. I need this timestamp converted into total seconds of the current year.

Example: the first timestamp 0x0261 1511 2604 6002 would be 261:15:11:26.046002, or

261*86400 + 15*3600 + 11*60 + 26.046002 = 22605086.046002
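For illustration, here is a minimal standalone sketch of the per-byte BCD decoding this conversion relies on, applied to the example bytes above (two decimal digits per byte: high nibble is tens, low nibble is ones):

```python
# Decode the example BCD timestamp 0x0261 1511 2604 6002 byte by byte.
raw = bytes([0x02, 0x61, 0x15, 0x11, 0x26, 0x04, 0x60, 0x02])
digits = [(b >> 4) * 10 + (b & 0x0F) for b in raw]
# digits -> [2, 61, 15, 11, 26, 4, 60, 2], i.e. day 261, 15:11:26.046002

day = digits[0] * 100 + digits[1]
seconds = (day * 86400 + digits[2] * 3600 + digits[3] * 60
           + digits[4] + digits[5] / 100 + digits[6] / 10000
           + digits[7] / 1000000)
print(seconds)  # ~22605086.046002
```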

Currently I am doing this to compute the timestamps:

import numpy as np
rawData  = np.memmap('dataFile.bin',dtype='u1',mode='r') 
#findFrameStart returns the index to the start of each data packet   [0,384,768,...]
fidx = findFrameStart(rawData)

# Do lots of bit shifting and multiplying and type casting....
day1  = ((rawData[fidx  ]>>4)*10 + (rawData[fidx  ]&0x0F)).astype('f8')
day2  = ((rawData[fidx+1]>>4)*10 + (rawData[fidx+1]&0x0F)).astype('f8')
hour  = ((rawData[fidx+2]>>4)*10 + (rawData[fidx+2]&0x0F)).astype('f8')
mins  = ((rawData[fidx+3]>>4)*10 + (rawData[fidx+3]&0x0F)).astype('f8')
sec1  = ((rawData[fidx+4]>>4)*10 + (rawData[fidx+4]&0x0F)).astype('f8')
sec2  = ((rawData[fidx+5]>>4)*10 + (rawData[fidx+5]&0x0F)).astype('f8')
sec3  = ((rawData[fidx+6]>>4)*10 + (rawData[fidx+6]&0x0F)).astype('f8')
sec4  = ((rawData[fidx+7]>>4)*10 + (rawData[fidx+7]&0x0F)).astype('f8')
time  = (day1*100+day2)*86400 + hour*3600 + mins*60 + sec1 + sec2/100 + sec3/10000 + sec4/1000000

Note that I had to cast each of the intermediate variables (day1, day2, etc.) to double to get time to compute correctly.
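The cast is needed because the memmapped bytes are uint8, and NumPy integer arithmetic stays in the small integer type and wraps on overflow rather than widening. A minimal illustration (not from the original post):

```python
import numpy as np

# Without a cast, uint8 arithmetic wraps at 256, silently corrupting
# intermediate values like day1*100 + day2.
day1 = np.array([2], dtype='u1')   # hundreds/tens digit pair of the day

partial = day1 * 100               # 200: still fits in uint8
wrapped = day1 * 100 + 61          # 261 overflows uint8 and wraps to 5

# Casting to float64 first preserves the full value.
correct = day1.astype('f8') * 100 + 61   # [261.]
```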

Given that there are lots of frames, fidx can get quite large (~10e6 elements or more). This results in lots of math operations, bit shifts, casts, etc. in my current method. So far it works OK on a smaller test file (~180 ms on a 150 MB data file). However, I am worried that when I hit some larger data (4-5 GB) there might be memory issues with all of the intermediate arrays.

So, if possible, I was looking for a different method that might cut some of the overhead. The BCD-to-decimal operation is the same for each byte, so it seems I should be able to iterate over something and perhaps convert an array in place, at least reducing the memory footprint.

Any help would be appreciated. FYI, I am using Python 3.7.

Answer

I made the following adjustments to my code. It accumulates each byte's contribution directly into a single time array, removing the need for all of the intermediate arrays. I haven't timed the result, but it should require less memory.

time = np.zeros(fidx.shape, dtype='f8')
# weight of each BCD byte: day-hundreds, days, hours, minutes, then seconds digits
scale = np.array([8640000, 86400, 3600, 60, 1, .01, .0001, .000001], dtype='f8')
for ii, sf in enumerate(scale):
    # decode one BCD byte (two decimal digits) and accumulate its weighted value
    time += ((rawData[fidx + ii] >> 4) * 10 + (rawData[fidx + ii] & 0x0F)) * sf
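As a quick sanity check, the loop can be run self-contained on synthetic data standing in for the memmap, with a one-frame index array in place of the output of findFrameStart (both are stand-ins, not the original files):

```python
import numpy as np

# Synthetic stand-in for the memmapped file: a single frame whose 8-byte
# BCD timestamp 0x0261 1511 2604 6002 sits at offset 0.
rawData = np.frombuffer(
    bytes([0x02, 0x61, 0x15, 0x11, 0x26, 0x04, 0x60, 0x02]), dtype='u1')
fidx = np.array([0])  # start index of each frame

time = np.zeros(fidx.shape, dtype='f8')
scale = np.array([8640000, 86400, 3600, 60, 1, .01, .0001, .000001], dtype='f8')
for ii, sf in enumerate(scale):
    time += ((rawData[fidx + ii] >> 4) * 10 + (rawData[fidx + ii] & 0x0F)) * sf

print(time)  # ~[22605086.046002], matching the worked example
```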

