Convert large numpy arrays of BCD to decimal
Question
I have binary data files in the multiple GB range that I am memory mapping with numpy. The start of each data packet contains a BCD timestamp, where each hex digit encodes one decimal digit of the time format 0DDD:HH:MM:SS.ssssss. I need this timestamp converted into total seconds of the current year.
Example:
The first timestamp, 0x0261 1511 2604 6002,
would be 261:15:11:26.046002, or
261*86400 + 15*3600 + 11*60 + 26.046002 = 22605086.046002
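Each BCD byte packs two decimal digits, one per nibble: the high nibble is the tens digit and the low nibble the ones digit. Decoding a single byte can be sketched in plain Python (the helper name here is illustrative, not from the original code):

```python
# Decode one BCD byte: high nibble = tens digit, low nibble = ones digit.
def bcd_byte_to_int(b: int) -> int:
    return (b >> 4) * 10 + (b & 0x0F)

# The first two bytes of the example, 0x02 and 0x61, decode to 2 and 61,
# which together give day 261.
print(bcd_byte_to_int(0x02))  # 2
print(bcd_byte_to_int(0x61))  # 61
print(bcd_byte_to_int(0x26))  # 26 (the seconds byte)
```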
Currently I am doing this to compute the timestamps:
import numpy as np
rawData = np.memmap('dataFile.bin',dtype='u1',mode='r')
#findFrameStart returns the index to the start of each data packet [0,384,768,...]
fidx = findFrameStart(rawData)
# Do lots of bit shifting and multiplying and type casting....
day1 = ((rawData[fidx ]>>4)*10 + (rawData[fidx ]&0x0F)).astype('f8')
day2 = ((rawData[fidx+1]>>4)*10 + (rawData[fidx+1]&0x0F)).astype('f8')
hour = ((rawData[fidx+2]>>4)*10 + (rawData[fidx+2]&0x0F)).astype('f8')
mins = ((rawData[fidx+3]>>4)*10 + (rawData[fidx+3]&0x0F)).astype('f8')
sec1 = ((rawData[fidx+4]>>4)*10 + (rawData[fidx+4]&0x0F)).astype('f8')
sec2 = ((rawData[fidx+5]>>4)*10 + (rawData[fidx+5]&0x0F)).astype('f8')
sec3 = ((rawData[fidx+6]>>4)*10 + (rawData[fidx+6]&0x0F)).astype('f8')
sec4 = ((rawData[fidx+7]>>4)*10 + (rawData[fidx+7]&0x0F)).astype('f8')
time = (day1*100+day2)*86400 + hour*3600 + mins*60 + sec1 + sec2/100 + sec3/10000 + sec4/1000000
Note I had to cast each of the intermediate vars (day1, day2, etc.) to double to get time to compute correctly.
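The cast matters because the raw bytes are u1: without widening, a product like day1*100 wraps around at 256. A minimal demonstration of the overflow (values chosen for illustration):

```python
import numpy as np

day1 = np.array([3], dtype='u1')   # hundreds digit of a day in the 300s
print(day1 * 100)                  # stays uint8 and wraps: 300 % 256 = 44
print(day1.astype('f8') * 100)     # 300.0 once widened to double
```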
Given that there are lots of frames, fidx can get kind of large (~10e6 elements or more). This results in lots of math operations, bit shifts, casts, etc. in my current method. So far it works OK on a smaller test file (~180 ms on a 150 MB data file). However, I am worried that when I hit larger data (4-5 GB) there might be memory issues with all of the intermediate arrays.
So, if possible, I was looking for a different method that might cut some of the overhead. The BCD-to-decimal operation is the same for each byte, so it seems I should be able to iterate over something and perhaps convert an array in place, at least reducing the memory footprint.
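One way to bound peak memory on multi-GB files, independent of how each byte is decoded, is to process fidx in fixed-size chunks so every intermediate array stays small. A sketch under that assumption (the function name and chunk size are illustrative):

```python
import numpy as np

def timestamps_chunked(rawData, fidx, chunk=1_000_000):
    """Decode BCD timestamps chunk-by-chunk so intermediates stay small."""
    out = np.empty(fidx.shape, dtype='f8')
    # Weight of each of the 8 BCD bytes, in seconds
    scale = np.array([8640000, 86400, 3600, 60, 1, .01, .0001, .000001], dtype='f8')
    for start in range(0, len(fidx), chunk):
        idx = fidx[start:start + chunk]
        acc = np.zeros(idx.shape, dtype='f8')
        for ii, sf in enumerate(scale):
            b = rawData[idx + ii]
            # Two BCD digits per byte; values stay <= 99, so uint8 math is safe here
            acc += ((b >> 4) * 10 + (b & 0x0F)) * sf
        out[start:start + chunk] = acc
    return out
```

Only one chunk's worth of temporaries is alive at a time, regardless of total file size.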
Any help would be appreciated. FYI, I am using Python 3.7
Answer
I made the following adjustments to my code. This accumulates into the time array in place and removes the need for all of the intermediate arrays. I haven't timed the result, but it should require less memory.
time = np.zeros(fidx.shape, dtype='f8')
scale = np.array([8640000, 86400, 3600, 60, 1, .01, .0001, .000001], dtype='f8')
for ii, sf in enumerate(scale):
    # accumulate in place: no new full-size temporary per term
    time += ((rawData[fidx+ii] >> 4) * 10 + (rawData[fidx+ii] & 0x0F)) * sf
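As a quick sanity check, running this loop on a synthetic one-frame buffer holding the example timestamp gives the expected value (the buffer and fidx below are made up for the test):

```python
import numpy as np

# One frame containing the BCD timestamp 0x0261 1511 2604 6002
rawData = np.array([0x02, 0x61, 0x15, 0x11, 0x26, 0x04, 0x60, 0x02], dtype='u1')
fidx = np.array([0])

time = np.zeros(fidx.shape, dtype='f8')
scale = np.array([8640000, 86400, 3600, 60, 1, .01, .0001, .000001], dtype='f8')
for ii, sf in enumerate(scale):
    time += ((rawData[fidx + ii] >> 4) * 10 + (rawData[fidx + ii] & 0x0F)) * sf

print(time[0])  # ~22605086.046002
```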