astropy.io.fits: efficient element access in a large table


Question

I am trying to extract data from a binary table in a FITS file using Python and astropy.io.fits. The table contains an events array with over 2 million events. What I want to do is store the TIME values of certain events in an array, so that I can then do analysis on that array. The problem is that, whereas in Fortran (using FITSIO) the same operation takes maybe a couple of seconds on a much slower processor, the exact same operation in Python using astropy.io.fits takes several minutes. I would like to know where exactly the bottleneck is, and whether there is a more efficient way to access the individual elements in order to decide whether or not to store each time value in the new array. Here is the code I have so far:

from astropy.io import fits

minenergy=0.3
maxenergy=0.4
xcen=20000
ycen=20000
radius=50

datafile=fits.open('datafile.fits')
events=datafile['EVENTS'].data


datafile.close()

times=[]

for i in range(len(events)):
    energy=events['PI'][i]
    if energy<maxenergy*1000:
        if energy>minenergy*1000:
            x=events['X'][i]
            y=events['Y'][i]
            radius2=(x-xcen)*(x-xcen)+(y-ycen)*(y-ycen)
            if radius2<=radius*radius:
                times.append(events['TIME'][i])

print(times)

Any help would be appreciated. I am an OK programmer in other languages, but I have not really had to worry about efficiency in Python before. The reason I have chosen to do this in Python now is that I was using Fortran with both FITSIO and PGPLOT, as well as some routines from Numerical Recipes, but the newish Fortran compiler I have on this machine cannot be persuaded to produce a properly working program (there are some issues of 32- vs. 64-bit, etc.). Python seems to have all the functionality I need (FITS I/O, plotting, etc.), but if it takes forever to access the individual elements in a list, I will have to find another solution.

Many thanks.

Recommended answer

You need to do this using numpy vector operations. Without special tools like numba, doing large loops like you've done will always be slow in Python because it is an interpreted language. Your program should look more like:

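# Vectorised version: each line operates on whole columns at once.
energy = events['PI'] / 1000.
e_ok = (energy > minenergy) & (energy < maxenergy)  # energy cut
rad2 = (events['X'][e_ok] - xcen)**2 + (events['Y'][e_ok] - ycen)**2
r_ok = rad2 <= radius**2  # inside the circle, boundary included as in the question
times = events['TIME'][e_ok][r_ok]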

This should have performance comparable to Fortran. You can also filter the entire event table, for instance:

events_filt = events[e_ok][r_ok]
times = events_filt['TIME']
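
For anyone who wants to try the idea without a FITS file at hand, here is a minimal self-contained sketch. It assumes a synthetic NumPy structured array as a stand-in for the EVENTS table (the column names PI, X, Y and TIME come from the question; the random data and the helper names select_times and select_times_loop are made up for illustration). It applies both cuts with a single combined boolean mask and checks the result against an element-by-element loop.

```python
import numpy as np

# Synthetic stand-in for the EVENTS table (assumption: not real FITS data).
rng = np.random.default_rng(0)
n_events = 50_000
events = np.zeros(
    n_events,
    dtype=[("PI", "i4"), ("X", "f8"), ("Y", "f8"), ("TIME", "f8")],
)
events["PI"] = rng.integers(0, 1000, n_events)
events["X"] = rng.normal(20000, 100, n_events)
events["Y"] = rng.normal(20000, 100, n_events)
events["TIME"] = np.sort(rng.uniform(0, 1e4, n_events))

minenergy, maxenergy = 0.3, 0.4
xcen, ycen, radius = 20000, 20000, 50


def select_times(events):
    """Vectorised selection using one combined boolean mask."""
    energy = events["PI"] / 1000.0
    dx = events["X"] - xcen
    dy = events["Y"] - ycen
    mask = (
        (energy > minenergy)
        & (energy < maxenergy)
        & (dx * dx + dy * dy <= radius * radius)
    )
    return events["TIME"][mask]


def select_times_loop(events):
    """Element-by-element reference with the same cuts, for comparison only."""
    # Pull each column out once, instead of looking it up on every iteration.
    pi, x, y, t = events["PI"], events["X"], events["Y"], events["TIME"]
    times = []
    for i in range(len(events)):
        energy = pi[i] / 1000.0
        if minenergy < energy < maxenergy:
            dx = x[i] - xcen
            dy = y[i] - ycen
            if dx * dx + dy * dy <= radius * radius:
                times.append(t[i])
    return np.array(times)


times = select_times(events)
```

Combining the cuts into one mask evaluates the radius test on every row rather than only on rows that pass the energy cut, which costs a little extra arithmetic but avoids the chained indexing; either form should be far faster than the Python loop on a table with millions of rows.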
