Are my power spectra believable? A comparison between Lomb-Scargle and FFT (scipy.signal and numpy.fft)


Question

Could anyone kindly point out why I get very different results?
There are many peaks that should not appear; in fact, there should be only one peak.
I am a Python newbie, and all comments about my code below are welcome.

The test data is here; you can directly wget https://clbin.com/YJkwr .
Its first column contains the arrival times of a series of photons. A light source emits photons randomly; the total time is 55736 s and there are 67398 photons. I am trying to detect some kind of periodicity in the light intensity.
We can bin the time, and the light intensity is proportional to the number of photons in each time bin.
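As a minimal sketch of the binning idea above (the arrival times here are made up, not taken from the real data file):

```python
import numpy as np

# Hypothetical arrival times in seconds (illustration only).
arrival_times = np.array([0.0, 0.4, 1.1, 1.3, 2.6, 2.7, 2.8])

# Bin the arrival times; the count in each bin is proportional to the
# light intensity during that bin.
counts, edges = np.histogram(arrival_times, bins=3)
print(counts)   # -> [2 2 3]
```

Note that np.histogram includes the rightmost edge in the last bin, which is why the final arrival time at 2.8 s lands in bin 3.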

I tried numpy.fft and the Lomb-Scargle of scipy.signal to compute its power spectrum, but got very different results.

 import numpy as np
 import pylab as pl
 from scipy.signal import lombscargle

 timepoints = np.loadtxt('timesequence', usecols=(0,), unpack=True, delimiter=",")
 binshu = 50000
 interd = 54425./binshu
 t = np.histogram(timepoints, bins=binshu)[0]
 sp = np.fft.fft(t)
 freq = np.fft.fftfreq(len(t), d=interd)
 freqnum = freq.argsort()
 pl.xlabel("frequency(Hz)")
 pl.plot(freq[freqnum], np.abs(sp)[freqnum])

 intensity, edges = np.histogram(timepoints, bins=binshu)
 intensity = intensity.astype('float64')
 middletime = 0.5*(edges[:-1] + edges[1:])   # bin centers
 freq1 = 1./(timepoints.max() - timepoints.min())
 freq2 = freq1*len(timepoints)
 freqs = np.linspace(freq1, freq2, 1000)
 pm1 = lombscargle(middletime, intensity, freqs)

 pl.xlabel("frequency(Hz)")
 pl.plot(freqs, pm1)
 pl.xlim(0.5*freq1, 2*freq2)
 pl.ylim(0, 250)
 pl.show()

**********************************

Thank you, Warren; I appreciate your detailed reply.

You are right, there is an integration time; here it is around 1.7 s.
There are many single exposures, and every exposure takes 1.7 s.
Within one single exposure, we cannot tell the arrival time precisely.

If the time series is like:

0 1.7 3.4 8.5 8.5

The integration time for the last two photons is 1.7 s, not (8.5-3.4) s. So I will revise part of your code.

However, my question still remains. You adjusted several parameters to get the 0.024 Hz peak in the Lomb-Scargle spectrum to some degree, and then used that to guide the parameters in the FFT.

If you did not know the number 0.024, might different parameters give a different highest peak?

How can we guarantee that we get num_ls_freqs right every time? As you can see, if we choose a different num_ls_freqs, the highest peak shifts.

If I have many timing series, do I have to specify different parameters every time? And how do I get them?

Answer

It appears that 50000 bins is not enough. If you change binshu, you'll see that the locations of the spikes change, and not just by a little.

Before diving into the FFT or Lomb-Scargle, I found it useful to investigate the raw data first. Here are the beginning and end of the file:

0,1,3.77903
0,1,4.96859
0,1,1.69098
1.74101,1,4.87652
1.74101,1,5.15564
1.74101,1,2.73634
3.48202,1,3.18583
3.48202,1,4.0806
5.2229,1,1.86738
6.96394,1,7.27398
6.96394,1,3.59345
8.70496,1,4.13443
8.70496,1,2.97584
...
55731.7,1,5.74469
55731.7,1,8.24042
55733.5,1,4.43419
55733.5,1,5.02874
55735.2,1,3.94129
55735.2,1,3.54618
55736.9,1,3.99042
55736.9,1,5.6754
55736.9,1,7.22691

There are clusters of identical arrival times. (Is this the raw output of the photon detector, or has this file been preprocessed in some way?)

So the first step is to coalesce each of these clusters into a single count at a single time, so the data looks like (timepoint, count), e.g.:

0, 3
1.74101, 3
3.48202, 2
5.2229, 1
etc

The following accomplishes that counting:

times, inv = np.unique(timepoints, return_inverse=True)
counts = np.bincount(inv)
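Applied to the first few rows listed above, this coalescing reproduces exactly the (timepoint, count) pairs shown earlier:

```python
import numpy as np

# The first nine arrival times from the head of the file.
timepoints = np.array([0.0, 0.0, 0.0,
                       1.74101, 1.74101, 1.74101,
                       3.48202, 3.48202,
                       5.2229])

# np.unique gives the sorted unique times; return_inverse maps each
# original entry back to its unique value, so bincount counts repeats.
times, inv = np.unique(timepoints, return_inverse=True)
counts = np.bincount(inv)
print(times)    # -> [0.      1.74101 3.48202 5.2229 ]
print(counts)   # -> [3 3 2 1]
```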

Now times is an array of increasing unique time values, and counts is the number of photons detected at each of those times. (Note that I am guessing that this is the correct interpretation of the data!)

Let's see whether the sampling is anywhere near uniform. The array of interarrival times is

dt = np.diff(times)

If the sampling were perfectly uniform, all the values in dt would be the same. We can use the same pattern used above on timepoints to find the unique values in dt (and the frequency of their occurrence):

dts, inv = np.unique(dt, return_inverse=True)
dt_counts = np.bincount(inv)

For example, here's what dts looks like when printed:

[  1.7       1.7       1.7       1.7       1.74      1.74      1.74      1.74
   1.74      1.74      1.74088   1.741     1.741     1.741     1.741
   1.741     1.741     1.741     1.74101   1.74102   1.74104   1.7411
   1.7411    1.7411    1.7411    1.7411    1.742     1.742     1.742
   1.746     1.75      1.8       1.8       1.8       1.8       3.4       3.4
   3.4       3.4       3.48      3.48      3.48      3.48      3.48      3.482
   3.482     3.482     3.482     3.482     3.4821    3.483     3.483     3.49
   3.49      3.49      3.49      3.49      3.5       3.5       5.2       5.2
   5.2       5.2       5.22      5.22      5.22      5.22      5.22      5.223
   5.223     5.223     5.223     5.223     5.223     5.23      5.23      5.23
   5.3       5.3       5.3       5.3       6.9       6.9       6.9       6.9
   6.96      6.96      6.964     6.964     6.9641    7.        8.7       8.7
   8.7       8.7       8.7       8.71     10.4      10.5      12.2    ]

(The apparently repeated values are actually different; the full precision of the numbers is not shown.) If the sampling were perfectly uniform, there would be just one number in that list. Instead we see clusters of numbers, with no obvious dominant values. (There is a predominance of multiples of 1.74; is that a characteristic of the detector?)
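A quick sanity check of that "multiples of 1.74" observation (my addition, not part of the original answer), using a few of the dts values printed above and the smallest cluster value, about 1.74101 s:

```python
import numpy as np

# A handful of interarrival times taken from the printed dts array.
dts_sample = np.array([1.74101, 3.48202, 5.2229, 6.96394, 8.70496])

# Dividing by the base spacing should give near-integer ratios if the
# detector really ticks in units of ~1.74101 s.
ratios = dts_sample / 1.74101
print(np.round(ratios))   # -> [1. 2. 3. 4. 5.]
```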

Based on that observation, we'll start with Lomb-Scargle. The script below includes code to compute and plot the Lomb-Scargle periodogram of the (times, counts) data. The frequencies given to lombscargle vary from 1/trange, where trange is the full time span of the data, to 1/dt.min(). The number of frequencies is 16000 (num_ls_freqs in the script). Some trial and error showed that this is roughly the minimum number needed to resolve the spectrum: with fewer, the peaks start moving around; with more, there is little change in the spectrum. The calculation shows a peak at 0.0242 Hz. The other peaks are harmonics of this frequency.
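For comparison (my addition, not from the answer): a common rule of thumb for Lomb-Scargle frequency grids is to allot several grid points per peak, where a peak has width of roughly 1/trange, up to the maximum frequency probed. With assumed numbers taken from this data (the ~55737 s span and the ~1.7 s minimum spacing), that rule gives an even larger, more conservative grid than the 16000 found by trial and error:

```python
# Rule-of-thumb frequency-grid size for Lomb-Scargle.
trange = 55736.9            # total time span, seconds (from the data)
f_max = 1.0 / 1.7           # highest frequency probed, Hz (assumed)
n_oversample = 5            # grid points per peak width (a choice)

num_freqs = int(n_oversample * f_max * trange)
print(num_freqs)            # -> 163932
```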

Now that we have an estimate of the fundamental frequency, we can use it to guide the choice of bin size for a histogram of timepoints to be used in an FFT calculation. We'll use a bin size that oversamples the fundamental frequency by a factor of 8 (m in the script below). That is, we do

m = 8
nbins = int(m * ls_peak_freq * trange + 0.5)
hist, bin_edges = np.histogram(timepoints, bins=nbins, density=True)

(timepoints is the original set of times read from the file; as noted above, it includes many repeated time values.)
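Plugging in the numbers from this answer (the 0.0242 Hz peak and the ~55737 s span), the bin count and resulting bin width work out to roughly:

```python
# Worked example of the bin-size choice above, using the answer's numbers.
m = 8                            # oversampling factor
ls_peak_freq = 0.0242            # Hz, from the Lomb-Scargle peak
trange = 55736.9                 # s, full time span of the data

nbins = int(m * ls_peak_freq * trange + 0.5)
delta = trange / nbins           # resulting bin width, seconds
print(nbins)                     # -> 10791
print(round(delta, 2))           # -> 5.17
```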

Then we apply the FFT to hist (we'll actually use numpy.fft.rfft). The following is the plot of the spectrum computed with the FFT:
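A side note (my addition): np.fft.rfftfreq returns exactly the non-negative frequencies that pair with the output of np.fft.rfft, so the negative-Nyquist fix-up needed when slicing np.fft.fftfreq (as done in the script below for even-length input) can be avoided:

```python
import numpy as np

# Toy example: 8 samples spaced 0.5 s apart.
n, d = 8, 0.5

# Slicing fftfreq puts -nyquist where +nyquist belongs for even n...
freqs_sliced = np.fft.fftfreq(n, d=d)[:n // 2 + 1]
freqs_sliced[-1] = abs(freqs_sliced[-1])    # the even-n fix-up

# ...while rfftfreq gives the same values directly.
freqs_direct = np.fft.rfftfreq(n, d=d)
print(np.allclose(freqs_sliced, freqs_direct))   # -> True
```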

As expected, there is a peak at 0.0242 Hz.

Here is the script:

import numpy as np
from scipy.signal import lombscargle
import matplotlib.pyplot as plt


timepoints = np.loadtxt('timesequence', usecols=(0,), delimiter=",")

# Coalesce the repeated times into the `times` and `counts` arrays.
times, inv = np.unique(timepoints, return_inverse=True)
counts = np.bincount(inv)

# Check the sample spacing--is the sampling uniform?
dt = np.diff(times)
dts, inv = np.unique(dt, return_inverse=True)
dt_counts = np.bincount(inv)
print(dts)
# Inspection of `dts` shows that the smallest dt is about 1.7, and there
# are many near multiples of 1.74,  but the sampling is not uniform,
# so we'll analyze the spectrum using lombscargle.


# First remove the mean from the data.  This is not essential; it just
# removes the large value at the 0 frequency that we don't care about.
counts0 = counts - counts.mean()

# Minimum interarrival time.
dt_min = dt.min()

# Total time span.
trange = times[-1] - times[0]

# --- Lomb-Scargle calculation ---
num_ls_freqs = 16000
ls_min_freq = 1.0 / trange
ls_max_freq = 1.0 / dt_min
freqs = np.linspace(ls_min_freq, ls_max_freq, num_ls_freqs)
ls_pgram = lombscargle(times, counts0, 2*np.pi*freqs)

ls_peak_k = ls_pgram.argmax()
ls_peak_freq = freqs[ls_peak_k]
print("ls_peak_freq  =", ls_peak_freq)


# --- FFT calculation of the binned data ---
# Assume the Lomb-Scargle calculation gave a good estimate
# of the fundamental frequency.  Use a bin size for the histogram
# of timepoints that oversamples that period by m.
m = 8
nbins = int(m * ls_peak_freq * trange + 0.5)
hist, bin_edges = np.histogram(timepoints, bins=nbins, density=True)
delta = bin_edges[1] - bin_edges[0]

fft_coeffs = np.fft.rfft(hist - hist.mean())
fft_freqs = np.fft.fftfreq(hist.size, d=delta)[:fft_coeffs.size]
# Hack to handle the case when hist.size is even.  `fftfreq` puts
# -nyquist where we want +nyquist.
fft_freqs[-1] = abs(fft_freqs[-1])

fft_peak_k = np.abs(fft_coeffs).argmax()
fft_peak_freq = fft_freqs[fft_peak_k]
print("fft_peak_freq =", fft_peak_freq)


# --- Lomb-Scargle plot ---
plt.figure(1)
plt.clf()
plt.plot(freqs, ls_pgram)
plt.title('Spectrum computed by Lomb-Scargle')
plt.annotate("%6.4f Hz" % ls_peak_freq, 
             xy=(ls_peak_freq, ls_pgram[ls_peak_k]),
             xytext=(10, -10), textcoords='offset points')
plt.xlabel('Frequency (Hz)')
plt.grid(True)


# --- FFT plot ---
plt.figure(2)
plt.clf()
plt.plot(fft_freqs, np.abs(fft_coeffs)**2)
plt.annotate("%6.4f Hz" % fft_peak_freq,
             xy=(fft_peak_freq, np.abs(fft_coeffs[fft_peak_k])**2),
             xytext=(10, -10), textcoords='offset points')
plt.title("Spectrum computed by FFT")
plt.grid(True)

plt.show()
