Python - Clipping out data to fit profiles


Question

I have several sets of data to which I'm trying to fit different profiles. In the centre of one of the minima there is contamination that prevents me from getting a good fit, as you can see in this image:

How can I clip out those spikes at the bottom of my data, taking into account that the spike is not always in the same position? Or how would you deal with data like this? I'm using lmfit to fit the profiles, in this case a Lorentzian and a Gaussian. Here is a minimal working example where I have played with the initial values to fit the data more closely:

import numpy as np
import matplotlib.pyplot as plt
from lmfit import Model
from lmfit.models import GaussianModel, ConstantModel, LorentzianModel

x = np.array([4085.18084467,  4085.38084374,  4085.5808428 , 4085.78084186, 4085.98084092,  4086.18083999,  4086.38083905,  4086.58083811, 4086.78083717,  4086.98083623,  4087.1808353 ,  4087.38083436, 4087.58083342,  4087.78083248,  4087.98083155,  4088.18083061, 4088.38082967,  4088.58082873,  4088.78082779,  4088.98082686, 4089.18082592,  4089.38082498,  4089.58082404,  4089.78082311, 4089.98082217,  4090.18082123,  4090.38082029,  4090.58081935, 4090.78081842,  4090.98081748,  4091.18081654,  4091.3808156 , 4091.58081466,  4091.78081373,  4091.98081279,  4092.18081185, 4092.38081091,  4092.58080998,  4092.78080904,  4092.9808081 , 4093.18080716,  4093.38080622,  4093.58080529,  4093.78080435, 4093.98080341,  4094.18080247,  4094.38080154,  4094.5808006 , 4094.78079966,  4094.98079872,  4095.18079778,  4095.38079685, 4095.58079591,  4095.78079497,  4095.98079403,  4096.1807931 , 4096.38079216,  4096.58079122,  4096.78079028,  4096.98078934, 4097.18078841,  4097.38078747,  4097.58078653,  4097.78078559,4097.98078466,  4098.18078372,  4098.38078278,  4098.58078184, 4098.7807809 ,  4098.98077997,  4099.18077903,  4099.38077809, 4099.58077715,  4099.78077622,  4099.98077528,  4100.18077434, 4100.3807734 ,  4100.58077246,  4100.78077153,  4100.98077059, 4101.18076965,  4101.38076871,  4101.58076778,  4101.78076684, 4101.9807659 ,  4102.18076496,  4102.38076402,  4102.58076309, 4102.78076215,  4102.98076121,  4103.18076027,  4103.38075934, 4103.5807584 ,  4103.78075746,  4103.98075652,  4104.18075558, 4104.38075465,  4104.58075371,  4104.78075277,  4104.98075183, 4105.1807509 ,  4105.38074996,  4105.58074902,  4105.78074808, 4105.98074714,  4106.18074621,  4106.38074527,  4106.58074433, 4106.78074339,  4106.98074246,  4107.18074152,  4107.38074058, 4107.58073964,  4107.7807387 ,  4107.98073777,  4108.18073683, 4108.38073589,  4108.58073495,  4108.78073401,  4108.98073308, 4109.18073214,  4109.3807312 ,  4109.58073026,  4109.78072933, 4109.98072839,  4110.18072745,  
4110.38072651,  4110.58072557, 4110.78072464,  4110.9807237 ,  4111.18072276,  4111.38072182, 4111.58072089,  4111.78071995,  4111.98071901,  4112.18071807, 4112.38071713,  4112.5807162 ,  4112.78071526,  4112.98071432, 4113.18071338,  4113.38071245,  4113.58071151,  4113.78071057, 4113.98070963,  4114.18070869,  4114.38070776,  4114.58070682, 4114.78070588,  4114.98070494,  4115.18070401,  4115.38070307, 4115.58070213,  4115.78070119,  4115.98070025,  4116.18069932, 4116.38069838,  4116.58069744,  4116.7806965 ,  4116.98069557, 4117.18069463,  4117.38069369,  4117.58069275,  4117.78069181, 4117.98069088,  4118.18068994,  4118.380689  ,  4118.58068806, 4118.78068713,  4118.98068619,  4119.18068525,  4119.38068431, 4119.58068337,  4119.78068244,  4119.9806815 ,  4120.18068056, 4120.38067962,  4120.58067869,  4120.78067775,  4120.98067681, 4121.18067587,  4121.38067493,  4121.580674  ,  4121.78067306, 4121.98067212,  4122.18067118,  4122.38067025,  4122.58066931, 4122.78066837,  4122.98066743,  4123.18066649,  4123.38066556, 4123.58066462,  4123.78066368,  4123.98066274,  4124.1806618 , 4124.38066087,  4124.58065993,  4124.78065899,  4124.98065805, 4125.18065712,  4125.38065618,  4125.58065524,  4125.7806543 , 4125.98065336,  4126.18065243,  4126.38065149,  4126.58065055, 4126.78064961,  4126.98064868,  4127.18064774,  4127.3806468 , 4127.58064586,  4127.78064492,  4127.98064399,  4128.18064305, 4128.38064211,  4128.58064117,  4128.78064024,  4128.9806393 , 4129.18063836,  4129.38063742,  4129.58063648,  4129.78063555, 4129.98063461,  4130.18063367,  4130.38063273,  4130.5806318 , 4130.78063086,  4130.98062992,  4131.18062898,  4131.38062804, 4131.58062711,  4131.78062617,  4131.98062523,  4132.18062429, 4132.38062336,  4132.58062242,  4132.78062148,  4132.98062054, 4133.1806196 ,  4133.38061867,  4133.58061773,  4133.78061679, 4133.98061585,  4134.18061492,  4134.38061398,  4134.58061304, 4134.7806121 ,  4134.98061116])
y = np.array([0.90312759,  1.00923175,  0.94618369,  0.98284045,  0.91510612,        0.96737804,  0.97690214,  0.94363369,  1.00887784,  1.00110387,        0.91647096,  0.97943202,  1.00672907,  1.01552094,  1.01089407,        0.96914584,  0.9908419 ,  1.0176613 ,  0.97032148,  0.96003562,        0.9702355 ,  0.93684173,  0.94652734,  0.94895018,  1.01214356,        0.85777678,  0.89308203,  0.9789272 ,  0.93901884,  0.9684622 ,        0.96969321,  0.86326307,  0.89607392,  0.92459571,  1.00454429,        1.06019733,  0.97291196,  0.95646497,  0.95899707,  1.02830351,        0.94938178,  0.91481128,  0.92606219,  0.97085631,  0.93597434,        0.91316857,  0.90644542,  0.91726926,  0.91686184,  0.96445563,        0.92166362,  0.95831572,  0.93859066,  0.85285273,  0.89944073,        0.91812428,  0.94265677,  0.88281406,  0.9470601 ,  0.94921529,        0.97289222,  0.94632251,  0.96633195,  0.94096512,  0.95324803,        0.90920845,  0.92100257,  0.91181745,  0.95715298,  0.91715382,        0.90219214,  0.87585035,  0.86592191,  0.89335902,  0.85536392,        0.89619274,  0.9450366 ,  0.82780137,  0.81214176,  0.83461329,        0.82858317,  0.80851704,  0.79253546,  0.85440086,  0.81679169,        0.80579976,  0.72312218,  0.75583125,  0.75204599,  0.84519188,        0.68686821,  0.71472154,  0.71706318,  0.72640234,  0.70526356,        0.68295282,  0.66795774,  0.65004383,  0.68096834,  0.72697547,        0.72436393,  0.77128385,  0.79666758,  0.67349101,  0.61479406,        0.57046337,  0.51614312,  0.52945366,  0.53112169,  0.53757761,        0.56680358,  0.63839684,  0.60704329,  0.62377533,  0.67862515,        0.64587581,  0.71316115,  0.76309798,  0.72217569,  0.7477785 ,        0.79731849,  0.76934137,  0.77063868,  0.77871584,  0.77688526,        0.84342722,  0.85382332,  0.88700466,  0.85837992,  0.79589266,        0.83798993,  0.79835529,  0.84612746,  0.83214907,  0.86373676,        0.90729115,  0.82111605,  0.86165685,  0.84090099,  0.90389133,      
  0.89554032,  0.90792356,  0.92798016,  0.95588479,  0.95019718,        0.95447497,  0.89845759,  0.91638311,  0.99263342,  0.97477606,        0.95482538,  0.94489498,  0.94344967,  0.90526465,  0.92538486,        0.96279787,  0.94005143,  0.96842454,  0.92296494,  0.89954172,        0.8684367 ,  0.95039002,  0.95229769,  0.93752274,  0.94741173,        0.96704449,  1.01130839,  0.95499414,  0.99596569,  0.95130622,        1.00014723,  1.00252218,  0.95130331,  1.0022896 ,  0.99851989,        0.94405282,  0.95814021,  0.94851972,  1.01302067,  1.01400272,        0.97960083,  0.97070283,  1.01312797,  0.9842154 ,  1.01147273,       0.97331853,  0.91403182,  0.96813051,  0.92319169,  0.9294103 ,        0.96960715,  0.94811518,  0.97115083,  0.84687543,  0.90725159,        0.88061293,  0.87319615,  0.85331661,  0.89775082,  0.90956716,        0.83174505,  0.89753388,  0.89554364,  0.95329739,  0.87687031,        0.93883127,  0.97433899,  0.99515225,  0.97519981,  0.91956466,        0.97977674,  0.93582089,  1.00662722,  0.90157277,  1.02887754,        0.9777419 ,  0.94257094,  1.02359615,  0.98968414,  1.00075502,        1.03230265,  1.05904074,  1.00488442,  1.05507886,  1.05085518,        1.02561781,  1.05896008,  0.98024381,  1.08005691,  0.94528977,        1.03853637,  1.02064405,  1.0467137 ,  1.05375156,  1.12907949,        0.99295611,  1.06601022,  1.02846374,  0.98006807,  0.96446772,        0.97702428,  0.97788589,  0.93889781,  0.96366778,  0.96645265,        0.95857242,  1.05796304,  0.99441763,  1.00573183,  1.05001927])
e = np.array([0.0647344 ,  0.04583914,  0.05665552,  0.04447208,  0.05644753,        0.03968611,  0.05985188,  0.04252311,  0.03366922,  0.04237672,        0.03765898,  0.03290132,  0.04626836,  0.05106203,  0.03619188,        0.03944098,  0.08115469,  0.05859644,  0.06091101,  0.05170821,        0.0427244 ,  0.06804469,  0.06708318,  0.03369381,  0.04160575,        0.08007032,  0.09292148,  0.04378329,  0.08216214,  0.06087074,        0.05375458,  0.06185891,  0.06385766,  0.08084546,  0.04864063,        0.06400878,  0.04988693,  0.06689165,  0.05989534,  0.08010138,        0.0681177 ,  0.04478208,  0.03876582,  0.05977015,  0.06610619,        0.05020086,  0.07244604,  0.0445143 ,  0.06970626,  0.04423994,        0.0414573 ,  0.06892836,  0.05715395,  0.04014724,  0.07908425,        0.06082051,  0.08380691,  0.08576757,  0.06571406,  0.04842625,        0.05298355,  0.05271857,  0.06340425,  0.10849621,  0.0811072 ,        0.03642638,  0.10614094,  0.09865099,  0.06711037,  0.10244762,        0.11843505,  0.1092357 ,  0.09748241,  0.09657009,  0.09970179,        0.10203563,  0.18494082,  0.14097796,  0.1151294 ,  0.16172895,        0.17611204,  0.16226913,  0.2295418 ,  0.17795924,  0.1253298 ,        0.1771586 ,  0.15139061,  0.14739618,  0.1620105 ,  0.19158538,        0.21431605,  0.19292715,  0.23308884,  0.30519423,  0.31401994,        0.30569885,  0.31216375,  0.35147676,  0.25016472,  0.16232236,        0.09058787,  0.0604483 ,  0.05168302,  0.21432774,  0.38149791,        0.5061975 ,  0.44281541,  0.50646427,  0.43761581,  0.44989111,        0.47778238,  0.39944325,  0.32462726,  0.34560857,  0.3175776 ,        0.30253441,  0.23059451,  0.24516185,  0.20708065,  0.26429751,        0.1830661 ,  0.15155041,  0.16497299,  0.15794139,  0.13626666,        0.17839823,  0.13502886,  0.14148522,  0.10869864,  0.11723602,        0.09074029,  0.06922157,  0.07719777,  0.13181317,  0.11441895,        0.10655855,  0.12073767,  0.0846133 ,  0.07974657,  0.06538693,      
  0.0573741 ,  0.07864047,  0.08351471,  0.08130351,  0.0768824 ,        0.07951992,  0.04478989,  0.0765122 ,  0.04842814,  0.04355571,        0.05138656,  0.07215294,  0.04681987,  0.05790133,  0.06163808,        0.082449  ,  0.06127927,  0.04971221,  0.05107901,  0.04493687,        0.06072161,  0.06094332,  0.03630467,  0.04162285,  0.04058228,        0.04526251,  0.06191432,  0.04901982,  0.0454908 ,  0.06186274,        0.0407017 ,  0.03865571,  0.04353665,  0.03898987,  0.04666321,        0.05856035,  0.04225933,  0.04797901,  0.03523971,  0.04728414,        0.05494382,  0.04773011,  0.03210954,  0.05651663,  0.03625933,        0.03596701,  0.03800191,  0.06267668,  0.06431192,  0.0602614 ,        0.05139896,  0.04571979,  0.04375182,  0.0576867 ,  0.07491418,        0.05339972,  0.07619115,  0.11569378,  0.07087871,  0.09076518,        0.13554717,  0.07811761,  0.07180695,  0.05831886,  0.06042863,        0.08759576,  0.06650081,  0.08420164,  0.08185432,  0.04338836,        0.04970979,  0.04008252,  0.03605485,  0.03456321,  0.05594584,        0.03856822,  0.03576337,  0.03118799,  0.0441686 ,  0.0469118 ,        0.03591666,  0.03562582,  0.04934832,  0.03280972,  0.03201576,        0.04338048,  0.07443531,  0.04121059,  0.03774147,  0.03717577,        0.03354207,  0.03806978,  0.0319364 ,  0.03715712,  0.0379478 ,        0.04867626,  0.0304592 ,  0.03393844,  0.034518  ,  0.04293514,        0.05177898,  0.05332907,  0.0352937 ,  0.03359781,  0.04625272,        0.03733088,  0.03501259,  0.03346308,  0.04333749,  0.05741173])

cont = ConstantModel(prefix='cte_')
pars = cont.guess(y, x=x)

gauss = GaussianModel(prefix='g_')
pars.update( gauss.make_params())    
pars['cte_c'].set(1)
pars['g_center'].set(4125, min=4120, max=4130)
pars['g_sigma'].set(1, min=0.5)
pars['g_amplitude'].set(-0.2, min=-0.5)

loren = LorentzianModel(prefix='l_')
pars.update( loren.make_params())    
pars['l_center'].set(4106, min=4095, max=4115)
pars['l_sigma'].set(4, max=6)
pars['l_amplitude'].set(-6., max=-4.)

model = gauss + loren + cont

init = model.eval(pars, x=x)
result = model.fit(y, pars, x=x, weights=1/e)

#print(result.fit_report(min_correl=0.5))

fig, ax = plt.subplots(figsize=(8,6))

ax.plot(x, y, 'k-', lw=2) # data in black
ax.plot(x, init, 'g--', lw=2) # initial guess 
ax.plot(x, result.best_fit, 'r-', lw=2) # best fit
ax.set(xlim=(4085,4135), ylim=(0.4,1.14))
plt.show()

Recommended answer

If the bad point is always at the same x value, you could remove that point from the data, perhaps with something like:

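import numpy as np

def index_nearest(array, value):
    """Return the index of the element of array nearest to value."""
    return np.abs(array - value).argmin()

# 4150 is the value given in the answer; use the x position of the bad point in your data
ybad = index_nearest(x, 4150)
y[ybad] = np.nan  # mark the bad point
# drop the bad point from x, y and the uncertainties e (used as fit weights)
good = np.isfinite(y)
x, y, e = x[good], y[good], e[good]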

and then fit your model to those data with the bad point removed.
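If the spike does not always fall at the same x value, as the question describes, one possible way to locate it is to compare the data with a running median and flag points that deviate by more than a few times their uncertainty. The sketch below is an assumption, not part of the original answer: the helper name `flag_spikes`, the window `size=7`, the threshold `nsigma=3.0`, and the synthetic data are all illustrative choices.

```python
import numpy as np
from scipy.ndimage import median_filter

def flag_spikes(y, e, size=7, nsigma=3.0):
    """Return a boolean mask that is True for the points to keep.

    A point is treated as a spike when it differs from a running
    median of y by more than nsigma times its uncertainty e.
    size and nsigma are illustrative values, not tuned ones.
    """
    baseline = median_filter(y, size=size, mode="nearest")
    return np.abs(y - baseline) <= nsigma * e

# Synthetic demonstration: a smooth dip with one injected spike.
x = np.linspace(4085.0, 4135.0, 251)
y = 1.0 - 0.4 * np.exp(-0.5 * ((x - 4106.0) / 3.0) ** 2)
e = np.full_like(x, 0.02)
y[105] += 0.3  # hypothetical contamination at the bottom of the dip

keep = flag_spikes(y, e)
x_clean, y_clean, e_clean = x[keep], y[keep], e[keep]
```

The cleaned arrays could then be passed to the fit in place of the originals, for example `model.fit(y_clean, pars, x=x_clean, weights=1/e_clean)`. A running median only isolates spikes that are narrower than about half the window, so a broad contamination feature would need a different criterion.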

But, also: if there is not an obviously errant point and the data are "just" noisy, there is probably no advantage to removing what look like bad points. Your data look noisy to me, but it's hard to see that there is a systematically bad point. If you are going to remove a point, remember that you are asserting that this measurement was not merely affected by normal noise, but was wrong.

Finally: another approach to treating noisy data might be to try to smooth the data, say with a Savitzky-Golay filter. There is always some danger of smoothing out features with such an approach, but a modest S-G filter is often good for cleaning up noisy data enough to detect features. Of course, if fits to filtered data give significantly different results from fits to unfiltered data, you will probably need to understand why that is.
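As a rough illustration of the smoothing idea, the sketch below applies `scipy.signal.savgol_filter` to synthetic noisy data. The window length (11 points) and polynomial order (3) are assumed values chosen only for this example, and the synthetic dip stands in for the real spectrum; suitable settings depend on how wide the real features are.

```python
import numpy as np
from scipy.signal import savgol_filter

# Synthetic noisy dip standing in for the real data (assumption).
rng = np.random.default_rng(0)
x = np.linspace(4085.0, 4135.0, 251)
y_true = 1.0 - 0.4 * np.exp(-0.5 * ((x - 4106.0) / 3.0) ** 2)
y_noisy = y_true + rng.normal(scale=0.03, size=x.size)

# A modest filter: the window is short compared with the width of the dip,
# so the feature itself should be only slightly distorted.
y_smooth = savgol_filter(y_noisy, window_length=11, polyorder=3)

# Compare the scatter about the known underlying curve before and after.
rms_before = np.sqrt(np.mean((y_noisy - y_true) ** 2))
rms_after = np.sqrt(np.mean((y_smooth - y_true) ** 2))
```

With real data there is no `y_true` to compare against, so the practical check is the one described above: fit both the filtered and the unfiltered data and see whether the fitted parameters agree.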
