How can I replace outliers with the mean of the previous and next neighbours?


Question

I have a very large dataset from beating two laser frequencies and reading out the beat frequency with a frequency counter.

The problem is that my dataset contains a lot of outliers.

Filtering is not an option, because filtering or removing outliers destroys information I need for the Allan deviation that I use to analyze my beat frequency.

The problem with removing the outliers is that I want to compare the Allan deviations of three different beat frequencies. If I remove some points, my x-axis becomes shorter than before and the x-axis of my Allan deviation scales differently. (The adev basically builds a new x-axis, starting with intervals of my sample rate and running up to my longest measurement time, which is the highest x-axis value of my beat frequency data.)

Sorry if this is confusing; I wanted to give as much information as possible.

Anyway, what I have done so far is get my whole Allan deviation working and remove the outliers successfully, by chopping my list into intervals and comparing all y-values of each interval to the standard deviation of that interval.

What I want to change now is that, instead of removing the outliers, I want to replace them with the mean of their previous and next neighbours.

Below is my test code for a list with outliers. It seems to have a problem with numpy's where, and I don't really understand why.

The error is "'numpy.int32' object has no attribute 'where'". Do I have to convert my dataset to a pandas structure?

What the code is meant to do is search for values above/below my threshold, replace them with NaN, and then replace each NaN with the mean of its neighbours. I'm not really keen on the NaN replacement, so I would be very grateful for any help.


import numpy as np

l = np.array([[0,4],[1,3],[2,25],[3,4],[4,28],[5,4],[6,3],[7,4],[8,4]])
print(*l)
sd = np.std(l[:,1])
print(sd)

for i in l[:,1]:
    if l[i,1] > sd:
        print(l[i,1])
        l[i,1].where(l[i,1].replace(to_replace = l[i,1], value = np.nan),
                other = (l[i,1].fillna(method='ffill')+l[i,1].fillna(method='bfill'))/2)

So what I want is a list/array in which the outliers are replaced with the mean of their previous and following neighbours.

Error message: 'numpy.int32' object has no attribute 'where'

Recommended answer

One option is indeed to convert the data to a pandas DataFrame (data here is your NumPy array):

import pandas as pd
dataset = pd.DataFrame({'Column1':data[:,0],'Column2':data[:,1]})

That will solve the error, since a pandas DataFrame object has a where method. However, it is not mandatory, and we can still work with just numpy.
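If you do take the pandas route, the NaN-based idea from the question can be sketched roughly as follows. The original error arises because l[i,1] is a single NumPy integer, not an array or Series, so it has no where, replace or fillna methods; those calls need to be applied to the whole column. The names s, masked and filled are only illustrative, and the 2·std threshold is an assumption taken from the numpy loop in this answer. Note that pandas computes the sample standard deviation by default, so ddof=0 is passed to match np.std.

```python
import numpy as np
import pandas as pd

l = np.array([[0,4],[1,3],[2,25],[3,4],[4,28],[5,4],[6,3],[7,4],[8,4]])

# Work on the y-column as a float Series so it can hold NaN
s = pd.Series(l[:, 1], dtype=float)

mean = s.mean()
std = s.std(ddof=0)  # ddof=0 matches np.std (population standard deviation)

# Flag values outside mean +/- 2*std (assumed threshold)
is_outlier = (s - mean).abs() > 2 * std

# Replace outliers with NaN, then average the nearest valid previous and next values
masked = s.mask(is_outlier)
filled = (masked.ffill() + masked.bfill()) / 2

# An outlier at either end has only one valid neighbour and stays NaN;
# fall back to the overall mean there
filled = filled.fillna(mean)

print(filled.to_numpy())
```

With this test data and threshold only the value 28 is flagged; 25 lies inside the mean ± 2·std band and is left alone. The series keeps its original length, so the x-axis is unchanged. For a run of consecutive outliers this sketch uses the nearest valid values on either side rather than the immediate neighbours.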

With plain numpy, the simplest way to detect outliers is to check whether a value lies outside the range mean ± k·std. A common choice is k = 3; the code example below, which uses your setup, takes k = 2.

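import numpy as np
# dtype=float so that averaged replacement values are not truncated to integers
l = np.array([[0,4],[1,3],[2,25],[3,4],[4,28],[5,4],[6,3],[7,4],[8,4]], dtype=float)
std = np.std(l[:,1])
mean=np.mean(l[:,1])
for i in range (len(l[:,1])):
    # value within mean +/- 2*std: not an outlier, leave it unchanged
    if((l[i,1]<=mean+2*std)&(l[i,1]>=mean-2*std)):
        pass
    else:
        # outlier with two neighbours: replace it with their mean
        if (i!=len(l[:,1])-1)&(i!=0):
            l[i,1]=(l[i-1,1]+l[i+1,1])/2
        else:
            # first or last element: fall back to the overall mean
            l[i,1]=mean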

What we do here is first check whether the value is an outlier. Values inside the allowed range are skipped by the lines

if((l[i,1]<=mean+2*std)&(l[i,1]>=mean-2*std)):
        pass

Then we check that it is not the first or last element:

if (i!=len(l[:,1])-1)&(i!=0):

If it is the first or last element, it has only one neighbour, so we just assign the overall mean to the field:

else:
     l[i,1]=mean     
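For a really large dataset, the same replacement can be sketched without a Python loop by using boolean masks and index arrays. The function name replace_outliers_with_neighbour_mean and the n_std parameter are hypothetical, introduced only for illustration. Unlike the in-place loop above, this version always reads neighbours from the original data, so two adjacent outliers would be averaged with each other's original values.

```python
import numpy as np

def replace_outliers_with_neighbour_mean(y, n_std=2.0):
    """Return a float copy of y with outliers replaced by the mean of their neighbours.

    A value counts as an outlier if it lies more than n_std standard
    deviations from the mean (assumed criterion, as in the loop version).
    """
    y = np.asarray(y, dtype=float)
    out = y.copy()
    mean, std = y.mean(), y.std()

    # Indices of all values outside mean +/- n_std*std
    idx = np.flatnonzero(np.abs(y - mean) > n_std * std)

    # Split into interior points (two neighbours) and end points (one neighbour)
    last = len(y) - 1
    interior = idx[(idx > 0) & (idx < last)]
    edge = idx[(idx == 0) | (idx == last)]

    # Interior outliers: mean of previous and next value from the original data
    out[interior] = (y[interior - 1] + y[interior + 1]) / 2
    # Outliers at either end: fall back to the overall mean
    out[edge] = mean
    return out

l = np.array([[0,4],[1,3],[2,25],[3,4],[4,28],[5,4],[6,3],[7,4],[8,4]], dtype=float)
l[:, 1] = replace_outliers_with_neighbour_mean(l[:, 1])
print(l[:, 1])
```

The array keeps its length, so the time axis used for the Allan deviation is not shortened. For data with a drifting baseline, the same function could be applied per interval, as in the interval-based detection described in the question.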
