How can I replace outliers with the mean of the previous and next neighbours?

Views: 91 | Posted: 2020/4/25 6:20:57 | Tags: python python-3.x numpy jupyter


Problem description

I have a really large dataset from beating two laser frequencies and reading out the beat frequency with a frequency counter.

The problem is that I have a lot of outliers in my dataset.

Filtering is not an option, since filtering or removing outliers destroys information that I need for the Allan deviation I use to analyze my beat frequency.

The problem with removing the outliers is that I want to compare the Allan deviations of three different beat frequencies. If I remove some points, my x-axis becomes shorter than before and the Allan deviation x-axis scales differently. (The adev basically builds a new x-axis, starting at intervals of my sample rate and running up to my longest measurement time, which is the highest x-axis value of my beat frequency data.)

Sorry if this is confusing; I wanted to give as much information as possible.

Anyway, so far I have got my whole Allan deviation working and removed the outliers successfully by chopping my list into intervals and comparing all y-values of each interval to the standard deviation of that interval.

What I want to change now is that, instead of removing the outliers, I want to replace them with the mean of their previous and next neighbours.

Below is my test code for a list with outliers. There seems to be a problem with how I use numpy where, and I don't really understand why.

The error is "'numpy.int32' object has no attribute 'where'". Do I have to convert my dataset to a pandas structure?

What the code is meant to do is search for values above/below my threshold, replace them with NaN, and then replace each NaN with the mean of its neighbours. I'm not really set on the NaN replacement approach, so I would be very grateful for any help.


l = np.array([[0,4],[1,3],[2,25],[3,4],[4,28],[5,4],[6,3],[7,4],[8,4]])

print(*l)

sd = np.std(l[:,1])

print(sd)

for i in l[:,1]:

    if l[i,1] > sd:
        print(l[i,1])
        l[i,1].where(l[i,1].replace(to_replace = l[i,1], value = np.nan),
                other = (l[i,1].fillna(method='ffill')+l[i,1].fillna(method='bfill'))/2)

So what I want is a list/array in which the outliers are replaced with the mean of their previous and following neighbours.

Error message: 'numpy.int32' object has no attribute 'where'

Recommended answer

One option is indeed to convert the data to a pandas DataFrame:

import pandas as pd
# l is the (index, value) array defined in the question
dataset = pd.DataFrame({'Column1': l[:, 0], 'Column2': l[:, 1]})

This resolves the error, because a pandas DataFrame has a where method, whereas l[i,1] in your loop is a single NumPy integer that has none. However, converting is not required; we can still do this with NumPy alone.
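For readers who prefer the pandas route, here is a minimal sketch of the mask-then-fill idea described in the question. It is not code from the original answer: the series name `value`, the threshold factor `k = 2`, and the fallback to the global mean at the edges are assumptions made for illustration.

```python
import pandas as pd

# Same sample values as in the question; the name "value" is illustrative.
values = pd.Series([4, 3, 25, 4, 28, 4, 3, 4, 4], dtype=float, name="value")

k = 2  # assumed threshold factor: flag points outside mean +/- k*std
mean = values.mean()
std = values.std(ddof=0)  # ddof=0 matches the default of np.std
is_outlier = (values - mean).abs() > k * std

# Hide the outliers as NaN, then average the nearest valid value on each side.
masked = values.mask(is_outlier)
filled = (masked.ffill() + masked.bfill()) / 2
# An outlier at either end has only one valid side; fall back to the mean there.
filled = filled.fillna(mean)

# Keep the original value wherever the point was not flagged.
cleaned = values.where(~is_outlier, filled)
print(cleaned.tolist())
```

With this data only the value 28 is flagged; 25 stays inside the 2-std band because the two outliers themselves inflate the standard deviation. The series keeps its original length, so the x-axis is unchanged.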

For example, the simplest way to detect outliers is to check whether they fall outside the range mean ± k·std (k = 3 is a common choice; the code below uses k = 2). A code example follows, using your setup.

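import numpy as np
# Column 0 is the sample index, column 1 the measured value.
# dtype=float keeps averaged replacements from being truncated to integers.
l = np.array([[0,4],[1,3],[2,25],[3,4],[4,28],[5,4],[6,3],[7,4],[8,4]], dtype=float)
std = np.std(l[:,1])
mean = np.mean(l[:,1])
for i in range(len(l[:,1])):
    # Values inside mean +/- 2*std are left unchanged.
    if((l[i,1]<=mean+2*std)&(l[i,1]>=mean-2*std)):
        pass
    else:
        if (i!=len(l[:,1])-1)&(i!=0):
            # Interior outlier: mean of the previous and next neighbour.
            l[i,1] = (l[i-1,1]+l[i+1,1])/2
        else:
            # First or last element has only one neighbour: use the global mean.
            l[i,1] = mean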

First we check whether the value lies inside the accepted range (that is, it is not an outlier), at the line:

if((l[i,1]<=mean+2*std)&(l[i,1]>=mean-2*std)):
        pass
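The same check can be written as a boolean mask over the whole column, which makes it easy to see which points a given threshold actually flags. This is an illustrative sketch; the helper name `outlier_mask` and its parameter `k` are hypothetical and not part of the original answer.

```python
import numpy as np

def outlier_mask(y, k=2.0):
    """Return True where y lies strictly outside mean(y) +/- k * std(y)."""
    y = np.asarray(y, dtype=float)
    return np.abs(y - y.mean()) > k * y.std()

y = np.array([4, 3, 25, 4, 28, 4, 3, 4, 4])
print(np.flatnonzero(outlier_mask(y, k=2)))  # indices flagged at 2 std
print(np.flatnonzero(outlier_mask(y, k=3)))  # indices flagged at 3 std
```

On the sample data the 2-std band flags only index 4 (the value 28), and the 3-std band flags nothing, because the outliers pull up both the mean and the standard deviation. It is worth checking the mask on real data before settling on a threshold.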

Then we check that it is not the first or last element:

if (i!=len(l[:,1])-1)&(i!=0):

If it is the first or last element, just write the mean into the field:

else:
     l[i,1]=mean
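For a really large dataset, the steps above can also be sketched without a Python loop. The function name `replace_outliers` and the parameter `k` are hypothetical, and the input is assumed to be a non-empty 1-D sequence of values. Unlike the loop, this version always averages the original neighbours, even if a neighbour is itself flagged.

```python
import numpy as np

def replace_outliers(y, k=2.0):
    """Replace points outside mean +/- k*std by the mean of their two neighbours."""
    y = np.asarray(y, dtype=float)
    mean, std = y.mean(), y.std()
    bad = np.abs(y - mean) > k * std

    out = y.copy()
    # Interior outliers: average of the original previous and next values.
    inner = np.flatnonzero(bad[1:-1]) + 1
    out[inner] = (y[inner - 1] + y[inner + 1]) / 2
    # Edge outliers have only one neighbour, so fall back to the global mean.
    if bad[0]:
        out[0] = mean
    if bad[-1]:
        out[-1] = mean
    return out

y = [4, 3, 25, 4, 28, 4, 3, 4, 4]
print(replace_outliers(y))
```

The output has the same length as the input, so the time axis used for the Allan deviation is not shortened.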
