近排序的数组中删除未分类/离群元素 [英] Remove unsorted/outlier elements in nearly-sorted array

查看:150
本文介绍了近排序的数组中删除未分类/离群元素的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

由于像数组[15,14,12,3,10,4,2,1] 。如何确定哪些元素是无序和(在这种情况下,数字3)删除它们。我不想对列表进行排序,但检测离群值,将其删除。

Given an array like [15, 14, 12, 3, 10, 4, 2, 1]. How can I determine which elements are out of order and remove them (the number 3 in this case). I don't want to sort the list, but detect outliers and remove them.

另外一个例子:

[13,12,4,9,8,6,7,3,2]

我希望能够删除#4,#7,这样我结束了:

I want to be able to remove #4 and #7 so that I end up with:

[13,12,9,8,6,3,2]

还有这产生了一个问题,当你有这样的场景:

There's also a problem that arises when you have this scenario:

[15,13,12,7,10,5,4,3]

您既可以删除7或10,使这个数组排序。

You could either remove 7 or 10 to make this array sorted.

在一般情况下,我想解决的问题是,给定的数字阅读列表(有些可以用相当多的被关闭)。我想数组只包括遵循的一般趋势线,并移除任何离群值。我只是想知道如果有一个简单的方法来做到这一点。

In general, the problem I'm trying to solve, is that given a list of numerical readings (some could be off by quite a bit). I want the array to only include values that follow the general trendline and remove any outliers. I'm just wondering if there is a simple way to do this.

推荐答案

我将减少你的问题,以最长的增加(减少)序列问题。

I would reduce your problem to the longest increasing (decreasing) subsequence problem.

https://en.wikipedia.org/wiki/Longest_increasing_subsequence

由于您的序列几乎排序,你肯定可以得到满意的结果(即整齐趋势线以下)。

Since your sequence is nearly sorted, you are guaranteed to receive a satisfactory result (i.e. neatly following the trendline).

有存在数以IT解决方案;他们中的一个在免费的书被刻画基础用C# 通过Svetlin Nakov和韦塞林·科列夫;问题是psented 257页,运动6 $ P $;解决方案是260页。

There exists a number of solutions to it; one of them is portrayed in the free book "Fundamentals of Computer Programming with C#" by Svetlin Nakov and Veselin Kolev; the problem is presented on page 257, exercise 6; solution is on page 260.

这本书曾:

写一个程序,该发现增加元件的最大序列在阵列ARR [n]的。这是没有必要的元素被连续放置。例如:{9,6,2,7,4,7,6,5,8,4} - > {2,4,6,8}

Write a program, which finds the maximal sequence of increasing elements in an array arr[n]. It is not necessary the elements to be consecutively placed. E.g.: {9, 6, 2, 7, 4, 7, 6, 5, 8, 4} -> {2, 4, 6, 8}.

解决方案:

我们可以解决具有两个嵌套循环和一个多阵列LEN [0 ... N-1]的问题。在该阵列的len [Ⅰ]我们可以保持最长连续增加序列,其某处开始阵列中的长度(也没关系的确切位置),并用ARR [I]的元素结束。因此LEN [0] = 1,LEN [X]是最大的一笔最大(1 + LEN [preV),其中preV< x和ARR [preV< ARR [X]。以下的定义,我们可以用两个嵌套循环计算的len [0 ... N-1]:外循环将通过阵列迭代从左至右与循环变量x。内环将通过从一开始就阵列位置x-1,并搜索与LEN [preV],其中ARR [preV] LT的最大值元素$ P $光伏迭代; ARR [X]。搜索之后,我们初始化LEN [X]与LEN [$ P $光伏] 1+最大发现值或具有1,如果这样的值是找不到的。

We can solve the problem with two nested loops and one more array len[0…n-1]. In the array len[i] we can keep the length of the longest consecutively increasing sequence, which starts somewhere in the array (it does not matter where exactly) and ends with the element arr[i]. Therefore len[0]=1, len[x] is the maximal sum max(1 + len[prev]), where prev < x and arr[prev] < arr[x]. Following the definition, we can calculate len[0…n-1] with two nested loops: the outer loop will iterate through the array from left to right with the loop variable x. The inner loop will iterate through the array from the start to position x-1 and searches for the element prev with maximal value of len[prev], where arr[prev] < arr[x]. After the search, we initialize len[x] with 1 + the biggest found value of len[prev] or with 1, if such a value is not found.

所描述的算法找到所有极大递增序列,在每个元素结束的长度。这些值中的最大的一个是最长递增序列的长度。如果我们需要找到的元素本身,这构成了最长序列,我们可以从元素,其中序列结束(在指数X)开始,我们就可以打印出来,我们可以搜索previous元(preV)。根据定义preV&LT; x和LEN [X] = 1 + LEN [preV]因此,我们可以找到preV从1 for循环,以X-1。之后,我们可以重复X = $ P $光伏相同。通过直到它存在查找和打印previous元件(preV)多次,我们可以发现的元素,其构成最长序列中相反的顺序(从最后到第一)

The described algorithm finds the lengths of all maximal ascending sequences, which end at each of the elements. The biggest one of these values is the length of the longest increasing sequence. If we need to find the elements themselves, which compose that longest sequence, we can start from the element, where the sequence ends (at index x), we can print it and we can search for a previous element (prev). By definition prev < x and len[x] = 1 + len[prev] so we can find prev with a for-loop from 1 to x-1. After that we can repeat the same for x=prev. By finding and printing the previous element (prev) many times until it exists, we can find the elements, which compose the longest sequence in reversed order (from the last to the first).

这篇关于近排序的数组中删除未分类/离群元素的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
相关文章
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆