如何删除大量数组中的重复条目(javascript) [英] How to remove repeating entries in a massive array (javascript)

查看:61
本文介绍了如何删除大量数组中的重复条目(javascript)的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试使用Kendo UI绘制巨大的数据集(约160万点).这个数字太大了,但是我发现很多问题都是重复的.数据当前以以下格式存储:[[x,y],[x,y],[x,y] ...]每个x和y是一个数字,因此每个子数组都是一个点.我想到的方法是创建第二个空数组,然后遍历非常长的原始数组,并且仅在未找到新点的情况下才将其推入新点.

I'm trying to graph a huge data set (about 1.6 million points) using Kendo UI. This number is too large, but I have figured out that many of these points are repeating. The data is currently stored in this format: [ [x,y], [x,y], [x,y]...] with each x and y being a number, thus each subarray is a point. The approach I have in mind is to create a second empty array, and then loop through the very long original array, and only push each point to the new one if it isn't already found there.

我尝试使用jQuery.inArray(),但它似乎不适用于我在这里使用的2D数组.

I tried to use jQuery.inArray(), but it does not seem to work with the 2D array I have here.

我目前正在尝试:

    var datMinified = [];
    for( z = 2; z < dat1.length; z++) //I start at 2 because the first 2 elements are strings, disregard this
     {

       if( !(testContains(datMinified, dat1[z])) )
       {

         datMinified.push(dat1[z])

       }
      }

定义的辅助功能为:

    function testContains(arr, val)
      {
        for(i=0;i<arr.length;i++)
        {
          if( arraysEqual( arr[i], val) )
          {
            return true;
          }
        }
        return false;
      }

和:

    function arraysEqual(arr1, arr2)
    {
      if(! (arr1.length == arr2.length))
      {
        return false;
      }
      for( i = 0; i < arr1.length; i++ )
      {
        if( !(arr1[i] == arr2[i]))
        {
          return false;
        }
      }
      return true;
    }

当我运行此脚本时,即使使用较小的数组(长度为6,000),它仍然会卡住.也许jQuery是一个很好的解决方案?

When I run this script, even with a smaller array of length 6 thousand it still gets stuck up. Maybe jQuery is a good solution?

我还想,也许可以通过某种方式告诉浏览器不要超时,而只是坐下来处理数据?

I was also thinking that there might be some way to tell the browser to not time out and just sit and work through the data?

推荐答案

您遇到了一个不重要的问题,但我会一如既往地提出建议,请问我是否迷失了您.此解决方案 not 不会将坐标转换为String或使用 JSON.stringify -

You have a non-trivial problem but I'm gonna blast right through so ask questions if I lose you somewhere along the line. This solution does not cast the coordinate into a String or serialise it using other techniques like JSON.stringify -

从创建坐标的方式开始-

Start with a way to create coordinates -

const Coord = (x, y) =>
  [ x, y ]

为演示该解决方案,我需要构造许多随机坐标-

To demonstrate the solution, I need to construct many random coordinates -

const rand = x =>
  Math.floor(Math.random() * x)

const randCoord = x => 
  Coord(rand(x), rand(x))

console.log(randCoord(1e3))
// [ 655, 89 ]

现在我们制作一百万个随机坐标的数组-

Now we make an array of 1 million random coordinates -

const million =
  Array.from(Array(1e6), _ => randCoord(1e3))

现在,我们创建一个函数,使用 DeepMap 过滤所有唯一值,DeepMap是我在这个答案.

Now we make a function to filter all of the unique values using DeepMap, a tiny module I developed in this answer.

const uniq = (coords = []) =>
{ const m = new Map
  const r = []
  for (const c of coords)
    if (!DeepMap.has(m, c))
      { DeepMap.set(m, c, true)
        r.push(c)
      }
  return r
}

由于 for DeepMap 具有出色的性能, uniq 可以在一秒钟内识别出所有唯一值-

Because for and DeepMap have excellent performance, uniq can identify all of the unique values in less than one second -

console.time("uniq")
const result = uniq(million)
console.timeEnd("uniq")

console.log("uniq length:", result.length)
console.log("sample:", result.slice(0,10))

// uniq: 535 ms
// uniq length: 631970
// sample: 
// [ [ 908, 719 ]
// , [ 532, 967 ]
// , [ 228, 689 ]
// , [ 942, 546 ]
// , [ 716, 180 ]
// , [ 456, 427 ]
// , [ 714, 79 ]
// , [ 315, 480 ]
// , [ 985, 499 ]
// , [ 212, 407 ]
// ]

展开以下代码段,以在您自己的浏览器中验证结果-

Expand the snippet below to verify the results in your own browser -

const DeepMap =
  { has: (map, [ k, ...ks ]) =>
      ks.length === 0
        ? map.has(k)
        : map.has(k)
          ? DeepMap.has(map.get(k), ks)
          : false

  , set: (map, [ k, ...ks ], value) =>
      ks.length === 0
        ? map.set(k, value)
        : map.has(k)
            ? (DeepMap.set(map.get(k), ks, value), map)
            : map.set(k, DeepMap.set(new Map, ks, value))
  }

const Coord = (x, y) =>
  [ x, y ]

const rand = x =>
  Math.floor(Math.random() * x)

const randCoord = x => 
  Coord(rand(x), rand(x))

const million =
  Array.from(Array(1e6), _ => randCoord(1e3))

const uniq = (coords = []) =>
{ const m = new Map
  const r = []
  for (const c of coords)
    if (!DeepMap.has(m, c))
      { DeepMap.set(m, c, true)
        r.push(c)
      }
  return r
}

console.time("uniq")
const result = uniq(million)
console.timeEnd("uniq")

console.log("uniq length:", result.length)
console.log("sample:", result.slice(0,10))

// uniq: 535 ms
// uniq length: 631970
// sample: 
// [ [ 908, 719 ]
// , [ 532, 967 ]
// , [ 228, 689 ]
// , [ 942, 546 ]
// , [ 716, 180 ]
// , [ 456, 427 ]
// , [ 714, 79 ]
// , [ 315, 480 ]
// , [ 985, 499 ]
// , [ 212, 407 ]
// ]

通过使用生成较小的随机坐标,我们可以验证 uniq 是否正在生成正确的输出.下面我们生成最多 [100,100] 的坐标,最大可能性为10,000个唯一坐标.当您在下面运行该程序时,由于坐标是随机生成的,因此 result.length 可能小于 10,000,但绝不能超过该值-在这种情况下我们知道添加了无效(重复)坐标-

By using generating smaller random coordinates, we can verify that uniq is generating a correct output. Below we generate coordinates up to [ 100, 100 ] for a maximum possibility of 10,000 unique coordinates. When you run the program below, because the coordinates are generated at random, it's possible that result.length will be under 10,000, but it should never exceed it - in which case we'd know an invalid (duplicate) coordinate was added -

const million =
  Array.from(Array(1e6), _ => randCoord(1e2))

console.time("uniq")
const result = uniq(million)
console.timeEnd("uniq")

console.log("uniq length:", result.length)
console.log("sample:", result.slice(0,10))

// uniq: 173 ms
// uniq length: 10000
// sample: 
// [ [ 50, 60 ]
// , [ 18, 69 ]
// , [ 87, 10 ]
// , [ 8, 7 ]
// , [ 91, 41 ]
// , [ 48, 47 ]
// , [ 78, 28 ]
// , [ 39, 12 ]
// , [ 18, 84 ]
// , [ 0, 71 ]
// ]

展开以下代码段,以在您自己的浏览器中验证结果-

Expand the snippet below to verify the results in your own browser -

const DeepMap =
  { has: (map, [ k, ...ks ]) =>
      ks.length === 0
        ? map.has(k)
        : map.has(k)
          ? DeepMap.has(map.get(k), ks)
          : false

  , set: (map, [ k, ...ks ], value) =>
      ks.length === 0
        ? map.set(k, value)
        : map.has(k)
            ? (DeepMap.set(map.get(k), ks, value), map)
            : map.set(k, DeepMap.set(new Map, ks, value))
  }

const Coord = (x, y) =>
  [ x, y ]

const rand = x =>
  Math.floor(Math.random() * x)

const randCoord = x => 
  Coord(rand(x), rand(x))

const uniq = (coords = []) =>
{ const m = new Map
  const r = []
  for (const c of coords)
    if (!DeepMap.has(m, c))
      { DeepMap.set(m, c, true)
        r.push(c)
      }
  return r
}

const million =
  Array.from(Array(1e6), _ => randCoord(1e2))

console.time("uniq")
const result = uniq(million)
console.timeEnd("uniq")

console.log("uniq length:", result.length)
console.log("sample:", result.slice(0,10))

// uniq: 173 ms
// uniq length: 10000
// sample: 
// [ [ 50, 60 ]
// , [ 18, 69 ]
// , [ 87, 10 ]
// , [ 8, 7 ]
// , [ 91, 41 ]
// , [ 48, 47 ]
// , [ 78, 28 ]
// , [ 39, 12 ]
// , [ 18, 84 ]
// , [ 0, 71 ]
// ]

最后,我将在此处使用 DeepMap 模块-

Lastly, I'll include the DeepMap module used here -

const DeepMap =
  { has: (map, [ k, ...ks ]) =>
      ks.length === 0
        ? map.has(k)
        : map.has(k)
          ? DeepMap.has(map.get(k), ks)
          : false

  , set: (map, [ k, ...ks ], value) =>
      ks.length === 0
        ? map.set(k, value)
        : map.has(k)
            ? (DeepMap.set(map.get(k), ks, value), map)
            : map.set(k, DeepMap.set(new Map, ks, value))

  , get: (map, [ k, ...ks ]) =>
    // ...

  , entries: function* (map, fields = [])
    // ...
  }

有关完整的实现,请参见链接的问题与解答.Fwiw,我想您会发现该链接很有趣,因为它为该问题的复杂性提供了更多的上下文.

For a complete implementation, see the linked Q&A. Fwiw, I do think you will find the link interesting as it provides more context for the complexity of this problem.

这篇关于如何删除大量数组中的重复条目(javascript)的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆