检查重复的JavaScript对象 [英] Checking for duplicate Javascript objects

查看:154
本文介绍了检查重复的JavaScript对象的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

TL; DR版本:我想避免将重复的JavaScript对象添加到类似对象的数组中,其中一些可能真的很大。什么是最好的方法?

TL;DR version: I want to avoid adding duplicate Javascript objects to an array of similar objects, some of which might be really big. What's the best approach?

我有一个应用程序,我将大量的JSON数据加载到Javascript数据结构中。虽然这比这更复杂一些,但假设我通过一系列AJAX请求将JSON加载到服务器的JavaScript对象数组中,如下所示:

I have an application where I'm loading large amounts of JSON data into a Javascript data structure. While it's a bit more complex than this, assume that I'm loading JSON into an array of Javascript objects from a server through a series of AJAX requests, something like:

var myObjects = [];

function processObject(o) {
    myObjects.push(o);
}

for (var x=0; x<1000; x++) {
    $.getJSON('/new_object.json', processObject);
}

为了使事情复杂化,JSON:

To complicate matters, the JSON:


  • 是一个未知模式

  • 是任意长度的(可能不是很大,但可能在100-200 kb范围内) / li>
  • 可能包含不同请求的重复项

我最初的想法是有一个额外的对象来存储每个对象的散列(通过 JSON.stringify ?),并检查每个对象的负载,如下所示:

My initial thought is to have an additional object to store a hash of each object (via JSON.stringify?) and check against it on each load, like this:

var myHashMap = {};

function processObject(o) {
    var hash = JSON.stringify(o);
    // is it in the hashmap?
    if (!(myHashMap[hash])) {
        myObjects.push(o);
        // set the hashmap key for future checks
        myHashMap[hash] = true;
    }
    // else ignore this object
}

我担心在 myHashMap 中的属性名称可能长度为200 kb。所以我的问题是:

but I'm worried about having property names in myHashMap that might be 200 kb in length. So my questions are:


  • 对于这个问题,hashmap的想法有更好的方法吗?

  • 如果没有,是否有更好的方法为任何长度和模式的JSON对象制作哈希函数,而不是 JSON.stringify

  • 对象中超长的属性名可能有什么问题?

推荐答案

建议您创建一个JSON.stringify(o)的MD5哈希值,并将其存储在hashmap中,并将引用存储的对象作为散列数据。为了确保在 JSON.stringify()中没有对象键顺序的差异,您必须创建一个订购键的对象副本。

I'd suggest you create an MD5 hash of the JSON.stringify(o) and store that in your hashmap with a reference to your stored object as the data for the hash. And to make sure that there are no object key order differences in the JSON.stringify(), you have to create a copy of the object that orders the keys.

然后,当每个新对象进来时,都会根据哈希图进行检查。如果您在散列图中找到匹配项,则将传入对象与您存储的实际对象进行比较,以查看它们是否真正重复(因为可能存在MD5哈希冲突)。这样,你有一个可管理的哈希表(只有MD5哈希值)。

Then, when each new object comes in, you check it against the hash map. If you find a match in the hash map, then you compare the incoming object with the actual object that you've stored to see if they are truly duplicates (since there can be MD5 hash collisions). That way, you have a manageable hash table (with only MD5 hashes in it).

这里是创建对象的规范字符串表示形式的代码(包括嵌套对象或对象如果您刚刚调用JSON.stringify(),则处理对象键可能处于不同的顺序。

Here's code to create a canonical string representation of an object (including nested objects or objects within arrays) that handles object keys that might be in a different order if you just called JSON.stringify().

// Code to do a canonical JSON.stringify() that puts object properties 
// in a consistent order
// Does not allow circular references (child containing reference to parent)
JSON.stringifyCanonical = function(obj) {
    // compatible with either browser or node.js
    var Set = typeof window === "object" ? window.Set : global.Set;

    // poor man's Set polyfill
    if (typeof Set !== "function") {
        Set = function(s) {
            if (s) {
                this.data = s.data.slice();
            } else {
                this.data = [];
            }
        };
        Set.prototype = {
            add: function(item) {
                this.data.push(item);
            },
            has: function(item) {
                return this.data.indexOf(item) !== -1;
            }
        };
    }

    function orderKeys(obj, parents) {
        if (typeof obj !== "object") {
            throw new Error("orderKeys() expects object type");
        }
        var set = new Set(parents);
        if (set.has(obj)) {
            throw new Error("circular object in stringifyCanonical()");
        }
        set.add(obj);
        var tempObj, item, i;
        if (Array.isArray(obj)) {
            // no need to re-order an array
            // but need to check it for embedded objects that need to be ordered
            tempObj = [];
            for (i = 0; i < obj.length; i++) {
                item = obj[i];
                if (typeof item === "object") {
                    tempObj[i] = orderKeys(item, set);
                } else {
                    tempObj[i] = item;
                }
            }
        } else {
            tempObj = {};
            // get keys, sort them and build new object
            Object.keys(obj).sort().forEach(function(item) {
                if (typeof obj[item] === "object") {
                    tempObj[item] = orderKeys(obj[item], set);
                } else {
                    tempObj[item] = obj[item];
                }
            });
        }
        return tempObj;
    }

    return JSON.stringify(orderKeys(obj));
}

而且,算法

var myHashMap = {};

function processObject(o) {
    var stringifiedCandidate = JSON.stringifyCanonical(o);
    var hash = CreateMD5(stringifiedCandidate);
    var list = [], found = false;
    // is it in the hashmap?
    if (!myHashMap[hash] {
        // not in the hash table, so it's a unique object
        myObjects.push(o);
        list.push(myObjects.length - 1);    // put a reference to the object with this hash value in the list
        myHashMap[hash] = list;             // store the list in the hash table for future comparisons
    } else {
        // the hash does exist in the hash table, check for an exact object match to see if it's really a duplicate
        list = myHashMap[hash];             // get the list of other object indexes with this hash value
        // loop through the list
        for (var i = 0; i < list.length; i++) {
            if (stringifiedCandidate === JSON.stringifyCanonical(myObjects[list[i]])) {
                found = true;       // found an exact object match
                break;
            }
        }
        // if not found, it's not an exact duplicate, even though there was a hash match
        if (!found) {
            myObjects.push(o);
            myHashMap[hash].push(myObjects.length - 1);
        }
    }
}

<$ c的测试用例$ c> jsonStringifyCanonical()在这里: https://jsfiddle.net/jfriend00/ zfrtpqcL /

这篇关于检查重复的JavaScript对象的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆