Deleting large JavaScript objects when process is running out of memory

Question

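I'm new to this kind of JavaScript, so I'll give a brief explanation:

I have a web scraper built in Node.js that gathers (quite a bit of) data, processes it with Cheerio (basically jQuery for Node), creates an object, and then uploads it to MongoDB.

It works just fine, except on larger sites. What appears to be happening is: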

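  1. I give the scraper the URL of an online store to scrape.
  2. Node goes to that URL and retrieves anywhere from 5,000 to 40,000 product URLs to scrape.
  3. For each of these new URLs, Node's request module gets the page source and then loads the data into Cheerio.
  4. Using Cheerio, I create a JS object that represents the product.
  5. I ship the object off to MongoDB, where it's saved to my database.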

As I said, this happens for thousands and thousands of URLs, and once I get to around 10,000 URLs loaded, I get errors in Node. The most common is:

Node: Fatal JS Error: Process out of memory

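OK, here are the actual questions:

I think this is happening because Node's garbage collection isn't working properly. It's possible that, for example, the request data scraped from all 40,000 URLs is still in memory, or at the very least the 40,000 created JavaScript objects are. Perhaps it's also because the MongoDB connection is made at the start of the session and is never closed (I just stop the script manually once all the products are done). This is to avoid opening and closing the connection every single time I log a new product.

To really ensure they're cleaned up properly (once the product goes to MongoDB I don't use it anymore, and it can be removed from memory), can or should I simply delete it from memory using delete product?

Moreover (I'm clearly not across how JS handles objects), if I delete one reference to the object, is it totally wiped from memory, or do I have to delete all of them?

For instance: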

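var request = require('request');
var cheerio = require('cheerio');
var saveToDB = require('./mongoDBFunction.js').saveToDB;

function getData(link){
    // request's callback receives (error, response, body)
    request(link, function(err, response, body){
        if (err) { return console.error(err); }
        var $ = cheerio.load(body);
        createProduct($);
    });
}

function createProduct($) {
    var product = {
        a: 'asadf',
        b: 'asdfsd'
        // there's about 50 lines of data in here in the real products but this is for brevity
    };
    // placeholder for the real jQuery-style extraction
    product.name = $('.selector').dostuffwithitinjquery('etc');
    saveToDB(product);
}

// In mongoDBFunction.js
// `db` is the MongoDB connection opened once at the start of the session (not shown)

exports.saveToDB = function(item){
    db.products.save(item, function(err){
        if (err) { return console.error(err); }
        console.log("Item was successfully saved!");
        delete item; // Will this completely delete the item from memory?
    });
};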

Answer

delete in JavaScript is NOT used to delete variables or free memory. It is ONLY used to remove a property from an object. You may find this article on the delete operator a good read.
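To make the distinction concrete, here is a minimal sketch of what delete does to a property. The product object and its fields are hypothetical stand-ins for the objects in the question, not the asker's real data.

```javascript
// Hypothetical product object, loosely mirroring the shape used in the question.
const product = { name: 'widget', details: { description: 'large blob of text' } };
const details = product.details; // a second reference to the nested object

// delete removes the property from the object and evaluates to true.
const removed = delete product.details;

console.log(removed);              // true
console.log('details' in product); // false

// The nested object itself was not destroyed: it is still reachable
// through the other reference.
console.log(details.description);  // "large blob of text"

// By contrast, `delete product;` targets a variable, not a property: it is a
// no-op that evaluates to false in sloppy mode and a SyntaxError in strict mode.
```

So the delete item; line in the question's save callback has no effect on memory either way.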

You can remove a reference to the data held in a variable by setting the variable to something like null. If there are no other references to that data, then that will make it eligible for garbage collection. If there are other references to that object, then it will not be cleared from memory until there are no more references to it (i.e. your code has no way left to reach it).
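As a rough illustration of how a second reference keeps an object alive, consider the sketch below. The pending array and this version of createProduct are assumptions made for the example. In a real scraper, the extra reference might be a queue, a results list, or a closure that captures each product.

```javascript
// Hypothetical bookkeeping array that holds on to every product created.
const pending = [];

function createProduct(id) {
  const product = { id: id, payload: 'x'.repeat(1000) };
  pending.push(product); // second reference, outlives this function call
  return product;
}

let product = createProduct(1);

// Dropping one reference is not enough: `pending` still points at the object.
product = null;
console.log(pending.length); // 1
console.log(pending[0].id);  // 1

// Only once the last reference is gone can the collector reclaim the object.
pending.length = 0;
console.log(pending.length); // 0
```

Note that nothing here frees memory immediately. Clearing the last reference only makes the object eligible for collection, and the engine decides when to actually reclaim it.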

As for what is causing the memory accumulation, there are a number of possibilities and we can't really see enough of your code to know what references could be held onto that would keep the GC from freeing up things.

If this is a single, long running process with no breaks in execution, you might also need to manually run the garbage collector to make sure it gets a chance to clean up things you have released.
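In Node.js, the collector can only be triggered from script when the process is started with the --expose-gc flag, which makes a global gc() function available. The sketch below assumes that setup. The helper name collectIfExposed is made up for this example, and the helper simply does nothing when the flag is absent.

```javascript
// Run as: node --expose-gc script.js
// Without the flag, gc is not defined and this helper returns false.
function collectIfExposed() {
  if (typeof globalThis.gc !== 'function') {
    return false; // process was not started with --expose-gc
  }
  globalThis.gc(); // ask V8 to run a collection now
  return true;
}

const before = process.memoryUsage().heapUsed;
const ran = collectIfExposed();
const after = process.memoryUsage().heapUsed;

console.log('gc ran:', ran, 'heap delta (bytes):', after - before);
```

Forcing a collection is mainly a diagnostic aid: if heap usage stays high even after an explicit gc(), something is still holding references to the data.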

Here are a couple of articles on tracking down your memory usage in node.js: http://dtrace.org/blogs/bmc/2012/05/05/debugging-node-js-memory-leaks/ and https://hacks.mozilla.org/2012/11/tracking-down-memory-leaks-in-node-js-a-node-js-holiday-season/.
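Before reaching for heavier tooling, a rough first look is possible from inside the process with the built-in process.memoryUsage(). The sketch below is only an illustration: heapUsedMB and makeHeapLogger are hypothetical helper names, and the loop stands in for the scraper's real save callback.

```javascript
// Returns the V8 heap currently in use, in megabytes.
function heapUsedMB() {
  return process.memoryUsage().heapUsed / (1024 * 1024);
}

// Hypothetical helper: returns a function to call once per saved product.
// It logs heap usage every `every` calls and returns the running count.
function makeHeapLogger(every) {
  let count = 0;
  return function onProductSaved() {
    count += 1;
    if (count % every === 0) {
      console.log(count + ' products saved, heap used: ' + heapUsedMB().toFixed(1) + ' MB');
    }
    return count;
  };
}

// Stand-in for the real scraper: pretend 3000 products were saved.
const onProductSaved = makeHeapLogger(1000);
for (let i = 0; i < 3000; i += 1) {
  onProductSaved();
}
```

If the logged figure climbs steadily with the number of products and never drops, that suggests references are being retained somewhere, which is when the heap-inspection techniques in the articles above become worthwhile.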
