任务中的许多查询以生成 json [英] many queries in a task to generate json

查看:11
本文介绍了任务中的许多查询以生成 json的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

So I've got a task to build which is going to archive a ton of data in our DB into JSON.

To give you a better idea of what is happening; X has 100s of Ys, and Y has 100s of Zs and so on. I'm creating a json file for every X, Y, and Z. But every X json file has an array of ids for the child Ys of X, and likewise the Ys store an array of child Zs..

It more complicated than that in many cases, but you should get an idea of the complexity involved from that example I think.

I was using ColdFusion but it seems to be a bad choice for this task because it is crashing due to memory errors. It seems to me that if it were removing queries from memory that are no longer referenced while running the task (ie: garbage collecting) then the task should have enough memory, but afaict ColdFusion isn't doing any garbage collection at all, and must be doing it after a request is complete.

So I'm looking either for advice on how to better achieve my task in CF, or for recommendations on other languages to use..

Thanks.

解决方案

Eric, you are absolutely correct about ColdFusion garbage collection not removing query information from memory until request end and I've documented it fairly extensively in another SO question. In short, you hit OoM Exceptions when you loop over queries. You can prove it using a tool like VisualVM to generate a heap dump while the process is running and then running the resulting dump through Eclipse Memory Analyzer Tool (MAT). What MAT would show you is a large hierarchy, starting with an object named (I'm not making this up) CFDummyContent that holds, among other things, references to cfquery and cfqueryparam tags. Note, attempting to change it up to stored procs or even doing the database interaction via JDBC does not make difference.

So. What. To. Do?

This took me a while to figure out, but you've got 3 options in increasing order of complexity:

  1. <cthread/>
  2. asynchronous CFML gateway
  3. daisy chain http requests

Using cfthread looks like this:

<cfloop ...>
    <cfset threadName = "thread" & createUuid()>
    <cfthread name="#threadName#" input="#value#">
        <!--- do query stuff --->
        <!--- code has access to passed attributes (e.g. #attributes.input#) --->
        <cfset thread.passOutOfThread = somethingGeneratedInTheThread>
    </cfthread>
    <cfthread action="join" name="#threadName#">
    <cfset passedOutOfThread = cfthread["#threadName#"].passOutOfThread>
</cfloop>

Note, this code is not taking advantage of asynchronous processing, thus the immediate join after each thread call, but rather the side effect that cfthread runs in its own request-like scope independent of the page.

I'll not cover ColdFusion gateways here. HTTP daisy chaining means executing an increment of the work, and at the end of the increment launching a request to the same algorithm telling it to execute the next increment.

Basically, all three approaches allow those memory references to be collected mid process.

And yes, for whoever asks, bugs have been raised with Adobe, see the question referenced. Also, I believe this issue is specific to Adobe ColdFusion, but have not tested Railo or OpenDB.

Finally, have to rant. I've spent a lot of time tracking this one down, fixing it in my own large code base, and several others listed in the question referenced have as well. AFAIK Adobe has not acknowledge the issue much-the-less committed to fixing it. And, yes it's a bug, plain and simple.

这篇关于任务中的许多查询以生成 json的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆