Many queries in a task to generate JSON

Problem description

So I've got a task to build which is going to archive a ton of data in our DB into JSON.

To give you a better idea of what is happening: X has 100s of Ys, Y has 100s of Zs, and so on. I'm creating a JSON file for every X, Y, and Z. Every X JSON file has an array of ids for the child Ys of X, and likewise each Y file stores an array of ids for its child Zs.

It's more complicated than that in many cases, but that example should give you an idea of the complexity involved.

I was using ColdFusion, but it seems to be a bad choice for this task because it is crashing due to memory errors. If it were removing queries from memory once they are no longer referenced while the task runs (i.e. garbage collecting), the task should have enough memory. As far as I can tell, though, ColdFusion isn't doing any garbage collection during the request and must be doing it only after the request is complete.

So I'm looking either for advice on how to better achieve my task in CF, or for recommendations on other languages to use.

Thanks.

Solution

Eric, you are absolutely correct that ColdFusion garbage collection does not remove query information from memory until the request ends, and I've documented it fairly extensively in another SO question. In short, you hit OoM exceptions when you loop over queries. You can prove it by using a tool like VisualVM to generate a heap dump while the process is running and then running the resulting dump through the Eclipse Memory Analyzer Tool (MAT). What MAT would show you is a large hierarchy, starting with an object named (I'm not making this up) CFDummyContent, that holds, among other things, references to cfquery and cfqueryparam tags. Note that switching to stored procs, or even doing the database interaction via JDBC, does not make a difference.

So. What. To. Do?

This took me a while to figure out, but you've got 3 options in increasing order of complexity:

  1. <cfthread/>
  2. asynchronous CFML gateway
  3. daisy chain http requests

Using cfthread looks like this:

<cfloop ...>
    <cfset threadName = "thread" & createUuid()>
    <cfthread name="#threadName#" input="#value#">
        <!--- do query stuff --->
        <!--- code has access to passed attributes (e.g. #attributes.input#) --->
        <cfset thread.passOutOfThread = somethingGeneratedInTheThread>
    </cfthread>
    <cfthread action="join" name="#threadName#">
    <cfset passedOutOfThread = cfthread["#threadName#"].passOutOfThread>
</cfloop>

Note that this code does not take advantage of asynchronous processing, hence the immediate join after each thread call. It relies instead on the side effect that cfthread runs in its own request-like scope, independent of the page.

I'll not cover ColdFusion gateways here. HTTP daisy chaining means executing an increment of the work, and at the end of the increment launching a request to the same algorithm telling it to execute the next increment.
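
As a rough illustration of the daisy-chain idea, here is a minimal sketch rather than a tested implementation. The page name `archiveIncrement.cfm`, the `start` URL parameter, `batchSize` and `totalItems` are all hypothetical names chosen for the example. It also assumes the server is reachable on the default HTTP port, and that the engine keeps processing the called request after the caller's cfhttp call times out.

```cfml
<!--- archiveIncrement.cfm: hypothetical page name --->
<cfparam name="url.start" type="integer" default="1">

<!--- Assumed values: size of one increment and total number of items to archive --->
<cfset batchSize = 100>
<cfset totalItems = 1000>
<cfset lastItem = min(url.start + batchSize - 1, totalItems)>

<cfloop from="#url.start#" to="#lastItem#" index="i">
    <!--- do query stuff and write the JSON file for item i --->
</cfloop>

<cfif lastItem LT totalItems>
    <!--- Launch the next increment as a separate request. The short timeout
          is an assumption: the caller stops waiting so this request can end. --->
    <cfset nextUrl = "http://#cgi.server_name##cgi.script_name#?start=#lastItem + 1#">
    <cfhttp url="#nextUrl#" method="get" timeout="1" throwOnError="false">
</cfif>
```

Each request handles only one increment, so the query references it holds can be released when that request ends instead of piling up for the whole job.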

Basically, all three approaches allow those memory references to be collected mid-process.

And yes, for whoever asks, bugs have been raised with Adobe; see the question referenced. Also, I believe this issue is specific to Adobe ColdFusion, but I have not tested Railo or OpenBD.

Finally, I have to rant. I've spent a lot of time tracking this one down and fixing it in my own large code base, and several others listed in the question referenced have as well. AFAIK Adobe has not acknowledged the issue, much less committed to fixing it. And yes, it's a bug, plain and simple.
