PlayFramework: poor JSON deserializing performance

Question

Infrastructure and preface

I have a PlayFramework (2.3.8) app hosted on an AWS EC2 instance. I have an array of complex objects that should be returned as a JSON string via a web API. I need a deep copy of the array, with all child objects fully loaded down to the very last layer. The array has 30-100 entries, each of which has around 1-10 entries of its own, and each of those has up to 100 properties. No BLOBs or similar are involved; everything boils down to strings/doubles/ints/bools. I am unsure how much the exact data structure matters, so please let me know if you need more details. The resulting .json file is about 1 MB.

Serializing this array performs terribly: for the ~1 MB of output it takes 3-5 minutes on my local machine and about 20-30 seconds on the EC2 instance.

Initial problem: poor performance when using play.libs json

My array of objects is loaded and converted into a JsonNode. This JsonNode is then passed to an ObjectMapper, which writes it out pretty-printed:

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import play.libs.Json;
import java.util.List;

List<myObject> myObjects = myObjectService.getInstance().getAllObjects(); // simplified example
JsonNode myJsonNode = Json.toJson(myObjects); // this line of code takes a huge amount of time!
ObjectMapper om = new ObjectMapper();
return om.writerWithDefaultPrettyPrinter().writeValueAsString(myJsonNode); // this runs in <10 ms

So I narrowed the culprit down to the Json.toJson serialization. As far as I could find out, play.libs.Json is a thin wrapper around the Jackson library, which Play Framework uses internally.
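
One way to narrow a slow request down to a single call, as described above, is to time each step separately. The following is a minimal, stdlib-only sketch; `StepTimer` and `timed` are hypothetical helper names, and the placeholder steps in `main` stand in for calls such as `Json.toJson(myObjects)`.

```java
import java.util.Collections;
import java.util.function.Supplier;

// Hypothetical helper for timing individual steps of a request handler.
class StepTimer {
    // Runs one step, prints how long it took, and returns the step's result
    // unchanged, so a call can be wrapped in place without restructuring the code.
    static <T> T timed(String label, Supplier<T> step) {
        long start = System.nanoTime();
        T result = step.get();
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(label + ": " + elapsedMs + " ms");
        return result;
    }

    public static void main(String[] args) {
        // Placeholder steps; in a Play controller these would wrap e.g.
        // Json.toJson(myObjects) and writeValueAsString(myJsonNode).
        String csv = timed("build", () -> String.join(",", Collections.nCopies(1000, "x")));
        Integer length = timed("measure", () -> csv.length());
        System.out.println(length); // 1999: 1000 "x" plus 999 commas
    }
}
```

One caveat: with a lazily loading ORM, the time reported for a step includes any database queries that step triggers, so a slow serialization step does not by itself prove that serialization is the bottleneck.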

I have read about some performance issues with JSON serialization, but I would expect those to cost a few hundred milliseconds to a few seconds, not minutes. Anyway, I tried implementing some other JSON libraries (GSON, argonaut, flexjson), which didn't really go smoothly.

GSON

I "simply" tried replacing the play-json library with the GSON library, as I had done in another small part of the project. It worked fine there, but here it throws StackOverflowErrors at me even though I have NO circular references, even when I try to serialize a tiny, manually created object. This happens both on my dev machine and on the EC2 instance.

FlexJson

import flexjson.JSONSerializer;
import java.util.List;

List<myObject> myObjects = myObjectService.getInstance().getAllObjects(); // simplified example
JSONSerializer serializer = new JSONSerializer().prettyPrint(true);
return serializer.deepSerialize(myObjects); // returns a pretty-printed String

This has worked reasonably well so far and takes only about 20% of the time of the Json.toJson approach above. That could, however, be because it doesn't REALLY deep-copy the objects. It does deep-copy the first layer, but since my model has some more complex properties (with children, grandchildren, great-grandchildren...), and quite a lot of them, I am unsure how to proceed here.

Here is example output for one of my nested objects (one of the properties of the "upper" object):

"class": "com.avaje.ebean.common.BeanList",
"empty": false,
"filterMany": null,
"finishedFetch": true,
"loaderIndex": 0,
"modifyAdditions": null,
"modifyListenMode": "NONE",
"modifyRemovals": null,
"populated": true,
"propertyName": "elements",
"readOnly": false,
"reference": false

Do you have any other suggestions, or hints about what might be wrong? I also wondered whether the entities are only FULLY loaded once I call .toJson(). Even so, it shouldn't take this long.

Thanks in advance!

Recommended answer

TL;DR: this issue had nothing to do with Play Framework's JSON serialization performance; it was an Ebean/database issue. Enabling SQL logging in application.conf pointed me to it.
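
For reference, a sketch of the application.conf entries commonly used for SQL statement logging in Play 2.x versions that ship with the BoneCP connection pool (as Play 2.3 does). The exact keys are an assumption here and should be checked against the Play 2.3 documentation for your setup.

```
# Assumed keys (Play 2.3 with BoneCP): log every SQL statement on the default datasource.
db.default.logStatements=true
# Raise the BoneCP logger to DEBUG so the statements actually show up in the log.
logger.com.jolbox=DEBUG
```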

Further remarks and thoughts: thanks to a hint from marcospereira in the comments, I traced the problem to a fetching issue within Play/Ebean rather than a JSON performance issue.

It turns out my entities are loaded lazily (flat) at first: with SQL logging enabled, I could see that the corresponding prepared SELECTs only fire once my code reaches .toJson(). Many of the child objects are therefore fetched from the database only during .toJson(), which results in a couple of hundred SELECTs and takes quite some time to finish.
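
The effect described above is the classic "N+1 selects" pattern: one query for the parent list, then one more query per parent when the serializer touches each lazily loaded collection. Below is a self-contained simulation in plain Java with no real ORM or database; `LazyLoadingDemo`, `Parent`, `Child`, and the `find*` methods are hypothetical stand-ins, and each "SELECT" is just a counter increment. It contrasts a lazy load (1 + N queries, all fired during serialization) with an eager load that brings the children along in a single query.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation of lazy vs. eager loading; no real ORM or database involved.
class LazyLoadingDemo {
    // Number of simulated SELECT statements, standing in for the SQL log.
    static int selectCount = 0;

    static final class Child {
        final int id;
        Child(int id) { this.id = id; }
    }

    static final class Parent {
        final int id;
        private List<Child> children; // null means "not loaded yet"

        Parent(int id, List<Child> children) {
            this.id = id;
            this.children = children;
        }

        // Loads the children on first access, like a lazily loaded ORM collection.
        List<Child> getChildren() {
            if (children == null) {
                selectCount++; // simulated SELECT ... WHERE parent_id = ?
                children = buildChildren(id);
            }
            return children;
        }
    }

    // Fabricates three children per parent (the "rows" a query would return).
    static List<Child> buildChildren(int parentId) {
        List<Child> result = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            result.add(new Child(parentId * 10 + i));
        }
        return result;
    }

    // One SELECT for the parents; their children stay unloaded.
    static List<Parent> findAllLazy(int n) {
        selectCount++;
        List<Parent> parents = new ArrayList<>();
        for (int id = 0; id < n; id++) {
            parents.add(new Parent(id, null));
        }
        return parents;
    }

    // One SELECT with a join; parents arrive with their children populated.
    static List<Parent> findAllEager(int n) {
        selectCount++;
        List<Parent> parents = new ArrayList<>();
        for (int id = 0; id < n; id++) {
            parents.add(new Parent(id, buildChildren(id)));
        }
        return parents;
    }

    // Naive serializer that walks the whole graph through getters, much like a bean-based JSON mapper.
    static String toJson(List<Parent> parents) {
        StringBuilder sb = new StringBuilder("[");
        for (int p = 0; p < parents.size(); p++) {
            Parent parent = parents.get(p);
            if (p > 0) sb.append(',');
            sb.append("{\"id\":").append(parent.id).append(",\"children\":[");
            List<Child> children = parent.getChildren(); // lazy SELECT fires here, if any
            for (int c = 0; c < children.size(); c++) {
                if (c > 0) sb.append(',');
                sb.append("{\"id\":").append(children.get(c).id).append('}');
            }
            sb.append("]}");
        }
        return sb.append(']').toString();
    }

    public static void main(String[] args) {
        selectCount = 0;
        toJson(findAllLazy(100));
        System.out.println("lazy:  " + selectCount + " SELECTs"); // 101 = 1 + 100

        selectCount = 0;
        toJson(findAllEager(100));
        System.out.println("eager: " + selectCount + " SELECTs"); // 1
    }
}
```

With 100 parents, the lazy variant issues 101 simulated SELECTs and the eager one issues 1, while both produce identical JSON. In Ebean, the usual remedy is to tell the query up front which associations to load (its query API has a `fetch(...)` method for this); the exact usage for the Ebean version bundled with Play 2.3 should be checked in the Ebean documentation.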

While playing around with the RDS instance sizes, I found some very odd results. This isn't REALLY related to the original question, but I want to share my findings in case they help somebody out there. Read about them in the section below.

RDS scaling experiments...

In my dev environment (t1.micro), I hooked up a copy of my prod DB running on a small RDS instance (db.t2.micro) to see whether anything changed.

My prod environment (t2.large) + prod RDS (db.t2.large) took around 19.5s to finish the API call. The NEW dev environment (t1.micro + db.t2.micro), which is weaker in both compute and database, took only about 10.5s. That is highly counterintuitive, since both instances ran exactly the same code and differed only in which DB server they pointed to (with identical DB content). I switched the dev DB to db.m4.large to see whether that brought any improvement, and the load time dropped to about 5.5s.

I am completely puzzled why the faster prod EC2 instance would need more time than the dev instance for exactly the same API call. In the end, I changed my prod DB class from db.t2.large to db.m4.large, and the response time is now 4.0s.

It feels like the "old" prod DB instance was somehow worn out/clogged (is there such a thing? I somehow doubt it...). Even the smaller dev instance + dev DB responded much faster. Although the different RDS sizes brought some improvement, I doubt that going from db.t2.large to db.m4.large alone would cause a change of that magnitude.

If anyone has ideas about what's going on, I'd be very happy to discuss them.
