Spring + Hibernate: Query Plan Cache Memory usage

Question

I'm programming an application with the latest version of Spring Boot. I recently ran into problems with a growing heap that cannot be garbage collected. An analysis of the heap with Eclipse MAT showed that, within one hour of running the application, the heap grew to 630MB, with Hibernate's SessionFactoryImpl using more than 75% of the whole heap.

I was looking for possible sources around the Query Plan Cache, but the only thing I found was this, and that did not work out. The properties were set like this:

spring.jpa.properties.hibernate.query.plan_cache_max_soft_references=1024
spring.jpa.properties.hibernate.query.plan_cache_max_strong_references=64

The database queries are all generated by Spring's query magic (derived query methods), using repository interfaces like in this documentation. About 20 different queries are generated with this technique. No other native SQL or HQL is used. Sample:

@Transactional
public interface TrendingTopicRepository extends JpaRepository<TrendingTopic, Integer> {
    List<TrendingTopic> findByNameAndSource(String name, String source);
    List<TrendingTopic> findByDateBetween(Date dateStart, Date dateEnd);
    Long countByDateBetweenAndName(Date dateStart, Date dateEnd, String name);
}

List<SomeObject> findByNameAndUrlIn(String name, Collection<String> urls);

as an example of the IN usage.

The question is: why does the query plan cache keep growing (it does not stop; it ends with a full heap), and how can I prevent this? Has anyone encountered a similar problem?

Versions:

  • Spring Boot 1.2.5

  • Hibernate 4.3.10

Recommended answer

I've hit this issue as well. It basically boils down to having a variable number of values in your IN clause and Hibernate trying to cache a query plan for each variation.

There are two great blog posts on this topic. The first:


Using Hibernate 4.2 and MySQL in a project with an in-clause query such as: select t from Thing t where t.id in (?)

Hibernate caches these parsed HQL queries. Specifically, the Hibernate SessionFactoryImpl has a QueryPlanCache with queryPlanCache and parameterMetadataCache. But this proved to be a problem when the number of parameters for the in-clause is large and varies.

These caches grow for every distinct query. So this query with 6000 parameters is not the same as 6001.

The in-clause query is expanded to the number of parameters in the collection. Metadata is included in the query plan for each parameter in the query, including a generated name like x10_, x11_ , etc.
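The expansion described above can be mimicked with a small, self-contained sketch. It does not call Hibernate. The entity name Thing, the class InClauseExpansionDemo, and the placeholder naming (:x0_, :x1_, ...) are assumptions for illustration, loosely modelled on the generated names mentioned in the text. The point is only that every distinct list size yields a distinct query string, and therefore a distinct cache key.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Illustration only: mimics how an IN clause expands per parameter count. */
final class InClauseExpansionDemo {

    /** Builds the query text for an IN list with {@code count} placeholders. */
    static String expand(int count) {
        StringBuilder sb = new StringBuilder("select t from Thing t where t.id in (");
        for (int i = 0; i < count; i++) {
            if (i > 0) {
                sb.append(", ");
            }
            // Hypothetical generated placeholder name, e.g. :x0_, :x1_
            sb.append(":x").append(i).append('_');
        }
        return sb.append(')').toString();
    }

    /** Counts how many distinct query strings the given list sizes produce. */
    static int distinctQueries(int... listSizes) {
        Set<String> keys = new LinkedHashSet<>();
        for (int size : listSizes) {
            keys.add(expand(size));
        }
        return keys.size();
    }

    public static void main(String[] args) {
        System.out.println(expand(2));
        // 6000 and 6001 values give two different keys; repeating 6000 adds nothing.
        System.out.println(distinctQueries(6000, 6001, 6000));
    }
}
```

Each of those strings also grows linearly with the list size, which is why a few thousand variants of a few thousand parameters each can occupy a large share of the heap.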

Imagine 4000 different in-clause parameter counts, each with an average of 4000 parameters. The query metadata for each parameter quickly adds up in memory, filling up the heap, since it can't be garbage collected.

This continues until all different variations in the query parameter count are cached or the JVM runs out of heap memory and starts throwing java.lang.OutOfMemoryError: Java heap space.

Avoiding in-clauses is an option, as well as using a fixed collection size for the parameter (or at least a smaller size).

For configuring the query plan cache max size, see the property hibernate.query.plan_cache_max_size, defaulting to 2048 (easily too large for queries with many parameters).
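In a Spring Boot application, Hibernate settings are passed through with the spring.jpa.properties. prefix, as in the question. A minimal sketch of shrinking the cache might look like the following. The values 64 and 32 are illustrative assumptions, not recommendations. The second property, hibernate.query.plan_parameter_metadata_max_size, is not mentioned in the quoted post; it is the companion Hibernate setting that bounds the parameter metadata cache.

```properties
# Illustrative values only; size these against a heap dump of your own application.
spring.jpa.properties.hibernate.query.plan_cache_max_size=64
spring.jpa.properties.hibernate.query.plan_parameter_metadata_max_size=32
```

A smaller cache caps the memory held by query plans, but it does not stop the cache misses themselves, so it is best combined with reducing the number of distinct query strings.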

And second (also referenced from the first):


Hibernate internally uses a cache that maps HQL statements (as strings) to query plans. The cache consists of a bounded map limited by default to 2048 elements (configurable). All HQL queries are loaded through this cache. In case of a miss, the entry is automatically added to the cache. This makes it very susceptible to thrashing - a scenario in which we constantly put new entries into the cache without ever reusing them and thus preventing the cache from bringing any performance gains (it even adds some cache management overhead). To make things worse, it is hard to detect this situation by chance - you have to explicitly profile the cache in order to notice that you have a problem there. I will say a few words on how this could be done later on.
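The thrashing scenario can be reproduced with a simplified stand-in for a bounded cache. The sketch below is not Hibernate's implementation: PlanCacheThrashingDemo is a hypothetical class built on an access-ordered LinkedHashMap, and the "plan" is just a placeholder object. It only shows how unique query strings produce a 0% hit rate while the cache stays full.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Simplified stand-in for a bounded query plan cache; not Hibernate's implementation. */
final class PlanCacheThrashingDemo {

    private final Map<String, Object> cache;
    int hits;
    int misses;

    PlanCacheThrashingDemo(final int maxSize) {
        // Access-ordered map that evicts the least recently used entry once full.
        this.cache = new LinkedHashMap<String, Object>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Object> eldest) {
                return size() > maxSize;
            }
        };
    }

    /** Looks up a plan by query text, "compiling" and caching it on a miss. */
    Object getPlan(String hql) {
        Object plan = cache.get(hql);
        if (plan != null) {
            hits++;
            return plan;
        }
        misses++;
        plan = new Object(); // placeholder for an expensive compiled plan
        cache.put(hql, plan);
        return plan;
    }

    int size() {
        return cache.size();
    }

    public static void main(String[] args) {
        PlanCacheThrashingDemo demo = new PlanCacheThrashingDemo(2048);
        for (int i = 0; i < 10000; i++) {
            // A value rendered into the query text makes every string unique.
            demo.getPlan("select p from Person p where p.id = " + i);
        }
        System.out.println(demo.hits + " hits, " + demo.misses + " misses, size " + demo.size());
    }
}
```

Running the same loop with a single parameterized string such as "select p from Person p where p.id = :id" gives one miss followed by hits only, which is the behaviour the cache is designed for.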

So the cache thrashing results from new queries being generated at high rates. This can be caused by a multitude of issues. The two most common that I have seen are bugs in hibernate which cause parameters to be rendered in the JPQL statement instead of being passed as parameters, and the use of an "in"-clause.

Due to some obscure bugs in hibernate, there are situations when parameters are not handled correctly and are rendered into the JPQL query (as an example check out HHH-6280). If you have a query that is affected by such defects and it is executed at high rates, it will thrash your query plan cache because each JPQL query generated is almost unique (containing IDs of your entities for example).

The second issue lies in the way that hibernate processes queries with an "in" clause (e.g. give me all person entities whose company id field is one of 1, 2, 10, 18). For each distinct number of parameters in the "in"-clause, hibernate will produce a different query - e.g. select x from Person x where x.company.id in (:id0_) for 1 parameter, select x from Person x where x.company.id in (:id0_, :id1_) for 2 parameters and so on. All these queries are considered different, as far as the query plan cache is concerned, resulting again in cache thrashing. You could probably work around this issue by writing a utility class to produce only a certain number of parameters - e.g. 1, 10, 100, 200, 500, 1000. If you, for example, pass 22 parameters, it will return a list of 100 elements with the 22 parameters included in it and the remaining 78 parameters set to an impossible value (e.g. -1 for IDs used for foreign keys). I agree that this is an ugly hack but it could get the job done. As a result you will only have at most 6 unique queries in your cache and thus reduce thrashing.
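A minimal sketch of such a utility class, under stated assumptions: InClausePadder and pad are hypothetical names, the bucket sizes are the ones listed above, and the caller supplies a filler value that can never match a real row (for example -1 for generated IDs). Rejecting empty and oversized lists is a design choice of this sketch, not something the quoted post prescribes.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: pads IN-clause values up to a fixed set of list sizes. */
final class InClausePadder {

    // Bucket sizes taken from the text above; adjust for your own workload.
    private static final int[] BUCKETS = {1, 10, 100, 200, 500, 1000};

    private InClausePadder() {
    }

    /**
     * Returns a copy of {@code values} padded with {@code filler} up to the
     * smallest bucket that fits. The filler must never match a real row.
     */
    static <T> List<T> pad(List<T> values, T filler) {
        if (values.isEmpty()) {
            throw new IllegalArgumentException("IN clause needs at least one value");
        }
        for (int bucket : BUCKETS) {
            if (values.size() <= bucket) {
                List<T> padded = new ArrayList<>(bucket);
                padded.addAll(values);
                while (padded.size() < bucket) {
                    padded.add(filler);
                }
                return padded;
            }
        }
        throw new IllegalArgumentException("More than "
                + BUCKETS[BUCKETS.length - 1] + " values; split the query into batches");
    }

    public static void main(String[] args) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 1; i <= 22; i++) {
            ids.add(i);
        }
        // 22 real IDs are padded to the 100-element bucket with -1 as filler.
        System.out.println(pad(ids, -1).size());
    }
}
```

With a repository method like the findByNameAndUrlIn example from the question, the call site would pass the padded collection instead of the raw one, using a filler string that cannot occur as a real URL.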

So how do you find out that you have the issue? You could write some additional code and expose metrics with the number of entries in the cache, e.g. over JMX, tune logging and analyze the logs, etc. If you do not want to (or cannot) modify the application, you could just dump the heap and run this OQL query against it (e.g. using MAT): SELECT l.query.toString() FROM INSTANCEOF org.hibernate.engine.query.spi.QueryPlanCache$HQLQueryPlanKey l. It will output all queries currently located in any query plan cache on your heap. It should be pretty easy to spot whether you are affected by any of the aforementioned problems.

As far as the performance impact goes, it is hard to say as it depends on too many factors. I have seen a very trivial query causing 10-20 ms of overhead spent in creating a new HQL query plan. In general, if there is a cache somewhere, there must be a good reason for that - a miss is probably expensive so you should try to avoid misses as much as possible. Last but not least, your database will have to handle large amounts of unique SQL statements too - causing it to parse them and maybe create different execution plans for every one of them.
