Java web app in tomcat periodically freezes up


Problem description

My Java web app running Tomcat (7.0.28) periodically becomes unresponsive. I'm hoping for some suggestions of possible culprits (synchronization?), as well as maybe some recommended tools for gathering more information about what's occurring during a crash. Some facts that I have accumulated:

  • When the web app freezes up, tomcat continues to feed request threads into the app, but the app does not release them. The thread pool fills up to the maximum (currently 250), and then subsequent requests immediately fail. During normal operation, there are never more than 2 or 3 active threads.

  • There are no errors or exceptions of any kind logged to any of our tomcat or web app logs when the problem occurs.

  • Doing a "Stop" and then a "Start" on our application via the tomcat management web app immediately fixes this problem (until today).

  • Lately the frequency has been two or three times a day, though today was much worse, probably 20 times, and sometimes not coming back to life immediately.

  • The problem only occurs during business hours.

  • The problem does not occur on our staging system.

  • When the problem occurs, processor and memory usage on the server remains flat (and fairly low). Tomcat reports plenty of free memory.

  • Tomcat continues to be responsive when the problem occurs. The management web app works perfectly well, and tomcat continues sending requests into our app until all threads in the pool are filled.

  • Our database server remains responsive when the problem occurs. We use the Spring Framework for data access and injection.

  • Problem generally occurs when usage is high, but there is never an unusually high spike in usage.

Problem history: something similar occurred about a year and a half ago. After many server config and code changes, the problem disappeared until about a month ago. Within the past few weeks it has occurred much more frequently, an average of 2 or 3 times a day, sometimes several times in a row.

I identified some server code today that may not have been threadsafe, and I put a fix in for that, but the problem is still happening (though less frequently). Is this the sort of problem that un-threadsafe code can cause?

UPDATE: With several posts suggesting database connection pool exhaustion, I did some searching in that direction and found this other Stackoverflow question which explains almost all of the problems I'm experiencing.

Apparently, the default values for maxActive and maxIdle connections in Apache's BasicDataSource implementation are each 8. Also, maxWait is set to -1, so when the pool is exhausted and a new request for a connection comes in, it will wait forever without logging any sort of exception. I'm still going to wait for this problem to happen again and perform a jstack dump on the JVM so that I can analyze that information, but it's looking like this is the problem. The only thing it doesn't explain is why the app sometimes doesn't recover from this problem. I suppose the requests just pile up sometimes and once it gets behind it can never catch up.
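For anyone who hits the same thing, here is a minimal sketch of why those defaults produce exactly this behavior. The driver, URL, and credentials are placeholders (not our real configuration); with the commons-dbcp 1.x defaults, the ninth getConnection() below blocks forever inside GenericObjectPool.borrowObject(), which matches the hang described above:

import java.sql.Connection;
import java.util.ArrayList;
import java.util.List;

import org.apache.commons.dbcp.BasicDataSource;

// Minimal sketch of the silent hang caused by DBCP's defaults
// (maxActive = 8, maxIdle = 8, maxWait = -1). Placeholder driver/URL/credentials.
public class PoolExhaustionDemo {
    public static void main(String[] args) throws Exception {
        BasicDataSource ds = new BasicDataSource();
        ds.setDriverClassName("org.postgresql.Driver");      // placeholder driver
        ds.setUrl("jdbc:postgresql://localhost:5432/demo");  // placeholder URL
        ds.setUsername("app");                                // placeholder credentials
        ds.setPassword("secret");
        // No pool settings touched, so the defaults above apply.

        List<Connection> held = new ArrayList<Connection>();
        for (int i = 1; i <= 9; i++) {
            System.out.println("borrowing connection " + i);
            held.add(ds.getConnection()); // never closed, so the pool leaks
        }
        // With maxWait = -1 the 9th borrow blocks forever and no exception
        // is ever logged; this line is never reached.
        System.out.println("done");
    }
}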

UPDATE II: I ran a jstack during a crash and found about 250 (max threads) of the following:

"http-nio-443-exec-294" daemon prio=10 tid=0x00002aaabd4ed800 nid=0x5a5d in Object.wait() [0x00000000579e2000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        at java.lang.Object.wait(Object.java:485)
        at org.apache.commons.pool.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:1118)
        - locked <0x0000000743116b30> (a org.apache.commons.pool.impl.GenericObjectPool$Latch)
        at org.apache.commons.dbcp.PoolingDataSource.getConnection(PoolingDataSource.java:106)
        at org.apache.commons.dbcp.BasicDataSource.getConnection(BasicDataSource.java:1044)
        at org.springframework.jdbc.datasource.DataSourceUtils.doGetConnection(DataSourceUtils.java:111)
        at org.springframework.jdbc.datasource.DataSourceUtils.getConnection(DataSourceUtils.java:77)
        at org.springframework.jdbc.core.JdbcTemplate.execute(JdbcTemplate.java:573)
        at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:637)
        at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:666)
        at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:674)
        at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:718)

To my untrained eye, this looks fairly conclusive. It looks like the database connection pool has hit its cap. I configured a maxWait of three seconds without modifying the maxActive and maxIdle just to ensure that we begin to see exceptions logged when the pool fills up. Then I'll set those values to something appropriate and monitor.
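The Spring frames in that stack are just the normal connection-borrow path: each JdbcTemplate call checks a Connection out of the DataSource for the duration of the query and returns it afterwards. A hypothetical DAO (made-up table and column names) showing the kind of call that ends up parked in borrowObject() once the pool is full:

import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.List;

import javax.sql.DataSource;

import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.jdbc.core.RowMapper;

// Illustration only: a typical JdbcTemplate query (hypothetical table/columns).
// Each call borrows one pooled Connection for the lifetime of the query, so a
// handful of slow queries is enough to exhaust an 8-connection pool and park
// every other request thread in GenericObjectPool.borrowObject().
public class UserDao {
    private final JdbcTemplate jdbc;

    public UserDao(DataSource dataSource) {
        this.jdbc = new JdbcTemplate(dataSource);
    }

    public List<String> findUserNames() {
        return jdbc.query("SELECT name FROM users", new RowMapper<String>() {
            public String mapRow(ResultSet rs, int rowNum) throws SQLException {
                return rs.getString("name");
            }
        });
    }
}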

UPDATE III: After configuring maxWait, I began to see these in my logs, as expected:

 org.apache.commons.dbcp.SQLNestedException: Cannot get a connection, pool error Timeout waiting for idle object
        at org.apache.commons.dbcp.PoolingDataSource.getConnection(PoolingDataSource.java:114)
        at org.apache.commons.dbcp.BasicDataSource.getConnection(BasicDataSource.java:1044)
        at org.springframework.jdbc.datasource.DataSourceUtils.doGetConnection(DataSourceUtils.java:111)
        at org.springframework.jdbc.datasource.DataSourceUtils.getConnection(DataSourceUtils.java:77)

I've set maxActive to -1 (infinite) and maxIdle to 10. I will monitor for a while, but my guess is that this is the end of the problem.
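If the pool were built in code rather than as a Spring bean, the settings I've ended up with would look roughly like the sketch below; the property names are identical in the XML configuration.

import org.apache.commons.dbcp.BasicDataSource;

// Sketch of the revised pool settings described above, assuming programmatic
// configuration; the real app sets the same properties on a Spring-managed bean.
public class RevisedPoolSettings {
    public static void apply(BasicDataSource ds) {
        ds.setMaxWait(3000L);  // fail fast after 3 seconds instead of waiting forever
        ds.setMaxActive(-1);   // a negative value removes the cap on active connections
        ds.setMaxIdle(10);     // keep at most 10 idle connections around
    }
}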

Answer

From experience, you may want to look at your database connection pool implementation. It could be that your database has plenty of capacity, but the connection pool in your application is limited to a small number of connections. I can't remember the details, but I seem to recall having a similar problem, which was one of the reasons I switched to using BoneCP, which I've found to be very fast and reliable under load tests.
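If you do decide to try BoneCP, the wiring is roughly the following; the connection details and pool sizes are placeholders rather than recommended values, and BoneCPDataSource is a javax.sql.DataSource, so the Spring configuration barely changes.

import com.jolbox.bonecp.BoneCPDataSource;

// Rough sketch of a BoneCP-backed DataSource (placeholder URL/credentials,
// illustrative pool sizes). Being a javax.sql.DataSource, it can replace
// BasicDataSource in the Spring bean definition.
public class BoneCpPool {
    public static BoneCPDataSource dataSource() {
        BoneCPDataSource ds = new BoneCPDataSource();
        ds.setDriverClass("org.postgresql.Driver");              // placeholder driver
        ds.setJdbcUrl("jdbc:postgresql://localhost:5432/demo");  // placeholder URL
        ds.setUsername("app");
        ds.setPassword("secret");
        ds.setPartitionCount(2);               // illustrative sizing only
        ds.setMinConnectionsPerPartition(5);
        ds.setMaxConnectionsPerPartition(20);
        return ds;
    }
}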

After trying the debugging suggested below, try increasing the number of connections available in the pool and see if that has any impact.

I identified some server code today that may not have been threadsafe, and I put a fix in for that, but the problem is still happening (though less frequently). Is this the sort of problem that un-threadsafe code can cause?

It depends what you mean by thread-safe. It sounds to me as though your application is causing threads to deadlock. You might want to run your production environment with the JVM configured to allow a debugger to connect, and then use JVisualVM, JConsole or another profiling tool (YourKit is excellent IMO) to have a peek at what threads you've got, and what they're waiting on.
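If attaching a profiler to production isn't an option, the JVM can also report monitor and java.util.concurrent deadlocks itself through ThreadMXBean. A minimal sketch follows; it has to run inside the same JVM as the web app (for example from a hypothetical diagnostics servlet or a scheduled task) to see its threads, or you can simply take periodic jstack dumps and compare them.

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Minimal sketch: ask the running JVM whether any threads are deadlocked and,
// if so, print which lock each one is waiting on and who currently owns it.
// Must run inside the same JVM as the web app to see its threads.
public class DeadlockCheck {
    public static void logDeadlocks() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long[] deadlocked = threads.findDeadlockedThreads(); // null if none found
        if (deadlocked == null) {
            System.out.println("No deadlocked threads detected.");
            return;
        }
        for (ThreadInfo info : threads.getThreadInfo(deadlocked, Integer.MAX_VALUE)) {
            System.out.println(info.getThreadName() + " is waiting on "
                    + info.getLockName() + ", held by " + info.getLockOwnerName());
        }
    }
}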
