Java socketRead0 Issue


Question

I'm developing a web crawler with htmlunit and have added all the required timeouts, but I notice that the app hangs when the server of a website being crawled is not responding. When I use Java VisualVM to do a thread dump, I see:

java.lang.Thread.State: RUNNABLE
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:129)
at java.net.SocksSocketImpl.readSocksReply(SocksSocketImpl.java:88)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:429)
at java.net.Socket.connect(Socket.java:525)
at com.gargoylesoftware.htmlunit.SocksSocketFactory.connectSocket(SocksSocketFactory.java:89)
at org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:148)
at org.apache.http.impl.conn.AbstractPoolEntry.open(AbstractPoolEntry.java:149)
at org.apache.http.impl.conn.AbstractPooledConnAdapter.open(AbstractPooledConnAdapter.java:121)
at org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:573)
at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:425)
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:820)
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:776)
at com.gargoylesoftware.htmlunit.HttpWebConnection.getResponse(HttpWebConnection.java:152)
at app.plugin.core.net.QHttpWebConnection.getResponse(QHttpWebConnection.java:30)
at com.gargoylesoftware.htmlunit.WebClient.loadWebResponseFromWebConnection(WebClient.java:1439)
at com.gargoylesoftware.htmlunit.WebClient.loadWebResponse(WebClient.java:1358)
at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:307)
at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:373)
at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:358)

This is really frustrating since I have no control over those servers. This issue is seriously affecting the performance of my application.

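Questions:


  1. How can I resolve this issue?

  2. Is there a way to get the list of socket connections opened by the Java application and use it to kill the sockets, as if simulating the server closing the connection?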


Recommended answer

I believe that when a thread is inside a Java native method, the thread dump reports it as RUNNABLE even if the call is actually blocked waiting for some event. In essence, I don't believe Java has any way of knowing what a native method is actually doing, so it flags these calls as RUNNABLE. I have seen this with socketRead0() and socketAccept(), both of which typically block.
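The behaviour described above can be observed with a small experiment. The following is only a sketch under stated assumptions: the class name BlockedNativeStateDemo, the helper method and the 500 ms pause are invented for illustration, and it uses a plain ServerSocket rather than htmlunit. It starts a thread that blocks in accept() on a port no client ever connects to, then asks the JVM which state it reports for that thread.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;

class BlockedNativeStateDemo {

    /** Starts a thread that blocks in accept() and returns the state the JVM reports for it. */
    static Thread.State stateWhileBlockedInAccept() throws IOException, InterruptedException {
        // Port 0 lets the OS pick a free port; nobody will ever connect to it.
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            Thread acceptor = new Thread(() -> {
                try {
                    server.accept(); // blocks inside native code waiting for a client
                } catch (IOException expected) {
                    // Closing the ServerSocket (end of the try block below) unblocks accept().
                }
            }, "acceptor");
            acceptor.setDaemon(true);
            acceptor.start();

            Thread.sleep(500); // crude: give the thread time to enter accept()
            return acceptor.getState();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("State while blocked in accept(): " + stateWhileBlockedInAccept());
    }
}
```

A thread that is doing nothing but waiting in a native socket call is expected to show up here as RUNNABLE rather than WAITING or BLOCKED, which is why a thread dump alone cannot tell a busy thread from one stuck on an unresponsive server.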

You need to set your timeout to a reasonable length: long enough that a request is not abandoned when the server is simply busy, but short enough that it times out when the server is not responding. Your application should also be written to use multiple threads. I would try running a dozen or more threads and have each thread wait up to five or ten seconds for a response. There is virtually no overhead in having a handful of threads waiting. When writing a web spider, you should also be mindful of not bombarding a server with lots of requests.
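As a rough illustration of this advice, here is a minimal sketch that combines a connect timeout, a read timeout and a fixed pool of worker threads. It is not the asker's code and does not use htmlunit: it works on plain java.net.Socket, and the names TimeoutCrawlSketch, fetchFirstByte, crawlAll and demoAgainstSilentServer, as well as the one-second timeouts used in the demo, are assumptions made for the example. The "unresponsive server" is simulated by a local ServerSocket that never accepts or writes anything.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class TimeoutCrawlSketch {

    /** Tries to read one byte from the target, giving up after the given timeouts. */
    static String fetchFirstByte(InetSocketAddress target, int connectMs, int readMs) {
        try (Socket socket = new Socket()) {
            socket.connect(target, connectMs); // bounds connection setup
            socket.setSoTimeout(readMs);       // bounds each blocking read()
            int b = socket.getInputStream().read();
            return b < 0 ? "closed" : "data";
        } catch (SocketTimeoutException e) {
            return "timeout";                  // server did not respond in time
        } catch (IOException e) {
            return "error";
        }
    }

    /** Fetches all targets on a fixed pool of worker threads and collects one result per target. */
    static List<String> crawlAll(List<InetSocketAddress> targets, int threads,
                                 int connectMs, int readMs) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<String>> tasks = new ArrayList<>();
            for (InetSocketAddress t : targets) {
                tasks.add(() -> fetchFirstByte(t, connectMs, readMs));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : pool.invokeAll(tasks)) { // waits for all tasks to finish
                try {
                    results.add(f.get());
                } catch (ExecutionException e) {
                    results.add("error");
                }
            }
            return results;
        } finally {
            pool.shutdownNow();
        }
    }

    /** Demo: three requests against a local server that accepts nothing and sends nothing. */
    static List<String> demoAgainstSilentServer() throws IOException, InterruptedException {
        try (ServerSocket silent = new ServerSocket(0, 50, InetAddress.getLoopbackAddress())) {
            InetSocketAddress target =
                    new InetSocketAddress(InetAddress.getLoopbackAddress(), silent.getLocalPort());
            return crawlAll(Collections.nCopies(3, target), 12, 1000, 1000);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demoAgainstSilentServer());
    }
}
```

Because every worker gives up on its own after the read timeout, one silent server delays only the thread that is talking to it, and the remaining threads keep crawling. In a real crawler the timeouts would more likely be the five or ten seconds suggested above, and some per-host rate limiting would be needed to avoid flooding a server.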
