Http URLConnection wait for inner request


Question

I am working on a crawling project. I open a simple URLConnection to the website, as shown below:

URLConnection conn = new URL(url).openConnection();
BufferedReader reader = new BufferedReader(new InputStreamReader(conn.getInputStream()));

This returns the HTML body correctly. However, the website makes inner requests for some fields. For example, it fetches the total number of users from a different web service. In a web browser, the total number of users appears after some time, but the URLConnection approach does not wait for it, so the returned HTML does not contain that field.

In Java, is there any way to wait for a while with URLConnection so that all the data is fetched from the website?

Recommended answer

From your "inner requests" comment, it sounds like the website is using JavaScript (via a framework or just the native browser APIs) to fetch data and render the results into the DOM. This is very common nowadays with SPAs and similar designs.

If that's the case, no amount of waiting will change the outcome with a simple HTTP library like URLConnection, because it only downloads the response and never executes scripts. You can check this by saving the HTML locally and viewing it in your browser: what happens? When you examine it, is there JavaScript on that page?
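The check described above can also be done in code. The following is only a minimal sketch under stated assumptions: the class name `ScriptCheck`, the output file `page.html`, the placeholder URL, and the UTF-8 encoding are all hypothetical choices, not part of the original question. It fetches the raw HTML with URLConnection, saves it to a local file so it can be opened in a browser, and counts the opening `<script>` tags. A non-zero count only hints that some content may be rendered client-side; it does not prove it.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only.
final class ScriptCheck {
    // Matches opening <script> tags, case-insensitively; closing tags do not match.
    private static final Pattern SCRIPT_TAG =
            Pattern.compile("<script\\b", Pattern.CASE_INSENSITIVE);

    // Downloads the raw response body. No JavaScript is executed here.
    // Assumption: the page is encoded as UTF-8.
    static String fetch(String url) throws IOException {
        URLConnection conn = new URL(url).openConnection();
        StringBuilder body = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                body.append(line).append('\n');
            }
        }
        return body.toString();
    }

    // Counts opening <script> tags in the given HTML text.
    static int countScriptTags(String html) {
        Matcher matcher = SCRIPT_TAG.matcher(html);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Placeholder URL; pass the real page as the first argument.
        String url = args.length > 0 ? args[0] : "https://example.com/";
        String html = fetch(url);

        // Save a local copy that can be opened in a browser for comparison.
        Path out = Paths.get("page.html");
        Files.write(out, html.getBytes(StandardCharsets.UTF_8));

        System.out.println("Saved " + out.toAbsolutePath());
        System.out.println("Opening <script> tags found: " + countScriptTags(html));
    }
}
```

If the saved copy shows the missing field only when opened in a browser, the field is most likely filled in by one of those scripts after the page loads.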

To do this properly in code, you'll need something that behaves more like a browser and can execute the JS referenced by the HTML in a DOM-like environment. Try Selenium with PhantomJS or headless Chrome / Firefox, or maybe GhostDriver.
