Pulling HTML from a Webpage in Java

Views: 255 Published: 2018/6/26 11:32:32 Tags: java python html webpage pull


Question

I want to pull the entire HTML source of a web page in Java (or in Python or PHP, if it is easier to show in those languages). I only want to view the HTML and scan through it with a few methods, not edit or manipulate it in any way, and I would prefer not to write it to a new file unless there is no other way. Are there any library classes or methods that do this? If not, is there another way to go about it?

Recommended answer

In Java:

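import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
URL url = new URL("http://stackoverflow.com");
// URLConnection is abstract, so obtain one from the URL rather than calling a constructor
URLConnection connection = url.openConnection();
InputStream stream = connection.getInputStream();
// ... read stream like any file stream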
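Since the question asks to scan the HTML without writing it to a file, here is a minimal sketch of reading the stream into a String held in memory. The class name `PageSource` and the helpers `readAll` and `fetch` are hypothetical names chosen for illustration, and decoding as UTF-8 is an assumption; a real page may declare a different charset.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class PageSource {
    // Drains the stream into memory and decodes it; nothing is written to disk.
    // UTF-8 is assumed here, which may not match the page's actual charset.
    static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }

    // Opens the URL and returns its content as a String (hypothetical helper).
    static String fetch(String address) throws IOException {
        URL url = new URL(address);
        try (InputStream stream = url.openConnection().getInputStream()) {
            return readAll(stream);
        }
    }

    public static void main(String[] args) throws IOException {
        String html = fetch("http://stackoverflow.com");
        System.out.println("Fetched " + html.length() + " characters");
    }
}
```

Once the page is in a String, ordinary methods such as `contains` or `indexOf` can scan it.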

This code is fine for scripting and internal use, but I would argue against using it in production: it does not handle timeouts or failed connections.
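As one possible way to address those two gaps with the standard library alone, the sketch below sets connect and read timeouts on an `HttpURLConnection` and treats any non-200 status as a failure. The class name `TimedFetch`, the helper names, and the choice to reject every status other than 200 are assumptions made for illustration; `readAllBytes` assumes Java 9 or later.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class TimedFetch {
    // Prepares a connection with timeouts; no network traffic happens yet.
    static HttpURLConnection open(String address, int timeoutMillis) throws IOException {
        HttpURLConnection connection = (HttpURLConnection) new URL(address).openConnection();
        connection.setConnectTimeout(timeoutMillis); // limit on establishing the connection
        connection.setReadTimeout(timeoutMillis);    // limit on waiting for data
        return connection;
    }

    // Fetches the page, failing with an IOException on timeouts,
    // refused connections, or any status other than 200.
    static String fetch(String address, int timeoutMillis) throws IOException {
        HttpURLConnection connection = open(address, timeoutMillis);
        try {
            int status = connection.getResponseCode();
            if (status != HttpURLConnection.HTTP_OK) {
                throw new IOException("Unexpected HTTP status " + status + " for " + address);
            }
            try (InputStream stream = connection.getInputStream()) {
                return new String(stream.readAllBytes(), StandardCharsets.UTF_8);
            }
        } finally {
            connection.disconnect();
        }
    }
}
```

A caller can wrap `fetch` in a try/catch for `IOException` and decide whether to retry or give up; a timeout surfaces as `java.net.SocketTimeoutException`, which is a subclass of `IOException`.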

For production use I would recommend the HttpClient library. It supports authentication, redirect handling, threading, connection pooling, etc.
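The answer does not show HttpClient code. As a rough illustration of the same idea, the sketch below uses the JDK's built-in `java.net.http.HttpClient` (Java 11 or later), which is not necessarily the library the answer has in mind. The class name `ClientFetch`, the helper names, and the timeout values are assumptions.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

public class ClientFetch {
    // A reusable client that follows redirects and bounds connection time.
    static HttpClient newClient() {
        return HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .connectTimeout(Duration.ofSeconds(10))
                .build();
    }

    // A GET request with an overall response timeout (value is an assumption).
    static HttpRequest newRequest(String address) {
        return HttpRequest.newBuilder(URI.create(address))
                .timeout(Duration.ofSeconds(30))
                .GET()
                .build();
    }

    // Sends the request and returns the body as a String held in memory.
    static String fetch(HttpClient client, String address)
            throws IOException, InterruptedException {
        HttpResponse<String> response =
                client.send(newRequest(address), HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Unexpected HTTP status " + response.statusCode());
        }
        return response.body();
    }
}
```

Creating one client with `newClient()` and reusing it across calls to `fetch` lets it share connections between requests.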
