How to extract source HTML from a webpage?

Question

I am trying to extract the html source of this page, http://www.fxstreet.com/rates-charts/currency-rates/

I want what I see when I save the page from chrome as a .html file.

I tried to do this in Java, first with a BufferedReader and then with jsoup. I also tried it in Python, but I keep getting the following message:

"This site requires JavaScript and Cookies to be enabled. Please change your browser settings or upgrade your browser."
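A minimal sketch of the plain-reader approach described in the question, showing why it cannot work on a script-driven page: a BufferedReader only sees the HTML the server sends in its first response. Nothing here executes JavaScript or keeps cookies. The class and method names are hypothetical, decoding as UTF-8 is an assumption, and whether the site still returns this notice today is not something the sketch can guarantee.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URI;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for illustration.
final class RawFetch {

    // Reads the response body exactly as the server sends it. Nothing here runs
    // JavaScript or keeps cookies, so a script-driven page yields only its
    // initial HTML. Decoding as UTF-8 is an assumption.
    static String fetch(String url) throws IOException {
        URLConnection conn = URI.create(url).toURL().openConnection();
        StringBuilder html = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                html.append(line).append('\n');
            }
        }
        return html.toString();
    }

    // Simple substring check for the notice quoted in the question.
    static boolean isJavaScriptNotice(String html) {
        return html.contains("requires JavaScript and Cookies to be enabled");
    }

    public static void main(String[] args) {
        String url = args.length > 0 ? args[0]
                : "http://www.fxstreet.com/rates-charts/currency-rates/";
        try {
            String html = fetch(url);
            if (isJavaScriptNotice(html)) {
                System.out.println("Received the JavaScript/cookie notice, not the rendered page.");
            } else {
                System.out.println("Received " + html.length() + " characters of HTML.");
            }
        } catch (IOException | IllegalArgumentException e) {
            System.err.println("Fetch failed: " + e);
        }
    }
}
```

If the notice comes back, the HTML you are reading is the server's fallback page, not the table Chrome shows after running the page's scripts.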

The end goal is to extract the values in the main table.

Recommended answer

Try using HtmlUnit with JavaScript enabled, i.e. webClient.getOptions().setJavaScriptEnabled(true).

JSoup is not a headless browser and does not execute JavaScript, so you need a different library to load and render the page. Once you have the rendered HTML, you can use JSoup to parse it.
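The two-step idea (render with a JavaScript-capable library, then parse) can be sketched for the parsing half as follows. The answer suggests JSoup for that step; to keep this sketch free of third-party dependencies it swaps in the JDK's built-in DOM parser instead, assuming the rendered page has been serialized as well-formed XML, for example with HtmlUnit's page.asXml(). The class and method names and the sample table are hypothetical; with JSoup, the equivalent lookup would be a selector such as doc.select("table tr").

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper class for illustration.
final class TableExtractor {

    // Returns the trimmed text of the <td>/<th> cells of every <tr>, in document
    // order. Assumes the markup is well-formed XML, such as the output of
    // HtmlUnit's asXml() after the page's JavaScript has run.
    static List<List<String>> extractRows(String xml) {
        Document doc;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Do not try to download an external DTD named in a DOCTYPE.
            factory.setFeature(
                    "http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
            doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse markup as XML", e);
        }
        List<List<String>> rows = new ArrayList<>();
        NodeList trs = doc.getElementsByTagName("tr");
        for (int i = 0; i < trs.getLength(); i++) {
            List<String> cells = new ArrayList<>();
            // Only direct children of the row count as its cells.
            for (Node n = trs.item(i).getFirstChild(); n != null; n = n.getNextSibling()) {
                if (n instanceof Element) {
                    String tag = ((Element) n).getTagName();
                    if (tag.equalsIgnoreCase("td") || tag.equalsIgnoreCase("th")) {
                        cells.add(n.getTextContent().trim());
                    }
                }
            }
            rows.add(cells);
        }
        return rows;
    }

    public static void main(String[] args) {
        // Hypothetical sample markup; the real page's table layout is not shown in the question.
        String sample = "<html><body><table>"
                + "<tr><th>Col A</th><th>Col B</th></tr>"
                + "<tr><td>a1</td><td>b1</td></tr>"
                + "</table></body></html>";
        for (List<String> row : extractRows(sample)) {
            System.out.println(String.join(" | ", row));
        }
    }
}
```

For the real page you would still need to pick out the main table (for example by an id or class on the rendered markup), which the question does not show.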