使用 urllib2 执行 URL 并返回呈现的 HTML 输出，而不是 HTML 本身 [英] using urllib2 to execute URL and return rendered HTML output, not the HTML itself

查看：35 发布时间：2021/6/26 20:05:52 python python-2.7 urllib2 urllib

本文介绍了使用 urllib2 执行 URL 并返回呈现的 HTML 输出，而不是 HTML 本身的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

urllib2.urlopen("http://www.someURL.com/pageTracker.html").read();

以上代码将返回 http://www.google.com 上的源 HTML.

The code above will return the source HTML at http://www.google.com.

我需要做什么才能真正返回您在访问 google.com 时看到的呈现的 HTML?我基本上试图执行"一个 URL 来触发一个视图，而不是检索 HTML.

What do I need to do to actually return the rendered HTML that you see when you visit google.com? I essentially trying to 'execute' a URL to trigger a view, not retrieve the HTML.

澄清几点:

我实际上并不关心页面的视觉输出
我关心页面呈现，因为它会在适当的浏览器中呈现，以便我可以通过该页面上的 JavaScript 跟踪 Google Analytics(分析)目标.

推荐答案

由于 Google 主页在某种程度上依赖于 JavaScript，因此您无法使用简单的 HTTP 请求/HTML 解析库来渲染 HTML，因为它们不会在这页纸.只有 Web 浏览器会呈现 HTML，因此您需要一个浏览器来获取呈现的 HTML.

Because Google home page somewhat relies on JavaScript, you cannot get rendered HTML with a simple HTTP request / HTML parsing library, as these do not run the JavaScript enhancements on the page. Only web browsers render HTML, so you need a browser to get the rendered HTML.

您需要使用成熟的无头 Web 浏览器库，而不是简单的 HTTP 请求库.

Instead of simple HTTP request library, you need to use a full-blown headless web browser library.

一个可用的选项是 Selenium 及其 WebDriver.

One available option is Selenium and its WebDriver.

https://pypi.python.org/pypi/selenium

在 Selenium 中打开一个页面.有关示例，请参阅 PyPi.

Open a page in Selenium. See PyPi for the example.

使用 time.sleep() 等待一段时间以确保所有资源都已加载且基于 JavaScript 的 DOM 修改解决.延迟取决于网页，建议您尝试不同的值.

Wait some time with time.sleep() to make sure all resource are loaded and JavaScript-based DOM modifications settle. The delay depends on the web page, I suggest you experiement with different values.

您可以向 Selenium 驱动程序发出 JavaScript 命令以返回当前加载页面的 DOM 树:

You can issue a JavaScript command to the Selenium driver to return the DOM tree of currently loaded page:

driver.execute_script("return document.innerHTML")

这篇关于使用 urllib2 执行 URL 并返回呈现的 HTML 输出，而不是 HTML 本身的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持IT屋！

查看全文

使用 urllib2 执行 URL 并返回呈现的 HTML 输出，而不是 HTML 本身 [英] using urllib2 to execute URL and return rendered HTML output, not the HTML itself

问题描述

推荐答案

相关文章

Python最新文章

热门教程

热门工具

登录关闭

使用 urllib2 执行 URL 并返回呈现的 HTML 输出，而不是 HTML 本身 [英] using urllib2 to execute URL and return rendered HTML output, not the HTML itself

问题描述

推荐答案

相关文章

Python最新文章

热门教程

热门工具

登录 关闭

登录关闭