Why is requests.get() retrieving different HTML using Python than browser?

Views: 512 Posted: 2018/6/20 16:01:25 Tags: javascript python html web-scraping


Problem description

I am attempting to extract data from an HTML table, but it appears that the HTML isn't loading correctly when using requests.get(). Instead, a line in the source reads:

"JavaScript is not enabled and therefore this page may not function correctly."

When I navigate to the page in Google Chrome, the HTML appears as it should.

How do I get a Python script to load the proper HTML?

Solution

Welcome to the wonderful world of web-crawling. The problem you are experiencing is that requests.get() only fetches the initial HTML document that the server sends at the beginning of a page load. That is often not the page you see in the browser, because the browser goes on to build the final page from it: it runs JavaScript function calls, makes AJAX calls, and so on. requests does none of that, which is why you get the "JavaScript is not enabled" fallback text instead of the table.
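A quick way to confirm this is what is happening is to check whether the raw response already contains the element you are after. The sketch below is only a heuristic and uses the standard library; the names `TagCounter` and `needs_javascript` are made up for illustration and are not part of requests or any other library.

```python
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Counts how often each start tag occurs in an HTML document."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def handle_starttag(self, tag, attrs):
        self.counts[tag] = self.counts.get(tag, 0) + 1


def needs_javascript(html, wanted_tag="table"):
    """Heuristic (hypothetical helper): the wanted tag is missing from the
    raw HTML, but the page ships scripts that may build it later."""
    counter = TagCounter()
    counter.feed(html)
    has_wanted = counter.counts.get(wanted_tag, 0) > 0
    has_scripts = counter.counts.get("script", 0) > 0
    return (not has_wanted) and has_scripts
```

Assuming `url` holds the address of your page, `needs_javascript(requests.get(url).text)` returning True suggests the table is produced by scripts and will not appear in what requests downloads.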

If you want to programmatically get the HTML as the browser has it after the page has loaded and its scripts have run, you need a real browser. This is where selenium could be a good option:
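
    from selenium import webdriver

    # url is the address of the page you want to load
    browser = webdriver.Firefox()   # starts a real Firefox instance
    browser.get(url)                # loads the page, including its JavaScript
    print(browser.page_source)      # HTML of the page as the browser now has it
    browser.quit()                  # close the browser when done
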
Note that selenium itself is very powerful at locating elements, so you don't need a separate HTML parser to extract the data from the page.
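As a rough sketch of what that can look like for a table, the helper below collects the text of every row using selenium's `find_elements`. It assumes Selenium 4, a single plain `<table>` on the page, and the default selectors shown; `read_table` and `main` are hypothetical names. The string `"css selector"` is the value of selenium's `By.CSS_SELECTOR` constant, written out here so the helper itself has no import of its own.

```python
# Value of selenium.webdriver.common.by.By.CSS_SELECTOR, spelled out so that
# read_table() can be exercised without selenium installed.
CSS = "css selector"


def read_table(driver, row_css="table tr", cell_css="th, td"):
    """Return the text of each non-empty table row as a list of cell strings."""
    rows = []
    for row in driver.find_elements(CSS, row_css):
        cells = [cell.text.strip() for cell in row.find_elements(CSS, cell_css)]
        if cells:  # skip rows that contain no th/td cells
            rows.append(cells)
    return rows


def main(url):
    # Imported here because it needs selenium and a Firefox driver installed.
    from selenium import webdriver

    browser = webdriver.Firefox()
    try:
        browser.get(url)
        for cells in read_table(browser):
            print(cells)
    finally:
        browser.quit()  # always close the browser, even on errors
```

If the table is filled in by an AJAX call after the page loads, the rows may not be there yet when `read_table` runs; selenium's explicit waits (`WebDriverWait`) are the usual way to handle that.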

Hope that helps.
