Python requests isn't giving me the same HTML as my browser is

Views: 89 | Published: 2020/9/23 22:22:44 | Tags: python, browser, python-requests

Question

I am grabbing a Wikia page using Python requests. There's a problem, though: the requests request isn't giving me the same HTML as my browser is with the very same page.

For comparison, here's the page Firefox gets me, and here's the page requests fetches (https://www.dropbox.com/s/gwnqtmrkr5zxgmn/yokai-pythonrequests.html?dl=0). Download them to view; sorry, there's no easy way to just visually host a bit of HTML from another site.

You'll note a few differences (super unfriendly diff). There are some small things, like attributes being ordered differently and such, but there are also a few very, very large things. Most important is the lack of the last six <img>s, and of the entire navigation and footer sections. Even in the raw HTML it looks like the page was cut off abruptly.
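One rough way to quantify differences like these without reading a raw diff is to summarize each saved file: count the <img> tags and check whether a closing </html> tag was ever seen. The following is a minimal sketch using only the standard library; the names TagSummary and summarize_html are illustrative and are not part of the original post.

```python
from collections import Counter
from html.parser import HTMLParser


class TagSummary(HTMLParser):
    """Collects start-tag counts and notes whether </html> was seen."""

    def __init__(self):
        super().__init__()
        self.start_tags = Counter()
        self.saw_closing_html = False

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased; self-closing tags also pass through here.
        self.start_tags[tag] += 1

    def handle_endtag(self, tag):
        if tag == "html":
            self.saw_closing_html = True


def summarize_html(html_text):
    """Return a small dict describing one HTML document."""
    parser = TagSummary()
    parser.feed(html_text)
    parser.close()
    return {
        "length": len(html_text),
        "img_count": parser.start_tags["img"],
        "closed_html": parser.saw_closing_html,
    }
```

Running summarize_html on the text of both saved pages and comparing the two dicts should show at a glance whether one copy has fewer images or ends before </html>, which would point to a truncated response.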

Why is this happening, and is there a way to fix it? I've thought of a bunch of things already, none of which have been fruitful:

Request headers interfering? Nope, I tried copying the headers my browser sends, User-Agent and all, 1:1 into the requests request, but nothing changed.
JavaScript loading content after the HTML is loaded? Nah. Even with JS disabled, Firefox gives me the "good" page.
Uh... well... what else could there be?

It'd be amazing if you know a way this could happen and a way to fix it. Thank you!

Recommended answer

I had a similar issue:

Identical headers sent from Python and from the browser

JavaScript definitely ruled out as the cause

To resolve the issue, I ended up swapping out the requests library for urllib.request.

Essentially, I replaced:

import requests

session = requests.Session()
r = session.get(URL)

with:

import urllib.request

r = urllib.request.urlopen(URL)

and then it worked.

Maybe one of those libraries is doing something strange behind the scenes? Not sure if that's an option for you or not.
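For anyone trying the same swap, here is a slightly fuller sketch of the urllib.request version. It goes beyond the snippet above in two ways that are assumptions on my part: it sends an explicit User-Agent header, and it decodes the body using the charset the server declares. BROWSER_USER_AGENT is a hypothetical placeholder, and build_request and fetch_html are illustrative names.

```python
import urllib.request

# Hypothetical placeholder: substitute the User-Agent string your browser sends.
BROWSER_USER_AGENT = (
    "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0"
)


def build_request(url, user_agent=BROWSER_USER_AGENT):
    """Build a GET request carrying browser-like headers (no network access)."""
    headers = {"User-Agent": user_agent, "Accept": "text/html"}
    return urllib.request.Request(url, headers=headers)


def fetch_html(url, timeout=10.0):
    """Fetch url with urllib.request and return the body as text."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Calling fetch_html(URL) returns the page text, which can then be compared with what the browser saved. Whether this yields the complete page for a given site is not guaranteed; it only reproduces the swap described above with the headers made explicit.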
