BeautifulSoup does not read 'full' HTML obtained by requests


Problem Description

I am trying to scrape URLs from a website rendered as HTML, using the BeautifulSoup and requests libraries. I am running both on Python 3.5. It seems I am successfully getting the HTML from requests, because when I display r.content, the full HTML of the website I am trying to scrape is shown. However, when I pass this to BeautifulSoup, BeautifulSoup drops the bulk of the HTML, including the URL I am trying to scrape.

from bs4 import BeautifulSoup
import requests

# requests needs the URL scheme (http:// or https://); a bare 'www.example.com'
# raises requests.exceptions.MissingSchema
page = requests.get('https://www.example.com')
soup = BeautifulSoup(page.content, 'html.parser')

print(soup.find_all('div'))

I have already tried other parsers like html5lib and lxml, without any success.

However, the output does not show all the 'div' elements that are actually in the website's HTML code.

I want to scrape the URL from 'h1.post-title'; a minimal sketch of the extraction I have in mind follows.
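This sketch assumes, hypothetically, that the post title link is an <a> element inside the h1.post-title heading; the real markup may differ:

from bs4 import BeautifulSoup

# Hypothetical markup: the post title link sits inside <h1 class="post-title">
html = '<h1 class="post-title"><a href="https://example.com/post">My Post</a></h1>'
soup = BeautifulSoup(html, 'html.parser')

# select_one takes a CSS selector, so 'h1.post-title a' picks the anchor inside the heading
link = soup.select_one('h1.post-title a')
print(link.get('href'))  # prints https://example.com/post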

Recommended Answer

This is because the page you're scraping is dynamic: its content is generated with JavaScript, and it takes some time to render fully (it is not present in the initial, static HTML).

You should use something like Selenium or Puppeteer to load the page, wait for it to fully render, and then scrape the content you need to extract.
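As a rough sketch of the Selenium route (assuming Chrome, a placeholder URL, and the same hypothetical h1.post-title > a markup as above), you could wait for the element to appear and then hand the rendered HTML back to BeautifulSoup:

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

options = Options()
options.add_argument('--headless=new')  # run without opening a browser window (recent Chrome)
driver = webdriver.Chrome(options=options)

try:
    driver.get('https://www.example.com')  # placeholder URL

    # Wait (up to 10 seconds) until the JavaScript-rendered title element is in the DOM
    WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.CSS_SELECTOR, 'h1.post-title'))
    )

    # Hand the fully rendered HTML to BeautifulSoup, then extract as before
    soup = BeautifulSoup(driver.page_source, 'html.parser')
    for link in soup.select('h1.post-title a'):
        print(link.get('href'))
finally:
    driver.quit()

Waiting with an explicit expected condition is generally more reliable than a fixed time.sleep(), since it returns as soon as the element shows up and fails with a clear timeout if it never does.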
