Beautiful Soup not loading the entire page

Published: 2022/8/2 15:28:29 | Tags: python beautifulsoup web-crawler


Question

I have a web-scraping script:

import requests
from lxml import html
import bs4
res = requests.get('https://in.linkedin.com/in/ASAMPLEUSERNAME', headers={'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36'})

print(res.text)

Replace 'ASAMPLEUSERNAME' in the URL with the username of some dummy LinkedIn user.

The code returns only a partial, incomplete page source (almost nothing of the page I see in the browser).

Recommended answer

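As Beng said, the content is dynamic. You can tell by looking at the source that requests returns: much of the HTML consists of "script" elements, and the visible content is filled in by JavaScript after the page loads in a browser. requests only downloads the initial HTML and Beautiful Soup only parses it; neither runs JavaScript. To load the dynamic elements you can use another library, such as Selenium, which drives a real browser.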
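One rough way to confirm that a response is mostly script rather than content is to count the script tags and measure how much visible text sits outside them. The sketch below is only an illustration: it uses the standard library's html.parser so that it runs without extra packages, and the names ScriptRatio, summarize, and the sample markup are made up for this example. In the question's script you would pass res.text to summarize.

```python
from html.parser import HTMLParser


class ScriptRatio(HTMLParser):
    """Count <script> tags and the visible text found outside them."""

    def __init__(self):
        super().__init__()
        self.script_tags = 0
        self.visible_chars = 0
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.script_tags += 1
            self._in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Ignore JavaScript source; count only text a reader would see.
        if not self._in_script:
            self.visible_chars += len(data.strip())


def summarize(html_text):
    """Return (number of script tags, characters of visible text)."""
    parser = ScriptRatio()
    parser.feed(html_text)
    parser.close()
    return parser.script_tags, parser.visible_chars


# Hypothetical sample standing in for res.text from the question.
sample = (
    "<html><body><script>var a = 1;</script>"
    "<script src='x.js'></script><p>Hi</p></body></html>"
)
print(summarize(sample))
```

Many script tags combined with very little visible text suggests the page is rendered client-side, which is the situation described above.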

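Edit: In theory, this is how you get the page source with Selenium. In practice it turns out to be a bit harder: I was redirected to LinkedIn's login page. You can extend the code to log in first and then fetch the page source. Let me know if you need help with that. Note that for this code to work you need Chrome installed and a matching ChromeDriver available (on your PATH, or passed in by path as below).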

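from selenium import webdriver
from selenium.webdriver.chrome.service import Service
# yourdriver: set this to the path of your chromedriver executable
# Selenium 4 takes the path via Service; Selenium 3 used executable_path=yourdriver
driver = webdriver.Chrome(service=Service(yourdriver))
url = 'https://in.linkedin.com/in/SOMEUSER'
driver.get(url)
html = driver.page_source  # HTML after the browser has run the page's JavaScript
driver.quit()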
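Since the answer reports being redirected to a login page, it can help to detect that case before trusting the page source. The following is a minimal sketch, not LinkedIn-specific logic: the helper looks_like_login_redirect and the LOGIN_HINTS path fragments are assumptions made up for illustration, and the example.com URLs are placeholders. With Selenium, the final address is available as driver.current_url (read it before calling driver.quit()).

```python
from urllib.parse import urlparse

# Assumption: these path fragments mark a login or sign-in wall.
# They are illustrative guesses, not a documented contract of any site.
LOGIN_HINTS = ("login", "authwall", "signup")


def looks_like_login_redirect(requested_url, final_url, hints=LOGIN_HINTS):
    """Return True if the browser ended up on a different page whose path
    contains one of the login hints."""
    requested = urlparse(requested_url)
    final = urlparse(final_url)
    same_page = (requested.netloc, requested.path.rstrip("/")) == (
        final.netloc,
        final.path.rstrip("/"),
    )
    if same_page:
        return False
    path = final.path.lower()
    return any(hint in path for hint in hints)


# Placeholder URLs; with Selenium, pass url and driver.current_url instead.
print(looks_like_login_redirect(
    "https://example.com/in/someone",
    "https://example.com/login?next=%2Fin%2Fsomeone",
))
```

If the check returns True, the page source belongs to the login page rather than the profile, so the script needs to sign in first, as the answer suggests.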
