LinkedIn scraping not getting all data

Views: 71   Published: 2021/4/15 19:19:41   Tags: python, html, web-scraping, beautifulsoup, linkedin

Problem description

From a LinkedIn company page like: https://www.linkedin.com/company/10073529?trk=tyah&trkInfo=clickedVertical%3Acompany%2CclickedEntityId%3A10073529%2Cidx%3A1-1-1%2CtarId%3A1461132316737%2Ctas%3Adastrong%20

I am trying to get the link associated with data-li-miniprofile-id:

<a class="new-miniprofile-container" href="..." data-li-url="..." data-li-miniprofile-id="...">

which is nested several levels deep inside other parent elements.
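Assuming the anchor does appear in the HTML being parsed, BeautifulSoup can match it directly by its attribute instead of walking down through its parents. A minimal sketch follows; the helper name extract_miniprofile_links and the sample markup (ids and URLs) are hypothetical, modeled only on the element shown above.

```python
from typing import List, Optional, Tuple

from bs4 import BeautifulSoup


def extract_miniprofile_links(html: str) -> List[Tuple[str, Optional[str]]]:
    """Return (miniprofile id, href) pairs for every <a> carrying data-li-miniprofile-id."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    # attrs={name: True} matches any <a> that has the attribute, whatever its value
    for anchor in soup.find_all("a", attrs={"data-li-miniprofile-id": True}):
        results.append((anchor["data-li-miniprofile-id"], anchor.get("href")))
    return results


# Hypothetical markup shaped like the element in the question; values are made up.
sample = """
<div><ul><li>
  <a class="new-miniprofile-container" href="/profile/view?id=1"
     data-li-url="/mini/1" data-li-miniprofile-id="abc123">Jane</a>
</li></ul></div>
"""
print(extract_miniprofile_links(sample))  # [('abc123', '/profile/view?id=1')]
```

soup.select("a.new-miniprofile-container") would match on the class instead. Either way, the search can only succeed if the element is actually present in the HTML that requests downloaded.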

This is what my code looks like so far:

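import requests
from bs4 import BeautifulSoup  # the class is BeautifulSoup; "beautifulsoup" raises ImportError

# Company page from the question (tracking query parameters omitted)
url = "https://www.linkedin.com/company/10073529"
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}
response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.content, "html.parser")

# Print the href of every <a> tag present in the downloaded HTML
for link in soup.find_all("a"):
    print(link.get('href'))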

I initially searched only for class="new-miniprofile-container", but it returned an empty list. I think the reason is that the element is not in the HTML I received: when I ran soup.prettify() (which returns all of the scraped HTML), the output did not contain any of the child content nested under those parent elements.
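An empty result can come from two quite different causes: the search itself is wrong, or the markup is simply not in the downloaded HTML. A small check like the one below separates those cases. diagnose_missing_element is a hypothetical helper, and it cannot tell why the markup is missing; whether the content is withheld from this kind of request or added later by scripts running in a browser remains an assumption to verify.

```python
from bs4 import BeautifulSoup


def diagnose_missing_element(html: str, css_class: str = "new-miniprofile-container") -> str:
    """Classify why a class-based search for an <a> might come back empty."""
    soup = BeautifulSoup(html, "html.parser")
    # A CSS class selector only matches elements that were parsed as real tags
    if soup.select(f"a.{css_class}"):
        return "found"
    # The class name may still occur as plain text, e.g. inside a <script> or a comment
    if css_class in html:
        return "text only"
    # Nothing with that name was downloaded at all
    return "absent"
```

In the question's script this would be called as print(diagnose_missing_element(response.text)) right after the request. "absent" supports the suspicion that the page delivered to requests is not the one seen in the browser, while "text only" means the data is there but not as ordinary tags.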

I suspect the problem is related to security measures put in place by LinkedIn's engineers, but I would like to know whether there is a way to get those URLs, or whether there are any other options for obtaining them.
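Before attributing the missing data to blocking, it can help to confirm whether the request was answered with the page that was asked for at all. The sketch below is a hypothetical helper (describe_response is not part of requests) that only reads standard requests.Response attributes: status_code, history and url. Treating a redirect or a non-200 status as a sign of blocking is an assumption, not something the question establishes.

```python
from typing import List

import requests


def describe_response(response: requests.Response) -> List[str]:
    """Return warnings suggesting the downloaded page may not be the one requested."""
    warnings = []
    # Anything other than 200 OK means the body is probably not the normal page
    if response.status_code != 200:
        warnings.append(f"unexpected status code {response.status_code}")
    # A non-empty history means requests followed one or more redirects
    if response.history:
        warnings.append(f"redirected to {response.url}")
    return warnings
```

Calling print(describe_response(response)) after requests.get in the question's script prints an empty list when the server answered directly with 200; any warning means the HTML being parsed may be a different page from the one inspected in the browser.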
