LinkedIn scraping not getting all data


Question

From a linkedin site like: https://www.linkedin.com/company/10073529?trk=tyah&trkInfo=clickedVertical%3Acompany%2CclickedEntityId%3A10073529%2Cidx%3A1-1-1%2CtarId%3A1461132316737%2Ctas%3Adastrong%20

I am trying to get the link associated with data-li-miniprofile-id:

<a class="new-miniprofile-container" href="..." data-li-url="..." data-li-miniprofile-id="...">

which has parents of , under , under , etc...

This is what my code looks like so far:

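import requests
from bs4 import BeautifulSoup  # the class name is case-sensitive

# Company page from the question (tracking query parameters omitted)
url = "https://www.linkedin.com/company/10073529"
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}
response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.content, "html.parser")
# Print the href of every anchor present in the downloaded HTML
for link in soup.find_all("a"):
    print(link.get('href'))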

I initially just looked for a class="new-miniprofile-container" but it returned an empty array. I think the reason is that when I ran soup.prettify() (which returns all of the html scraped data), it just doesn't contain any children content after

I feel the problem is associated with the security blocks set up by LinkedIn engineers, but I want to know if there is a way to get those URLs, or if there are any other options to get those.
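One way to test the suspicion above is to check whether the anchors exist at all in the HTML that was actually downloaded. The sketch below is only an illustration: it uses the standard library's html.parser instead of BeautifulSoup so it runs without extra packages, and both sample strings (RENDERED_HTML, STATIC_HTML) are invented markup, not real LinkedIn output. If the second case matches what response.content looks like, no selector will find the links, because they were never in the response.

```python
from html.parser import HTMLParser


class MiniprofileLinkParser(HTMLParser):
    """Collect href values of <a class="new-miniprofile-container"> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        # The class attribute may hold several space-separated names.
        classes = (attr_map.get("class") or "").split()
        if "new-miniprofile-container" in classes and attr_map.get("href"):
            self.links.append(attr_map["href"])


def miniprofile_links(html):
    """Return the hrefs of all miniprofile anchors found in an HTML string."""
    parser = MiniprofileLinkParser()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical markup, invented for illustration only.
RENDERED_HTML = (
    '<ul><li><a class="new-miniprofile-container" href="/profile/view?id=1" '
    'data-li-miniprofile-id="1">Member</a></li></ul>'
)
# Hypothetical static response: the container exists but has no children.
STATIC_HTML = '<ul id="employees"></ul>'

print(miniprofile_links(RENDERED_HTML))
print(miniprofile_links(STATIC_HTML))
```

Running the same function on response.text from the code above would show which of the two cases applies to the real page.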

Recommended answer

You should be using the LinkedIn REST API instead. It has endpoints for company profiles, which you can experiment with in the REST API explorer. There is also a python-linkedin client, whose documentation covers the Company API: https://github.com/ozgur/python-linkedin#company-api
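As a rough sketch of what a direct REST call for a company profile might look like, the helper below only builds the request and sends nothing. Everything specific in it is an assumption rather than something stated in this answer: the v1-style base URL and field-selector syntax, the field names, the hypothetical helper name build_company_request, and the placeholder token. Check the current LinkedIn API documentation before relying on any of it.

```python
from urllib.parse import urlencode

# Assumption: v1-style REST base URL; verify against current LinkedIn docs.
API_BASE = "https://api.linkedin.com/v1"


def build_company_request(company_id, fields, access_token):
    """Return (url, headers) for a company lookup; nothing is sent."""
    # Assumption: fields are requested with a ":(a,b,c)" selector suffix.
    selector = ",".join(fields)
    query = urlencode({"format": "json"})
    url = f"{API_BASE}/companies/{company_id}:({selector})?{query}"
    # Assumption: the API accepts an OAuth 2.0 bearer token in this header.
    headers = {"Authorization": f"Bearer {access_token}"}
    return url, headers


# 10073529 is the company id from the question; the token is a placeholder.
url, headers = build_company_request(
    10073529, ["id", "name", "website-url"], "ACCESS_TOKEN"
)
print(url)
```

With a valid token, the pair could be passed to requests.get(url, headers=headers), the same call the question already uses for scraping.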
