Webscraping Instagram follower count BeautifulSoup

Question

I'm just starting to learn how to web scrape using BeautifulSoup and want to write a simple program that will get the follower count for a given Instagram page. I currently have the following script (pulled from another Q&A thread):

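import requests
from bs4 import BeautifulSoup

user = "espn"
url = 'https://www.instagram.com/' + user
r = requests.get(url)
# Name the parser explicitly so bs4 does not guess one and emit a warning
soup = BeautifulSoup(r.content, 'html.parser')
# The description meta tag starts with the abbreviated follower count
followers = soup.find('meta', {'name': 'description'})['content']
follower_count = followers.split('Followers')[0].strip()
print(follower_count)

# 10.7m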

The problem I am running into is I want to get a more precise number, which you can see when you hover the mouse over the follower count on the Instagram page (e.g., 10,770,816).

Unfortunately, I have not been able to figure out how to do this with BeautifulSoup. I'd like to do this without the API since I am combining this with code to track other social media platforms. Any tips?

Recommended answer

Using the API is the easiest way, but I also found a very hacky way to do it:

import requests

username = "espn"
url = 'https://www.instagram.com/' + username
r = requests.get(url).text

# Follower count: the text between the start marker and the first end marker after it
start = '"edge_followed_by":{"count":'
end = '},"followed_by_viewer"'
begin = r.find(start) + len(start)
followers = r[begin:r.find(end, begin)]

# Following count, extracted the same way
start = '"edge_follow":{"count":'
end = '},"follows_viewer"'
begin = r.find(start) + len(start)
following = r[begin:r.find(end, begin)]

print(followers, following)

If you look through the response that requests returns, there's a line of JavaScript that contains the real follower count:

... edge_followed_by":{"count":10770969},"followed_by_viewer":{ ...

So I just extracted the number by finding the substring before and after.
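The same idea can also be written with a regular expression, which avoids tracking start and end offsets by hand. The following is only a minimal sketch: `SAMPLE` is a made-up stand-in for the page text (the following count of 612 is invented for illustration), `extract_count` is a hypothetical helper name, and it assumes the page still embeds the counts in the `"edge_followed_by":{"count":N}` form shown above, which Instagram may change at any time.

```python
import re

# Hypothetical stand-in for the text returned by requests.get(url).text;
# the real page may be structured differently.
SAMPLE = ('... "edge_followed_by":{"count":10770969},"followed_by_viewer" ... '
          '"edge_follow":{"count":612},"follows_viewer" ...')

def extract_count(page_text, key):
    """Return the integer stored as "<key>":{"count":N}, or None if it is absent."""
    pattern = r'"' + re.escape(key) + r'":\{"count":(\d+)\}'
    match = re.search(pattern, page_text)
    return int(match.group(1)) if match else None

followers = extract_count(SAMPLE, "edge_followed_by")
following = extract_count(SAMPLE, "edge_follow")

print(followers, following)   # 10770969 612
print(f"{followers:,}")       # 10,770,969
```

Returning `None` when the pattern is missing makes it easier to notice that the page layout has changed, instead of silently printing an unrelated slice of the HTML.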
