How would I get all links, or clips, from a specific channel on Twitch in Python?
Question
from bs4 import BeautifulSoup
from urllib.request import Request, urlopen
import re
req = Request("https://www.twitch.tv/directory/game/League%20of%20Legends/clips")
html_page = urlopen(req)
soup = BeautifulSoup(html_page, "html.parser")
links = []
for link in soup.findAll('a'):
    links.append(link.get('href'))
print(links)
This is the code I have so far. I'm not sure how to modify it to get the clip links on Twitch.
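One way to narrow `links` down to clip links, using the `re` module the code already imports, is a regex filter over the collected hrefs. This is a minimal sketch: `filter_clip_links` and `CLIP_PATTERN` are hypothetical names, and the URL shapes it accepts (`https://clips.twitch.tv/<Slug>` and `https://www.twitch.tv/<channel>/clip/<Slug>`, absolute or relative) are assumptions about how Twitch formats clip links. It only helps if the clip links are actually present in the fetched HTML.

```python
import re
from typing import Iterable, List, Optional

# Hypothetical pattern. Assumed clip URL shapes:
#   https://clips.twitch.tv/<Slug>
#   https://www.twitch.tv/<channel>/clip/<Slug>  (or relative: /<channel>/clip/<Slug>)
CLIP_PATTERN = re.compile(
    r"^(?:https?://clips\.twitch\.tv/[\w-]+"
    r"|(?:https?://(?:www\.)?twitch\.tv)?/\w+/clip/[\w-]+)"
)


def filter_clip_links(hrefs: Iterable[Optional[str]]) -> List[str]:
    """Return hrefs that look like clip links, in order, without duplicates.

    None values (an <a> tag without an href) are skipped.
    """
    seen = set()
    clips = []
    for href in hrefs:
        if href and CLIP_PATTERN.match(href) and href not in seen:
            seen.add(href)
            clips.append(href)
    return clips
```

With the code above, `print(filter_clip_links(links))` prints whichever collected links match, if any.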
Answer
The URLs are created dynamically by JavaScript, so loading the HTML alone is not enough. If you look at the requests the browser makes to fetch the data, you can see that the clips are returned inside a JSON object.
You would need either to use something like Selenium to automate a browser and collect all the URLs or, alternatively, to request the JSON yourself as follows:
import requests
url = "https://gql.twitch.tv/gql"
json_req = """[{"query":"query ClipsCards__Game($gameName: String!, $limit: Int, $cursor: Cursor, $criteria: GameClipsInput) { game(name: $gameName) { id clips(first: $limit, after: $cursor, criteria: $criteria) { pageInfo { hasNextPage __typename } edges { cursor node { id slug url embedURL title viewCount language curator { id login displayName __typename } game { id name boxArtURL(width: 52, height: 72) __typename } broadcaster { id login displayName __typename } thumbnailURL createdAt durationSeconds __typename } __typename } __typename } __typename } } ","variables":{"gameName":"League of Legends","limit":100,"criteria":{"languages":[],"filter":"LAST_DAY"},"cursor":"MjA="},"operationName":"ClipsCards__Game"}]"""
# Send the same GraphQL request (and client ID) that the twitch.tv web page sends
r = requests.post(url, data=json_req, headers={"client-id": "kimne78kx3ncx6brgo4mv6wki5h1ko"})
r.raise_for_status()  # fail loudly on an HTTP error instead of parsing an error body
r_json = r.json()
# The response is a list with one result per operation; the clips are under data.game.clips.edges
edges = r_json[0]['data']['game']['clips']['edges']
urls = [edge['node']['url'] for edge in edges]
for clip_url in urls:
    print(clip_url)
This would give you the first 100 URLs, starting with:
https://clips.twitch.tv/CourageousOnerousChoughWOOP
https://clips.twitch.tv/PhilanthropicAssiduousSwordHassaanChop
https://clips.twitch.tv/MistyThoughtfulLardPRChase
https://clips.twitch.tv/HotGoldenAmazonSSSsss
https://clips.twitch.tv/RelievedViscousPangolinOSsloth
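The request above fetches a single page. Because the query also asks for `pageInfo { hasNextPage }` and a `cursor` on every edge, it looks like standard GraphQL cursor pagination: pass the last edge's `cursor` as `$cursor` to fetch the next page. The hard-coded cursor `"MjA="` is base64 for `"20"`, so it appears to skip ahead rather than start at the very first clip; omitting it should start from the beginning. Below is a minimal sketch under those assumptions. `gql.twitch.tv` is an unofficial, undocumented endpoint, so the trimmed query, headers and response shape (taken from the code above) may change or be rejected at any time. `build_payload`, `gql_post` and `iter_clip_urls` are hypothetical names, `max_pages` is an arbitrary safety cap, and the HTTP call is passed in as `post` so the paging logic can be tested without the network. Like the original request, this collects clips for a game, not for a channel.

```python
import json
import urllib.request
from typing import Any, Callable, Iterator, List, Optional

GQL_URL = "https://gql.twitch.tv/gql"
CLIENT_ID = "kimne78kx3ncx6brgo4mv6wki5h1ko"  # same client ID as in the code above

# Trimmed ClipsCards__Game query (assumption: the server accepts this smaller field list;
# otherwise paste the full query string from the code above).
QUERY = (
    "query ClipsCards__Game($gameName: String!, $limit: Int, $cursor: Cursor, "
    "$criteria: GameClipsInput) { game(name: $gameName) { id "
    "clips(first: $limit, after: $cursor, criteria: $criteria) { "
    "pageInfo { hasNextPage } edges { cursor node { url } } } } }"
)


def build_payload(game_name: str, limit: int = 100,
                  cursor: Optional[str] = None) -> List[dict]:
    """Build the request body as Python data instead of a hand-written JSON string."""
    variables = {"gameName": game_name, "limit": limit,
                 "criteria": {"languages": [], "filter": "LAST_DAY"}}
    if cursor is not None:  # no cursor means "start from the first page"
        variables["cursor"] = cursor
    return [{"operationName": "ClipsCards__Game", "query": QUERY,
             "variables": variables}]


def gql_post(payload: List[dict]) -> Any:
    """POST the payload to the GraphQL endpoint and return the decoded JSON."""
    req = urllib.request.Request(
        GQL_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Client-Id": CLIENT_ID, "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def iter_clip_urls(game_name: str, post: Callable[[List[dict]], Any] = gql_post,
                   limit: int = 100, max_pages: int = 10) -> Iterator[str]:
    """Yield clip URLs page by page, following the last edge's cursor each time."""
    cursor = None
    for _ in range(max_pages):
        clips = post(build_payload(game_name, limit, cursor))[0]["data"]["game"]["clips"]
        edges = clips["edges"]
        for edge in edges:
            yield edge["node"]["url"]
        if not edges or not clips["pageInfo"]["hasNextPage"]:
            break  # last page reached
        cursor = edges[-1]["cursor"]
```

For example, `for clip_url in iter_clip_urls("League of Legends", max_pages=3): print(clip_url)` would print up to three pages of URLs, if the endpoint still answers this request.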