Extract individual links from a single YouTube playlist link using Python
Problem description
I need a Python script that takes a link to a single YouTube playlist and returns a list containing the links to the individual videos in the playlist.
I realize the same question was asked a few years ago, but it was asked for Python 2.x and the code in the answers doesn't work properly. It behaves oddly: it works most of the time but gives empty output once in a while (maybe some of the packages used there have been updated, I don't know). I've included one of those snippets below.
If you don't believe me, run this code several times. You'll receive an empty list once in a while, but most of the time it does the job of breaking down a playlist.
    from bs4 import BeautifulSoup as bs
    import requests

    r = requests.get('https://www.youtube.com/playlist?list=PL3D7BFF1DDBDAAFE5')
    page = r.text
    soup = bs(page, 'html.parser')
    res = soup.find_all('a', {'class': 'pl-video-title-link'})
    for l in res:
        print(l.get("href"))
For some playlists, the code doesn't work at all.
Also, if BeautifulSoup can't do the job, any other popular Python library will do.
Solution
It seems YouTube sometimes serves different versions of the page. Sometimes the HTML is organized as you expected, using links with the pl-video-title-link class:

    <td class="pl-video-title">
      <a class="pl-video-title-link yt-uix-tile-link yt-uix-sessionlink spf-link " dir="ltr" href="/watch?v=GtWXOzsD5Fw&list=PL3D7BFF1DDBDAAFE5&index=101&t=0s" data-sessionlink="ei=TJbjXtC8NYri0wWCxarQDQ&feature=plpp_video&ved=CGoQxjQYYyITCNCSmqHD_OkCFQrxtAodgqIK2ij6LA">
        Android Application Development Tutorial - 105 - Spinners and ArrayAdapter
      </a>
      <div class="pl-video-owner">
        de <a href="/user/thenewboston" class=" yt-uix-sessionlink spf-link " data-sessionlink="ei=TJbjXtC8NYri0wWCxarQDQ&feature=playlist&ved=CGoQxjQYYyITCNCSmqHD_OkCFQrxtAodgqIK2ij6LA" >thenewboston</a>
      </div>
      <div class="pl-video-bottom-standalone-badge">
      </div>
    </td>
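For the HTML version, the link extraction can be sketched with only the standard library. This is a minimal illustration, not the code from the question: it swaps BeautifulSoup for `html.parser` so it runs without extra packages, the class name `PlaylistLinkParser` and the `sample` markup are made up for the example, and it assumes the relative `href` values should be resolved against `https://www.youtube.com`.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

BASE_URL = "https://www.youtube.com"  # assumed base for the relative hrefs


class PlaylistLinkParser(HTMLParser):
    """Collect hrefs of <a> tags whose class list contains pl-video-title-link."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        href = attributes.get("href")
        if "pl-video-title-link" in classes and href:
            # Turn "/watch?v=..." into an absolute URL.
            self.links.append(urljoin(BASE_URL, href))


# Hypothetical, shortened markup in the shape of the snippet above.
sample = (
    '<td class="pl-video-title">'
    '<a class="pl-video-title-link yt-uix-tile-link" '
    'href="/watch?v=GtWXOzsD5Fw&amp;list=PL3D7BFF1DDBDAAFE5&amp;index=101&amp;t=0s">Title</a>'
    '<a href="/user/thenewboston" class="yt-uix-sessionlink">thenewboston</a>'
    '</td>'
)

parser = PlaylistLinkParser()
parser.feed(sample)
links = parser.links
print(links)
```

If the page comes back in its other form, `links` will simply be empty, which matches the intermittent empty output described in the question.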
At other times, the data is embedded in a JS variable and loaded dynamically:

    window["ytInitialData"] = { .... very big json here .... };
For the second version, you will need a regex to pull the JSON out of the JavaScript, unless you want to use a tool like Selenium to grab the content after the page has loaded.
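A rough sketch of the regex route might look like the following. It only relies on the `window["ytInitialData"] = {...};` shape shown above; the function names are hypothetical, and the idea that each video entry stores its ID under a `videoId` key is an assumption about the JSON layout, which is not documented here and may change. The regex is used only to find where the JSON starts, and `json.JSONDecoder.raw_decode` then reads one complete JSON value from that position, which avoids guessing where the object ends.

```python
import json
import re

# Assignment marker taken from the snippet above; the real page may differ.
MARKER = re.compile(r'window\["ytInitialData"\]\s*=\s*')


def extract_initial_data(html):
    """Return the object assigned to window["ytInitialData"], or None if absent."""
    match = MARKER.search(html)
    if match is None:
        return None
    # raw_decode parses one JSON value starting at the given index and
    # ignores whatever follows it (here, the trailing ";").
    data, _end = json.JSONDecoder().raw_decode(html, match.end())
    return data


def collect_video_ids(node, found=None):
    """Walk nested dicts and lists, collecting strings stored under "videoId".

    The "videoId" key name is an assumption about the embedded JSON.
    """
    if found is None:
        found = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "videoId" and isinstance(value, str):
                if value not in found:
                    found.append(value)
            else:
                collect_video_ids(value, found)
    elif isinstance(node, list):
        for item in node:
            collect_video_ids(item, found)
    return found


# Hypothetical miniature page standing in for the real response.
sample_html = (
    '<script>window["ytInitialData"] = '
    '{"contents": [{"videoId": "abc"}, {"x": {"videoId": "def"}}]};</script>'
)

ids = collect_video_ids(extract_initial_data(sample_html))
print(ids)
```

Because it depends on undocumented page internals, this kind of scraping can stop working whenever the page changes.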
The best way, IMO, is to use the official API, which makes getting the playlist items straightforward:
- Go to the Google Developer Console, search for YouTube Data API, and enable YouTube Data API v3
- Go to Credentials / Create Credentials / API key
Install the Google API client for Python:

    pip3 install --upgrade google-api-python-client
Use the API key in the script below. This script fetches the playlist items for the playlist with ID PL3D7BFF1DDBDAAFE5, uses pagination to get all of them, and re-creates each link from the videoId and the playlist ID:

    import googleapiclient.discovery
    from urllib.parse import parse_qs, urlparse

    # Extract the playlist ID from the URL
    url = 'https://www.youtube.com/playlist?list=PL3D7BFF1DDBDAAFE5'
    query = parse_qs(urlparse(url).query, keep_blank_values=True)
    playlist_id = query["list"][0]

    print(f'get all playlist items links from {playlist_id}')
    youtube = googleapiclient.discovery.build("youtube", "v3", developerKey="YOUR_API_KEY")

    request = youtube.playlistItems().list(
        part="snippet",
        playlistId=playlist_id,
        maxResults=50
    )

    # Execute each page once; list_next() returns None after the last page
    playlist_items = []
    while request is not None:
        response = request.execute()
        playlist_items += response["items"]
        request = youtube.playlistItems().list_next(request, response)

    print(f"total: {len(playlist_items)}")
    print([
        f'https://www.youtube.com/watch?v={t["snippet"]["resourceId"]["videoId"]}&list={playlist_id}&t=0s'
        for t in playlist_items
    ])
Output:
    get all playlist items links from PL3D7BFF1DDBDAAFE5
    total: 195
    [
        'https://www.youtube.com/watch?v=SUOWNXGRc6g&list=PL3D7BFF1DDBDAAFE5&t=0s',
        'https://www.youtube.com/watch?v=857zrsYZKGo&list=PL3D7BFF1DDBDAAFE5&t=0s',
        'https://www.youtube.com/watch?v=Da1jlmwuW_w&list=PL3D7BFF1DDBDAAFE5&t=0s',
        ...........
        'https://www.youtube.com/watch?v=1j4prh3NAZE&list=PL3D7BFF1DDBDAAFE5&t=0s',
        'https://www.youtube.com/watch?v=s9ryE6GwhmA&list=PL3D7BFF1DDBDAAFE5&t=0s'
    ]
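If it helps to try the paging and link-building logic without an API key, the same steps can be split into small functions and exercised against a stand-in object. This is only a sketch: the helper names (`playlist_id_from_url`, `iter_playlist_items`, `video_url`) and the `_FakeRequest` / `_FakeResource` classes are hypothetical and exist just for the demonstration. With the real client, you would pass `youtube.playlistItems()` as the resource.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def playlist_id_from_url(url):
    """Read the playlist ID from the "list" query parameter."""
    values = parse_qs(urlparse(url).query).get("list")
    if not values:
        raise ValueError("no 'list' parameter in URL")
    return values[0]


def iter_playlist_items(resource, playlist_id):
    """Yield items page by page; resource must offer list() and list_next()."""
    request = resource.list(part="snippet", playlistId=playlist_id, maxResults=50)
    while request is not None:
        response = request.execute()
        yield from response["items"]
        request = resource.list_next(request, response)


def video_url(item, playlist_id):
    """Re-create the watch link from the item's videoId and the playlist ID."""
    video_id = item["snippet"]["resourceId"]["videoId"]
    params = {"v": video_id, "list": playlist_id, "t": "0s"}
    return "https://www.youtube.com/watch?" + urlencode(params)


# --- Stand-ins for the API client, for demonstration only ---
def _item(video_id):
    return {"snippet": {"resourceId": {"videoId": video_id}}}


_PAGES = [[_item("SUOWNXGRc6g"), _item("857zrsYZKGo")], [_item("Da1jlmwuW_w")]]


class _FakeRequest:
    def __init__(self, page):
        self.page = page

    def execute(self):
        return {"items": _PAGES[self.page]}


class _FakeResource:
    def list(self, **kwargs):
        return _FakeRequest(0)

    def list_next(self, request, response):
        next_page = request.page + 1
        return _FakeRequest(next_page) if next_page < len(_PAGES) else None


playlist_id = playlist_id_from_url(
    "https://www.youtube.com/playlist?list=PL3D7BFF1DDBDAAFE5"
)
urls = [video_url(i, playlist_id) for i in iter_playlist_items(_FakeResource(), playlist_id)]
print(urls)
```

Keeping the paging loop in a generator means each page is requested only once and only when the caller asks for more items.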