Scrape YouTube videos from a specific channel and search?

Question

I am using this code to get the URLs from a YouTube channel, and it works fine. I would like to add an option to search the channel for a video with a specific title and get the URL of the first video whose title contains the search phrase.

from bs4 import BeautifulSoup
import requests

url="https://www.youtube.com/feeds/videos.xml?user=LinusTechTips"
html = requests.get(url)
soup = BeautifulSoup(html.text, "lxml")

for entry in soup.find_all("entry"):
    for link in entry.find_all("link"):
        print(link["href"])

Recommended answer

My previous answer returns all the video titles in a given YouTube channel, which is what you were looking for. In the comments, however, you told me you want to run the script via a cron job. That takes a bit more work, so I am adding another answer.

from bs4 import BeautifulSoup
from lxml import etree
import urllib.request
import requests
import sys

def fetch_titles(url):
    video_titles = []
    html = requests.get(url)
    soup = BeautifulSoup(html.text, "lxml")
    for entry in soup.find_all("entry"):
        for link in entry.find_all("link"):
            # Download each video's watch page and parse it as HTML
            youtube = etree.HTML(urllib.request.urlopen(link["href"]).read())
            # Read the title from the page's span with id "eow-title"
            video_title = youtube.xpath("//span[@id='eow-title']/@title")
            if video_title:
                video_titles.append({"title": video_title[0], "url": link["href"]})
    return video_titles

def main():
    if len(sys.argv) < 2:
        print("Error: you must specify a keyword")
        print("e.g.: python3 ./main.py KEYWORD")
        return

    url="https://www.youtube.com/feeds/videos.xml?user=LinusTechTips"
    keyword = sys.argv[1]

    video_titles = fetch_titles(url)
    for video in video_titles:
        if keyword in video["title"]:
            print(video["url"])
            break  # stop after the first match; remove this line to print every match


if __name__ == "__main__":
    main()

When you call the script from the terminal, specify the keyword like this:

$ python3 ./main.py Mac

Here Mac is the keyword and main.py is the Python script filename.

Output:

https://www.youtube.com/watch?v=l_IHSRPVqwQ
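
The script above downloads every video page just to read its title. As a possible lighter alternative, each entry in the Atom feed already carries a title element next to its link, so the search can be done on the feed alone. The following is a minimal sketch under some assumptions, not the answer's method: first_match and SAMPLE_FEED are hypothetical names, the sample titles and video IDs are made up for illustration, and the match is case-insensitive (unlike the case-sensitive check in the script above).

```python
import xml.etree.ElementTree as ET

# Namespace used by Atom feeds
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

# Made-up sample feed for illustration only (hypothetical titles and IDs)
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <title>Building a gaming PC</title>
    <link rel="alternate" href="https://www.youtube.com/watch?v=EXAMPLE_ID_1"/>
  </entry>
  <entry>
    <title>Is the new Mac worth it?</title>
    <link rel="alternate" href="https://www.youtube.com/watch?v=EXAMPLE_ID_2"/>
  </entry>
</feed>"""


def first_match(feed_xml, keyword):
    """Return the link of the first entry whose title contains keyword, else None."""
    root = ET.fromstring(feed_xml)
    needle = keyword.casefold()
    for entry in root.findall("atom:entry", ATOM_NS):
        # Title text of this entry ("" if the element is missing)
        title = entry.findtext("atom:title", default="", namespaces=ATOM_NS)
        link = entry.find("atom:link", ATOM_NS)
        if link is not None and needle in title.casefold():
            return link.get("href")
    return None


if __name__ == "__main__":
    print(first_match(SAMPLE_FEED, "mac"))
```

To try it against the live feed, one option is to pass requests.get(url).content (the raw bytes of the response) as feed_xml, using the same feed URL as in the script above.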
