Web scraping video information from YouTube using Python

Question

I want to extract information about a particular YouTube video (such as its title and view count) using Python, just as I have done when scraping other websites. But for some reason, the code either returns nothing or returns tags only for the recommended videos in the sidebar instead of "the main video" of the URL.

I tried the same code that I use for web scraping on other websites, shown below. Apparently it doesn't work on YouTube. What should I do if I want to get video information based on a YouTube URL?

import requests
from bs4 import BeautifulSoup

# Build the watch URL for the video
base_url = 'https://www.youtube.com/watch?'
search_string = 'v=I41aLSzLI50'
url = base_url + search_string

# Download the raw HTML and parse it
supers = requests.get(url).content
data = BeautifulSoup(supers, 'html.parser')

# Select links by their CSS classes and print each title
videos = data.find_all('a', class_='content-link spf-link yt-uix-sessionlink spf-link')
for video in videos:
    print(video.find('span', class_='title').get_text())

Recommended answer

I looked up a page on YouTube, and it seems that the content you are looking for is not in the original source (at least not where you are expecting it). Scripts create that content when your browser renders the page, so a plain HTTP request never sees it. Based on my experience, you have a few options.

  1. Use one of the APIs the commenters suggested. I am not very familiar with these, but it might save you some time and effort. Web scraping can be problematic because page formats change, so scripts may need to be updated.
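As a rough illustration of the API route, here is a minimal sketch that assumes the YouTube Data API v3 is one of the APIs meant (the answer does not say which). It calls the `videos` endpoint with `part=snippet,statistics` and reads the title and view count from the JSON response. The function names and the `YOUR_API_KEY` placeholder are hypothetical, and an API key from the Google Cloud console is required.

```python
import json
from urllib.parse import parse_qs, urlencode, urlparse
from urllib.request import urlopen

# Assumption: the YouTube Data API v3 "videos" endpoint.
API_ENDPOINT = "https://www.googleapis.com/youtube/v3/videos"


def extract_video_id(watch_url):
    """Return the value of the 'v' query parameter of a watch URL."""
    ids = parse_qs(urlparse(watch_url).query).get("v")
    if not ids:
        raise ValueError("no 'v' parameter in URL: " + watch_url)
    return ids[0]


def build_request_url(video_id, api_key):
    """Build the request URL asking for the snippet and statistics parts."""
    params = {"part": "snippet,statistics", "id": video_id, "key": api_key}
    return API_ENDPOINT + "?" + urlencode(params)


def parse_video_info(payload):
    """Pick the title and view count out of a decoded API response."""
    items = payload.get("items", [])
    if not items:
        return None  # unknown or unavailable video
    snippet = items[0].get("snippet", {})
    stats = items[0].get("statistics", {})
    views = stats.get("viewCount")  # the API reports counts as strings
    return {
        "title": snippet.get("title"),
        "views": int(views) if views is not None else None,
    }


def fetch_video_info(watch_url, api_key):
    """Query the API over the network (needs a valid key)."""
    url = build_request_url(extract_video_id(watch_url), api_key)
    with urlopen(url, timeout=10) as response:
        return parse_video_info(json.load(response))
```

With a real key, `fetch_video_info('https://www.youtube.com/watch?v=I41aLSzLI50', 'YOUR_API_KEY')` should return a small dictionary with `title` and `views`. Keeping the parsing separate from the network call makes it testable against a saved response.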

If you insist on web scraping, you can use an automated browser. I used to use Selenium on a regular basis, and it should work for your purposes. Because it drives a real browser, it lets you work with content generated by scripts.
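A minimal sketch of the automated-browser approach might look like the following. It assumes Selenium 4 with a local Chrome installation, and it assumes the browser tab title has the form "<video title> - YouTube" once the page has loaded; both are assumptions, not something the answer states. The function names are hypothetical. The rendered HTML is returned as well, so it can be passed to BeautifulSoup as in the question.

```python
# Assumption: the tab title of a loaded watch page ends with this suffix.
TITLE_SUFFIX = " - YouTube"


def clean_page_title(page_title):
    """Strip the trailing ' - YouTube' from a browser tab title, if present."""
    if page_title.endswith(TITLE_SUFFIX):
        return page_title[: -len(TITLE_SUFFIX)]
    return page_title


def fetch_rendered_page(url, timeout=15):
    """Load the page in headless Chrome and return its title and rendered HTML."""
    # Imported here so clean_page_title() can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")  # run without opening a window
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Wait until the page has set a video-specific tab title.
        WebDriverWait(driver, timeout).until(
            lambda d: d.title.endswith(TITLE_SUFFIX)
        )
        return {
            "title": clean_page_title(driver.title),
            "html": driver.page_source,  # DOM after scripts have run
        }
    finally:
        driver.quit()  # always close the browser
```

The `html` value can then be given to `BeautifulSoup(html, 'html.parser')`. Any selectors used on it depend on YouTube's current markup and may need updating over time.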

I looked at the page source, and the information you are looking for appears to be contained within some tags, but parsing it will be a pain.
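One thing that may be worth trying before heavier parsing: the sketch below assumes the static HTML carries `<meta>` tags such as `og:title` and `interactionCount`. That is an assumption about YouTube's markup that can change and is not confirmed by the answer. It uses only the standard library's `html.parser`, and the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class MetaTagCollector(HTMLParser):
    """Collect the content of <meta> tags, keyed by property/itemprop/name."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = dict(attrs)
        key = (attr_map.get("property")
               or attr_map.get("itemprop")
               or attr_map.get("name"))
        content = attr_map.get("content")
        if key and content is not None:
            # Keep the first value seen for each key.
            self.meta.setdefault(key, content)


def video_info_from_html(html_text):
    """Return title and view count if the assumed <meta> tags are present."""
    collector = MetaTagCollector()
    collector.feed(html_text)
    meta = collector.meta
    views = meta.get("interactionCount")  # assumed tag name
    return {
        "title": meta.get("og:title"),  # assumed tag name
        "views": int(views) if views and views.isdigit() else None,
    }
```

The HTML text could come from `requests.get(url).text` as in the question. If the tags are missing, the function returns `None` values rather than raising, which makes a markup change easy to notice.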
