Scraping YouTube links from a webpage


Question

I've been trying to scrape YouTube links from a webpage, but nothing has worked. This is a picture of what I've been trying to scrape:

This is the code I tried most recently:

youtube_link = soup.find("a", class_="ytp-title-link yt-uix-sessionlink")

And this is the link to the website the YouTube link is in: https://www.electronic-festivals.com/event/i-am-hardstyle-germany

Recommended answer

Most of the YouTube links are inside an iframe, and JavaScript also needs to run before they appear. Try using Selenium. The following extracts any src or href containing "youtube". I only switch into the key iframe hosting the YouTube clip. You could loop over all iframes and check each one.

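from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

# Matches any element whose href or src attribute contains "youtube"
YOUTUBE_SELECTOR = "[href*=youtube], [src*=youtube]"

def addItems(links, final):
    # Prefer src (iframes, scripts); fall back to href (anchors)
    for link in links:
        src = link.get_attribute('src')
        ref = src if src is not None else link.get_attribute('href')
        final.append(ref)
    return final

url = "https://www.electronic-festivals.com/event/i-am-hardstyle-germany"
driver = webdriver.Chrome()
driver.get(url)
final = []

# Enter the iframe hosting the YouTube clip
driver.switch_to.frame(driver.find_element(By.CSS_SELECTOR, '.media-youtube-player'))

try:
    # Wait up to 10 seconds for matching elements inside the iframe
    links = WebDriverWait(driver, 10).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, YOUTUBE_SELECTOR)))
    addItems(links, final)
except TimeoutException:
    pass  # nothing matched inside the iframe in time
finally:
    # Always return to the top-level document
    driver.switch_to.default_content()

# Collect matches from the top-level document as well
links = driver.find_elements(By.CSS_SELECTOR, YOUTUBE_SELECTOR)
addItems(links, final)

# Print each distinct link once
for link in set(final):
    print(link)

driver.quit()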
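
The "src or href containing youtube" filter used by the CSS selector can also be sketched against a plain HTML string, for example the value of `driver.page_source` once the page has rendered. This is only a minimal sketch using the standard library, not part of the original answer: the names `YouTubeLinkParser` and `extract_youtube_links` and the sample markup (including the `VIDEO_ID` placeholder) are made up for illustration. It only sees what is in the string it is given, so it will not find elements that live inside the iframe's own document.

```python
from html.parser import HTMLParser


class YouTubeLinkParser(HTMLParser):
    """Collects src/href attribute values that contain 'youtube' (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        for name, value in attrs:
            if name in ("src", "href") and value and "youtube" in value:
                self.links.append(value)


def extract_youtube_links(html):
    """Return the distinct matching URLs in first-seen order."""
    parser = YouTubeLinkParser()
    parser.feed(html)
    parser.close()
    # dict.fromkeys drops duplicates while preserving order
    return list(dict.fromkeys(parser.links))


# Made-up sample markup; VIDEO_ID is a placeholder, not a real video
sample_html = """
<div>
  <iframe class="media-youtube-player" src="https://www.youtube.com/embed/VIDEO_ID"></iframe>
  <a href="https://www.youtube.com/watch?v=VIDEO_ID">Watch</a>
  <a href="https://example.com/tickets">Tickets</a>
</div>
"""

for link in extract_youtube_links(sample_html):
    print(link)
```

Unlike `set(final)` in the Selenium code, this keeps the links in the order they appear in the markup, which can make the output easier to compare against the page.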
