Scraping YouTube links from a webpage
Question
I've been trying to scrape YouTube links from a webpage, but nothing has worked.
This is a picture of what I've been trying to scrape:
This is the code I tried most recently:
youtube_link = soup.find("a", class_="ytp-title-link yt-uix-sessionlink")
And this is the link to the website the YouTube link is in: https://www.electronic-festivals.com/event/i-am-hardstyle-germany
Recommended answer
Most of the YouTube links are inside an iframe, and JavaScript also needs to run, so try using Selenium. The following extracts any src or href containing youtube. I only switch into the key iframe hosting the YouTube clip; you could loop over all iframes and check each one.
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

YOUTUBE_SELECTOR = "[href*=youtube], [src*=youtube]"

def addItems(links, final):
    # Prefer src (iframes, embeds); fall back to href (anchors).
    for link in links:
        src = link.get_attribute('src')
        ref = src if src is not None else link.get_attribute('href')
        final.append(ref)
    return final

url = "https://www.electronic-festivals.com/event/i-am-hardstyle-germany"
driver = webdriver.Chrome()
driver.get(url)
# Selenium 4 removed find_element_by_css_selector; use find_element(By.CSS_SELECTOR, ...).
driver.switch_to.frame(driver.find_element(By.CSS_SELECTOR, '.media-youtube-player'))
final = []

try:
    # Wait up to 10 seconds for matching elements inside the iframe.
    links = WebDriverWait(driver, 10).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, YOUTUBE_SELECTOR)))
    addItems(links, final)
except TimeoutException:
    pass  # nothing matched inside the iframe within the timeout
finally:
    # Back in the top-level document, collect matches there as well.
    driver.switch_to.default_content()
    links = driver.find_elements(By.CSS_SELECTOR, YOUTUBE_SELECTOR)
    addItems(links, final)

for link in set(final):
    print(link)

driver.quit()
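
The answer mentions that you could loop over all iframes instead of entering only the key one. A minimal sketch of that idea follows. The helper name collect_youtube_links is hypothetical and not part of Selenium; it relies only on the driver methods find_elements, switch_to.frame (called with a frame index) and switch_to.default_content. It visits top-level iframes only, does not wait for late-loading content, and passes the locator strategy as the plain string "css selector", which is the value of Selenium's By.CSS_SELECTOR.

```python
CSS = "css selector"  # same value as selenium.webdriver.common.by.By.CSS_SELECTOR
YOUTUBE_SELECTOR = "[href*=youtube], [src*=youtube]"


def collect_youtube_links(driver):
    """Collect src/href values containing 'youtube' from the top-level
    document and from every top-level iframe (hypothetical helper)."""
    found = set()

    def harvest():
        # Read matches from whichever document the driver is focused on.
        for element in driver.find_elements(CSS, YOUTUBE_SELECTOR):
            ref = element.get_attribute("src") or element.get_attribute("href")
            if ref:
                found.add(ref)

    harvest()  # top-level document first

    frame_count = len(driver.find_elements(CSS, "iframe"))
    for index in range(frame_count):
        try:
            driver.switch_to.frame(index)  # frames can be selected by index
            harvest()
        except Exception:
            # Assumption: a frame that cannot be entered or read is skipped.
            pass
        finally:
            driver.switch_to.default_content()

    return sorted(found)
```

With a real session this would be called as collect_youtube_links(driver) after driver.get(url) and before driver.quit(). Nested iframes would need a recursive variant, and content injected after page load may still need an explicit wait like the one in the answer's code.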