Getting current video tag URL with selenium
Question
I'm trying to get the current HTML5 video tag URL using selenium (with Python bindings):
from selenium import webdriver
from selenium.webdriver.common.by import By
driver = webdriver.Chrome()
driver.get('https://www.youtube.com/watch?v=9x6YclsLHN0')
# Locate the <video> element and read its currentSrc via JavaScript
video = driver.find_element(By.TAG_NAME, 'video')
url = driver.execute_script("return arguments[0].currentSrc;", video)
print(url)
driver.quit()
The problem is that the url value is printed empty. Why is that, and how can I fix it?
I suspect this is because the script is executed, and the currentSrc value returned, before the video tag has been initialized. I've tried adding an Explicit Wait, but still got an empty string printed:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
wait = WebDriverWait(driver, 5)
video = wait.until(EC.visibility_of_element_located((By.TAG_NAME, 'video')))
This makes me think I need to do it asynchronously, perhaps by listening for the media events and waiting for the video to start playing.
I'm also pretty sure currentSrc should work, because if I execute the code in the console and manually wait for the video to start, I see it printing the video's currentSrc attribute value.
FYI, I also tried the Java bindings, with the same result, an empty string:
WebDriver driver = new ChromeDriver();
driver.get("https://www.youtube.com/watch?v=9x6YclsLHN0");
WebElement video = driver.findElement(By.tagName("video"));
JavascriptExecutor js = (JavascriptExecutor) driver;
String url = (String) js.executeScript("return arguments[0].currentSrc;", video);
System.out.println(url);
Recommended answer
According to the W3 video tag specification:
"The currentSrc DOM attribute is initially the empty string. Its value is changed by the resource selection algorithm."
This explains the behavior described in the question. It also means that, to get the currentSrc value reliably, we need to wait until the media resource has it defined.
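One way to picture "waiting until the value is defined" is a simple polling loop. The sketch below is not part of the original answer: the helper name wait_for_current_src and its timeout and poll_interval parameters are made up for illustration, and driver is assumed to be any object exposing Selenium's execute_script method. Unlike the visibility wait tried in the question, it checks currentSrc itself rather than the element's visibility.

```python
import time


def wait_for_current_src(driver, video, timeout=10.0, poll_interval=0.25):
    """Poll the <video> element until currentSrc is non-empty.

    Hypothetical helper: returns the URL string, or None if the
    timeout expires before the browser has selected a media resource.
    """
    deadline = time.monotonic() + timeout
    while True:
        # Same script as in the question, re-run on every poll
        src = driver.execute_script("return arguments[0].currentSrc;", video)
        if src:
            return src
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll_interval)
```

With the driver and video objects from the question, this would be called as wait_for_current_src(driver, video). Polling is cruder than reacting to a media event, but it cannot miss an event that fired before the script was injected.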
Subscribing to the loadstart media event through execute_async_script() did the trick:
driver.set_script_timeout(10)
url = driver.execute_async_script("""
    var video = arguments[0],
        callback = arguments[arguments.length - 1];
    video.addEventListener('loadstart', listener);
    function listener() {
        callback(video.currentSrc);
    };
""", video)
print(url)
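A slightly more defensive variant is sketched below, on the assumption that the listener might be attached after loadstart has already fired, in which case the callback would never run and the script would time out. The names ASYNC_SCRIPT and get_current_src are hypothetical and not from the original answer. The script returns currentSrc immediately when it is already set, and otherwise waits for the event as above.

```python
# JavaScript body passed to execute_async_script; Selenium appends the
# completion callback as the last argument.
ASYNC_SCRIPT = """
var video = arguments[0],
    callback = arguments[arguments.length - 1];

if (video.currentSrc) {
    // Resource selection already ran; no need to wait for the event.
    callback(video.currentSrc);
} else {
    video.addEventListener('loadstart', function listener() {
        video.removeEventListener('loadstart', listener);
        callback(video.currentSrc);
    });
}
"""


def get_current_src(driver, video, timeout=10):
    """Hypothetical wrapper: return video.currentSrc once it is defined."""
    driver.set_script_timeout(timeout)
    return driver.execute_async_script(ASYNC_SCRIPT, video)
```

It would be called as get_current_src(driver, video) with the objects from the question. If neither branch invokes the callback within the script timeout, the call is expected to fail with Selenium's script timeout error rather than return an empty string.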