PhantomJS not mimicking browser behavior when looking at YouTube videos

Problem description

I posted this question to the PhantomJS mailing list a week ago, but have gotten no response. Hoping for better luck here...

I've been trying to use PhantomJS to scrape information from YouTube, but haven't been able to get it working.

Consider a YouTube video embedded into a web page via an iframe element. If you load the URL referenced by the src attribute directly into a browser, you get a full-page version of the video, where the video is encapsulated in an embed element. The embed element is not present in the initial page content; rather, some script tags on the page cause some JavaScript to be evaluated, which eventually adds the embed element to the DOM. I want to be able to access this embed element when it appears, but it never appears when I load the page in PhantomJS.

Here's the code I'm using:

var page = require("webpage").create();

page.settings.userAgent = "Mozilla/5.0 (X11; rv:24.0) Gecko/20130909 Firefox/24.0";

page.open("https://www.youtube.com/embed/dQw4w9WgXcQ", function (status) {
  if (status !== "success") {
    console.log("Failed to load page");
    phantom.exit();
  } else {
    setTimeout(function () {
      var size = page.evaluate(function () {
        return document.getElementsByTagName("EMBED").length;
      });
      console.log(size);
      phantom.exit();
    }, 15000);
  }
});

I only ever see "0" printed to the console, no matter how long I set the timeout. If I look for "DIV" elements I get "3", and if I look for "SCRIPT" elements I get "5", so the code seems to be sound. I just never find any "EMBED" tags, even though if I load the URL above in my browser I do find one soon after page-load.

Does anyone have any idea what the problem might be? Thanks in advance for any help.
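
As an aside on the fixed 15-second delay in the script above: polling makes it easier to see when, or whether, the element ever shows up. The following is only a sketch. `pollUntil` is a hypothetical helper name, not part of PhantomJS, and the 250 ms interval is an arbitrary choice. It does not make the embed element appear; it only reports sooner and gives a clear timeout message.

```javascript
// Hypothetical helper: call check() every intervalMs until it returns a
// truthy value or timeoutMs elapses, then report the outcome exactly once.
function pollUntil(check, onDone, timeoutMs, intervalMs) {
  var start = Date.now();
  (function tick() {
    var result = check();
    if (result) {
      onDone(null, result);
    } else if (Date.now() - start >= timeoutMs) {
      onDone(new Error("timed out after " + timeoutMs + " ms"));
    } else {
      setTimeout(tick, intervalMs);
    }
  })();
}

// Inside the page.open callback from the question, this could be used as:
//
//   pollUntil(function () {
//     return page.evaluate(function () {
//       return document.getElementsByTagName("EMBED").length;
//     });
//   }, function (err, count) {
//     console.log(err ? err.message : count);
//     phantom.exit();
//   }, 15000, 250);

// Stand-alone demonstration with a counter standing in for the page:
var calls = 0;
pollUntil(function () {
  calls += 1;
  return calls >= 3 ? calls : 0;
}, function (err, value) {
  console.log(err ? err.message : "ready after " + value + " checks");
}, 1000, 10);
```

Because a count of 0 is falsy, the check keeps running until at least one matching element exists or the timeout is reached.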

Recommended answer

Patrick's answer got me on the right track, but the full story is as follows.

YouTube's JavaScript probes the browser's capabilities before deciding whether to create some kind of video element. After trawling through the minified code, I was eventually able to fool YouTube into thinking PhantomJS supported HTML5 video by wrapping document.createElement in the page's onInitialized callback.

page.onInitialized = function () {
  page.evaluate(function () {
    var create = document.createElement;
    document.createElement = function (tag) {
      var elem = create.call(document, tag);
      if (tag === "video") {
        elem.canPlayType = function () { return "probably" };
      }
      return elem;
    };
  });
};
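
To see what the wrapper changes without running PhantomJS, here is a small sketch that applies the same pattern to a stand-in object. Everything in it is hypothetical: `fakeDocument`, `spoofVideoSupport`, and `supportsH264` are illustrative names, and the probe is only a guess at the kind of check a player script might run, not YouTube's actual code.

```javascript
// Hypothetical stand-in for the page's document: its <video> elements
// report no codec support, as a browser without HTML5 video might.
var fakeDocument = {
  createElement: function (tag) {
    var elem = { tagName: String(tag).toUpperCase() };
    if (tag === "video") {
      elem.canPlayType = function () { return ""; };
    }
    return elem;
  }
};

// The same wrapping pattern as above, applied to any document-like object.
function spoofVideoSupport(doc) {
  var create = doc.createElement;
  doc.createElement = function (tag) {
    var elem = create.call(doc, tag);
    if (tag === "video") {
      elem.canPlayType = function () { return "probably"; };
    }
    return elem;
  };
}

// Hypothetical capability probe (an assumption, not taken from YouTube).
function supportsH264(doc) {
  var video = doc.createElement("video");
  return typeof video.canPlayType === "function" &&
    video.canPlayType('video/mp4; codecs="avc1.42E01E"') !== "";
}

var before = supportsH264(fakeDocument);
spoofVideoSupport(fakeDocument);
var after = supportsH264(fakeDocument);
console.log(before, after); // prints: false true
```

One thing the sketch makes visible: the comparison `tag === "video"` is case-sensitive, so a script that calls `createElement("VIDEO")` would bypass the wrapper.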

However, this was a misstep; to get the <embed> tag I was originally after, I needed to make YouTube's code think PhantomJS supports Flash, not HTML5 video. That's also doable:

page.onInitialized = function () {
  page.evaluate(function () {
    window.navigator = {
      plugins: { "Shockwave Flash": { description: "Shockwave Flash 11.2 e202" } },
      mimeTypes: { "application/x-shockwave-flash": { enabledPlugin: true } }
    };
  });
};
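
For illustration, here is a sketch of a Flash probe that the spoofed object above would satisfy. It is loosely modeled on the plugin checks that detection libraries such as SWFObject perform. Whether YouTube's minified code does the same is an assumption, and `detectFlash` is a hypothetical name.

```javascript
// Hypothetical probe: reports Flash as available when the plugin entry has
// a description and the MIME type is not explicitly marked as disabled.
function detectFlash(nav) {
  var plugin = nav.plugins && nav.plugins["Shockwave Flash"];
  if (!plugin || !plugin.description) {
    return null;
  }
  var mime = nav.mimeTypes && nav.mimeTypes["application/x-shockwave-flash"];
  if (mime && !mime.enabledPlugin) {
    return null;
  }
  // Pull "major.minor" out of a description like "Shockwave Flash 11.2 e202".
  var version = /(\d+)\.(\d+)/.exec(plugin.description);
  return {
    major: version ? parseInt(version[1], 10) : 0,
    minor: version ? parseInt(version[2], 10) : 0
  };
}

// The same object that the answer assigns to window.navigator.
var spoofedNavigator = {
  plugins: { "Shockwave Flash": { description: "Shockwave Flash 11.2 e202" } },
  mimeTypes: { "application/x-shockwave-flash": { enabledPlugin: true } }
};

console.log(detectFlash({ plugins: {}, mimeTypes: {} })); // prints: null
console.log(detectFlash(spoofedNavigator)); // prints: { major: 11, minor: 2 }
```

Note that the replacement object carries only `plugins` and `mimeTypes`. If the assignment takes effect as described, a script that later reads another property such as `navigator.userAgent` would get `undefined`.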

That's how it's done.
