PhantomJS unexpected load behavior with multiple pages

Question

I have a script (below) that scrapes a site in a 3-step process. It works great when limited to 1 page at a time. However, when I increase that to 2 pages at a time, things start getting wonky: onLoadFinished fires earlier than I would expect, before the page has completely loaded, and the rest of my script breaks as a result. Any idea why this might be happening? I should add that I'm using the newest version (1.5).

MAX_PAGES = 1
###
Changing MAX_PAGES to >1 causes some pages' onLoadFinished event to fire before
the page is fully rendered. This is evident from the fact that there is more
than one image for some pages. I haven't been able to reproduce it using
microsoft.com, but on some pages I was working on, the first onLoadFinished
seemed to be called before the page was actually fully loaded, judging by the
rendered images.
###

newPage = (id) ->
    # Per-page state: an id, a step counter, and the page object itself.
    context = {}
    context.id = id
    context.step = 0
    context.page = require('webpage').create()
    # Count each load that starts so every render gets a distinct file name.
    context.page.onLoadStarted = ->
        context.step++
    # Render a screenshot on success; release the page on failure.
    context.page.onLoadFinished = (status) ->
        console.log status
        if status is 'success'
            context.page.render("#{context.id}_#{context.step}.png")
        else
            context.page.release()
    # Start loading once both handlers are attached.
    context.page.open('http://www.microsoft.com')
    console.log 'started loading'

newPage id for id in [1..MAX_PAGES]

Recommended answer

I think the problem is that every web page within PhantomJS uses the same QNetworkAccessManager, so the finished() signal fires whenever any web page object finishes loading. Fixing this might require modifications to PhantomJS's code. I have noticed this before when trying to load multiple pages in parallel in PhantomJS. An application I'm working on uses QtWebKit and loads multiple pages simultaneously, so I have to make sure that each web page gets its own QNetworkAccessManager so that the finished() signals don't interfere with each other.
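If patching PhantomJS is not an option, one possible workaround is to keep only one page load in flight at a time, which is the configuration the question reports as working. The sketch below is written in plain ES5 JavaScript rather than CoffeeScript. It is only an illustration under assumptions: `loadSequentially`, `createPage`, `onPage`, and `onDone` are hypothetical names introduced here, not part of PhantomJS, and the approach does nothing about the shared QNetworkAccessManager itself. It simply avoids overlapping loads.

```javascript
// Hypothetical helper: open the given URLs one after another, so that the
// next page is only created once the previous one has reported a result.
//
// createPage: function returning a page object with open() and an
//             onLoadFinished property (in PhantomJS: require('webpage').create()).
// onPage:     called as onPage(id, url, status, page) for each URL.
// onDone:     called once after the last URL has been handled.
function loadSequentially(urls, createPage, onPage, onDone) {
    var index = 0;

    function next() {
        if (index >= urls.length) {
            onDone();
            return;
        }
        var url = urls[index];
        var id = index + 1;
        index += 1;

        var page = createPage();
        var handled = false;
        page.onLoadFinished = function (status) {
            // Ignore any further load events reported for this page.
            if (handled) {
                return;
            }
            handled = true;
            onPage(id, url, status, page);
            next();
        };
        page.open(url);
    }

    next();
}

// Usage inside a PhantomJS script (shown as a comment because it needs
// the PhantomJS runtime):
//
// loadSequentially(
//     ['http://www.microsoft.com'],
//     function () { return require('webpage').create(); },
//     function (id, url, status, page) {
//         if (status === 'success') { page.render(id + '.png'); }
//         page.release();
//     },
//     function () { phantom.exit(); }
// );
```

The trade-off is throughput: pages are fetched strictly one at a time, so this gives up the parallelism that MAX_PAGES > 1 was meant to provide in exchange for load events that belong unambiguously to the page currently loading.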
