PyQt Class not working for the second usage


Problem description

I'm using PyQt to fully load a page (including JS) and get its contents using Beautiful Soup. It works fine on the first iteration, but after that, it crashes. I don't know much Python, and even less PyQt, so any help is very welcome.

Class borrowed from here.

from PyQt4.QtCore import QUrl, SIGNAL
from PyQt4.QtGui import QApplication
from PyQt4.QtWebKit import QWebPage

from bs4 import BeautifulSoup
from bs4.dammit import UnicodeDammit
import sys
import signal


class Render(QWebPage):
    def __init__(self, url):
        self.app = QApplication(sys.argv)
        QWebPage.__init__(self)
        self.html = None
        signal.signal(signal.SIGINT, signal.SIG_DFL)
        self.connect(self, SIGNAL('loadFinished(bool)'), self._finished_loading)
        self.mainFrame().load(QUrl(url))
        self.app.exec_()

    def _finished_loading(self, result):
        self.html = self.mainFrame().toHtml()
        self.soup = BeautifulSoup(UnicodeDammit(self.html).unicode_markup)
        self.app.quit() 

###################################################################


l = ["http://www.google.com/?q=a", "http://www.google.com/?q=b", "http://www.google.com/?q=c"]

for page in l:
    soup = Render(page).soup
    print("# soup done: " + page)

Recommended answer

The example crashes because the Render class attempts to create a new QApplication and event loop for every URL it tries to load.

Instead, only one QApplication should be created, and the QWebPage subclass should load a new URL after each page has been processed, rather than using a for-loop.

Here's a rewrite of the example which should do what you want:

import sys, signal
from bs4 import BeautifulSoup
from bs4.dammit import UnicodeDammit
from PyQt4 import QtCore, QtGui, QtWebKit

class WebPage(QtWebKit.QWebPage):
    def __init__(self):
        QtWebKit.QWebPage.__init__(self)
        self.mainFrame().loadFinished.connect(self.handleLoadFinished)

    def process(self, items):
        self._items = iter(items)
        self.fetchNext()

    def fetchNext(self):
        try:
            self._url, self._func = next(self._items)
            self.mainFrame().load(QtCore.QUrl(self._url))
        except StopIteration:
            return False
        return True

    def handleLoadFinished(self):
        self._func(self._url, self.mainFrame().toHtml())
        if not self.fetchNext():
            print('# processing complete')
            QtGui.qApp.quit()


def funcA(url, html):
    print('# processing:', url)
    # soup = BeautifulSoup(UnicodeDammit(html).unicode_markup)
    # do stuff with soup...

def funcB(url, html):
    print('# processing:', url)
    # soup = BeautifulSoup(UnicodeDammit(html).unicode_markup)
    # do stuff with soup...

if __name__ == '__main__':

    items = [
        ('http://stackoverflow.com', funcA),
        ('http://google.com', funcB),
        ]

    signal.signal(signal.SIGINT, signal.SIG_DFL)
    print('Press Ctrl+C to quit\n')
    app = QtGui.QApplication(sys.argv)
    webpage = WebPage()
    webpage.process(items)
    sys.exit(app.exec_())
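
The control flow of the rewrite can be followed without Qt. The sketch below is only an illustration, not part of the original answer: FakeFrame is a hypothetical stand-in for the page's main frame plus the single event loop, and the example.com-style URLs are placeholders. It shows how each "load finished" callback starts the next load, so one loop serves every page and no for-loop creates a new application per URL.

```python
class FakeFrame:
    """Hypothetical stand-in for a web frame and the one event loop."""

    def __init__(self, on_finished):
        self._on_finished = on_finished
        self._pending = None
        self._html = ""

    def load(self, url):
        # Like a real frame, return immediately; the work happens in run().
        self._pending = url

    def to_html(self):
        return self._html

    def run(self):
        # Plays the role of the single app.exec_() call: keep going while
        # a finished load has scheduled another one.
        while self._pending is not None:
            url, self._pending = self._pending, None
            self._html = "<html>%s</html>" % url
            self._on_finished()


class Page:
    """Mirrors the process / fetchNext / handleLoadFinished chain."""

    def __init__(self):
        self.done = False
        self.frame = FakeFrame(self.handle_load_finished)

    def process(self, items):
        self._items = iter(items)
        self.fetch_next()

    def fetch_next(self):
        try:
            self._url, self._func = next(self._items)
        except StopIteration:
            return False
        self.frame.load(self._url)
        return True

    def handle_load_finished(self):
        # Hand the finished page to its callback, then queue the next URL.
        self._func(self._url, self.frame.to_html())
        if not self.fetch_next():
            self.done = True


def demo():
    seen = []

    def record(url, html):
        seen.append((url, html))

    page = Page()
    page.process([("http://a.example", record), ("http://b.example", record)])
    page.frame.run()  # one loop for all pages
    return seen, page.done
```

In the real code, the signal connection and app.exec_() take the place of FakeFrame.run(), and QtGui.qApp.quit() takes the place of setting done.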
