Issue in invoking "onclick" event using PyQt & javascript


Problem description

I am trying to scrape data from a website using Beautiful Soup. By default the page shows 18 items; after clicking the JavaScript button "showAlldevices", all 41 items are visible. Beautiful Soup only scrapes the items that are visible by default, so to get the data for all items I used the PyQt module and invoked the click event with JavaScript code. The code is below:

import csv
import urllib2
import sys
import time
from bs4 import BeautifulSoup
from PyQt4.QtGui import *  
from PyQt4.QtCore import *  
from PyQt4.QtWebKit import *  

class Render(QWebPage):  
  def __init__(self, url):  
    self.app = QApplication(sys.argv)  
    QWebPage.__init__(self)  
    self.loadFinished.connect(self._loadFinished)  
    self.mainFrame().load(QUrl(url))  
    self.app.exec_()  

  def _loadFinished(self, result):  
    self.frame = self.mainFrame()  
    self.app.quit()  

url = 'http://www.att.com/shop/wireless/devices/smartphones.html'  
r = Render(url)
jsClick = """var evObj = document.createEvent('MouseEvents');
             evObj.initEvent('click', true, true );
             this.dispatchEvent(evObj);
             """

allSelector = "a#deviceShowAllLink" 
allButton   = r.frame.documentElement().findFirst(allSelector)
allButton.evaluateJavaScript(jsClick) 
html = allButton.webFrame().toHtml()


page = html
soup = BeautifulSoup(page)
soup.prettify()
with open('Smartphones_26decv2.0.csv', 'wb') as csvfile:
    spamwriter = csv.writer(csvfile, delimiter=',')
    spamwriter.writerow(["Date","Day of Week","Device Name","Price"])
    items = soup.findAll('a', {"class": "clickStreamSingleItem"},text=True)
    prices = soup.findAll('div', {"class": "listGrid-price"})
    for item, price in zip(items, prices):
        textcontent = u' '.join(price.stripped_strings)
        if textcontent:            
            spamwriter.writerow([time.strftime("%Y-%m-%d"),time.strftime("%A") ,unicode(item.string).encode('utf8').strip(),textcontent])

I feed the HTML to Beautiful Soup with the line html = allButton.webFrame().toHtml(). The code runs without any errors, but the output CSV still does not contain the data for all 41 items.

I also tried feeding html to beautiful soup using these lines of code:

allButton   = r.frame.documentElement().findFirst(allSelector)
a = allButton.evaluateJavaScript(jsClick) 
html = a.webFrame.toHtml()


page = html
soup = BeautifulSoup(page)

But I came across this error:

html = a.webFrame.toHtml()
AttributeError: 'QVariant' object has no attribute 'webFrame'

Please pardon my ignorance if I am asking something fundamental here; I am new to programming and would appreciate help in solving this issue.

Solution

I think there is a problem with your JavaScript code. Since you're creating a MouseEvent object, you should use the initMouseEvent method to initialize it. You can find an example here.
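
As a minimal sketch of that suggestion, the click script could be rewritten around initMouseEvent, which takes fifteen positional arguments in the legacy DOM API. The constant name JS_CLICK is made up here for illustration, and the script has not been tested against the page in the question:

```python
# Hypothetical replacement for the jsClick string in the question.
# initMouseEvent is the legacy DOM initializer for events created with
# document.createEvent('MouseEvents'); its 15 arguments are positional.
JS_CLICK = """
var evObj = document.createEvent('MouseEvents');
evObj.initMouseEvent(
    'click',   // type
    true,      // canBubble
    true,      // cancelable
    window,    // view
    1,         // detail (click count)
    0, 0,      // screenX, screenY
    0, 0,      // clientX, clientY
    false,     // ctrlKey
    false,     // altKey
    false,     // shiftKey
    false,     // metaKey
    0,         // button (0 = primary button)
    null       // relatedTarget
);
this.dispatchEvent(evObj);
"""
```

It would be passed to the element the same way as before, i.e. allButton.evaluateJavaScript(JS_CLICK).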

UPDATE 2

But I think the simplest thing you can try is to use the JavaScript DOM method onclick of the a element instead of your own JavaScript code. Something like this:

allButton.evaluateJavaScript("this.onclick()")

should work. I suppose you will have to reload the page after clicking, before passing it to the parser.

UPDATE 3

You can reload the page via r.action(QWebPage.ReloadAndBypassCache) or r.action(QWebPage.Reload), but it doesn't seem to have any effect. I tried displaying the page with QWebView and clicking the link to see what happens. Unfortunately I get lots of segmentation faults, so I would swear there is a bug somewhere in PyQt4/Qt4. As the page being scraped uses jQuery, I also tried displaying it after loading jQuery in the QWebPage, but again no luck (the segfaults do not disappear). I'm giving up :( I hope other users here at SO will help you. In any case, I recommend asking for help on the PyQt4 mailing list; they provide excellent support to PyQt users.
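
One thing worth checking on the reload attempt: in Qt's API, QWebPage.action() only returns the QAction object for a web action, while QWebPage.triggerAction() is the call that performs it. The sketch below assumes that difference is relevant here, which is not confirmed. The helper name trigger_web_action is hypothetical, and it is written against any object with a triggerAction method so it can be read without PyQt4 installed:

```python
def trigger_web_action(page, web_action):
    """Perform a web action on a QWebPage-like object.

    `page` is expected to be a QWebPage (for example the Render instance r
    from the question); `web_action` is a QWebPage.WebAction value such as
    QWebPage.ReloadAndBypassCache.
    """
    # action(web_action) would only hand back the QAction;
    # triggerAction(web_action) is what actually performs it.
    page.triggerAction(web_action)
```

With PyQt4 this would be called as trigger_web_action(r, QWebPage.ReloadAndBypassCache). The reload is asynchronous, so an event loop presumably has to be running again before the page content changes.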

UPDATE

The error you get when changing your code is expected: remember that allButton is a QWebElement object. The QWebElement.evaluateJavaScript method returns a QVariant object (as stated in the docs), and objects of that kind don't have a webFrame attribute, as you can check on this page.
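
To make the point concrete, here is a small sketch in which the return value of evaluateJavaScript is ignored and the HTML is read from the element's frame instead. The helper name click_and_get_html is made up for illustration; it only assumes an object with the evaluateJavaScript and webFrame methods that QWebElement provides:

```python
def click_and_get_html(element, script="this.onclick()"):
    """Run `script` on a QWebElement-like object, then return its frame's HTML."""
    # The return value (a QVariant, per the explanation above) has no
    # webFrame attribute, so it is deliberately not used here.
    element.evaluateJavaScript(script)
    # The frame is reached through the element itself, not through the result.
    return element.webFrame().toHtml()
```

In the question's code this would be html = click_and_get_html(allButton). If the link loads the extra items asynchronously, the returned HTML may still show only the default items; that depends on the page and is not verified here.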
