Python Scraping JavaScript using Selenium and Beautiful Soup

Question

I'm trying to scrape a JavaScript-enabled page using Beautiful Soup and Selenium. I have the following code so far. It still doesn't pick up the content generated by JavaScript and returns an empty result. In this case I'm trying to scrape the Facebook comments at the bottom of the page. (Inspect Element shows the class as postText.)
Thanks for the help!

from selenium import webdriver  
from selenium.common.exceptions import NoSuchElementException  
from selenium.webdriver.common.keys import Keys  
import BeautifulSoup

browser = webdriver.Firefox()  
browser.get('http://techcrunch.com/2012/05/15/facebook-lightbox/')  
html_source = browser.page_source  
browser.quit()

soup = BeautifulSoup.BeautifulSoup(html_source)  
comments = soup("div", {"class":"postText"})  
print comments

Solution

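There are a few mistakes in your code, which are fixed below: BeautifulSoup is now imported from bs4, the parser ('html.parser') is named explicitly, and the search is an explicit method call instead of calling the soup object directly. Note, however, that the class "postText" does not appear in the page's HTML source, so it must be defined elsewhere, and the search for it may still come back empty on this page. I tested the revised version of your code and it works on several websites.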

from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.keys import Keys
from bs4 import BeautifulSoup

# Load the page in a real browser so that its JavaScript runs
browser = webdriver.Firefox()
browser.get('http://techcrunch.com/2012/05/15/facebook-lightbox/')
html_source = browser.page_source  # HTML of the loaded document
browser.quit()

soup = BeautifulSoup(html_source, 'html.parser')
# The class "postText" is not defined in the source code of this page
comments = soup.find_all('div', {'class': 'postText'})
print(comments)
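
If the content you want is inserted by JavaScript some time after the page loads, reading browser.page_source immediately after browser.get() can be too early. Below is a minimal sketch of one way to handle that, by polling page_source until a condition holds. The helper name wait_for_html and its parameters are my own and not part of Selenium; Selenium also ships WebDriverWait for explicit waits, which is usually the better choice in real code. This is also only an assumption about why the result is empty: if the comments are rendered inside an iframe, the top-level page_source will never contain them and the helper will simply time out.

```python
import time


def wait_for_html(driver, is_ready, timeout=10.0, poll_interval=0.5):
    """Poll driver.page_source until is_ready(html) returns True.

    driver        -- any object with a page_source attribute (e.g. a Selenium WebDriver)
    is_ready      -- function taking the HTML string and returning True/False
    timeout       -- maximum number of seconds to keep polling
    poll_interval -- seconds to sleep between polls

    Returns the HTML string once is_ready accepts it, or None on timeout.
    (Hypothetical helper for illustration; not part of Selenium.)
    """
    deadline = time.monotonic() + timeout
    while True:
        html = driver.page_source
        if is_ready(html):
            return html
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll_interval)
```

With the code above you would call it between browser.get(...) and browser.quit(), for example html_source = wait_for_html(browser, lambda html: 'postText' in html), and only pass html_source to BeautifulSoup if the result is not None.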
