Python Scraping JavaScript using Selenium and Beautiful Soup
Problem description
I'm trying to scrape a JavaScript-enabled page using BS and Selenium.
I have the following code so far. It somehow still doesn't pick up the JavaScript-generated content (it returns an empty result). In this case I'm trying to scrape the Facebook comments at the bottom of the page. (Inspect Element shows the class as postText.)
Thanks for the help!
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.keys import Keys
import BeautifulSoup
browser = webdriver.Firefox()
browser.get('http://techcrunch.com/2012/05/15/facebook-lightbox/')
html_source = browser.page_source
browser.quit()
soup = BeautifulSoup.BeautifulSoup(html_source)
comments = soup("div", {"class":"postText"})
print comments
There are some mistakes in your code, which are fixed below. However, the class "postText" must be defined somewhere else, since it does not appear in the page source that the browser returns. My revised version of your code was tested and works on multiple websites.
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.keys import Keys
from bs4 import BeautifulSoup
browser = webdriver.Firefox()
browser.get('http://techcrunch.com/2012/05/15/facebook-lightbox/')
html_source = browser.page_source
browser.quit()
soup = BeautifulSoup(html_source, 'html.parser')
# The class "postText" is not defined in the page source, so this list may be empty
comments = soup.find_all('div', {'class': 'postText'})
print(comments)
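One way to check the claim that "postText" is missing from the page source is to count matching tags in the HTML string before handing it to Beautiful Soup. The following is only a minimal sketch using the standard library's html.parser rather than bs4, so it runs without a browser. The names ClassCounter and count_class, and the two sample HTML strings, are hypothetical stand-ins for browser.page_source and are not taken from the TechCrunch page.

```python
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Count start tags of a given name that carry a given CSS class."""

    def __init__(self, tag, css_class):
        super().__init__()
        self.tag = tag
        self.css_class = css_class
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag != self.tag:
            return
        # attrs is a list of (name, value) pairs; value is None for bare attributes
        classes = (dict(attrs).get("class") or "").split()
        if self.css_class in classes:
            self.count += 1


def count_class(html, tag, css_class):
    """Return how many <tag> elements in html have css_class among their classes."""
    parser = ClassCounter(tag, css_class)
    parser.feed(html)
    parser.close()
    return parser.count


# Hypothetical stand-ins for browser.page_source (assumed markup, for illustration only)
static_html = '<div id="comments"><iframe src="about:blank"></iframe></div>'
rendered_html = '<div class="postText first">Nice post</div><div class="other">x</div>'

print(count_class(static_html, "div", "postText"))
print(count_class(rendered_html, "div", "postText"))
```

If the count is zero for the real page source, the empty result comes from the HTML itself and not from the Beautiful Soup call, which is consistent with the note in the answer above.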