Any way to tell Selenium not to execute JS at some point?
Question
I want to crawl a site whose content is partly generated by JavaScript. The site runs a JS update of the content every 5 seconds (it requests a new encrypted JS file, which I cannot parse).
My code:
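from selenium import webdriver
from selenium.common.exceptions import StaleElementReferenceException

driver = webdriver.PhantomJS()
driver.set_window_size(1120, 550)
driver.get(url)  # url: the page being crawled (not given in the question)
trs = driver.find_elements_by_css_selector('.table tbody tr')
print(len(trs))
items = []
for tr in trs:
    try:
        items.append(tr.text)
    except StaleElementReferenceException:
        # the JS refreshed the content, so this tr is no longer in the DOM
        pass
print(len(items))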
len(items) does not match len(trs). How can I tell Selenium to stop executing JS, or to stop working, after I run trs = driver.find_elements_by_css_selector('.table tbody tr')?
I need to use trs later, so I cannot call driver.quit().
Exception details:
---------------------------------------------------------------------------
StaleElementReferenceException Traceback (most recent call last)
<ipython-input-84-b80e3579efca> in <module>()
11 items = []
12 for tr in trs:
---> 13 items.append(tr.text)
14 #items.append(map_label(hidemyass_label, tr.find_elements_by_tag_name('td')))
15
C:\Python27\lib\site-packages\selenium\webdriver\remote\webelement.pyc in text(self)
69 def text(self):
70 """The text of the element."""
---> 71 return self._execute(Command.GET_ELEMENT_TEXT)['value']
72
73 def click(self):
C:\Python27\lib\site-packages\selenium\webdriver\remote\webelement.pyc in _execute(self, command, params)
452 params = {}
453 params['id'] = self._id
--> 454 return self._parent.execute(command, params)
455
456 def find_element(self, by=By.ID, value=None):
C:\Python27\lib\site-packages\selenium\webdriver\remote\webdriver.pyc in execute(self, driver_command, params)
199 response = self.command_executor.execute(driver_command, params)
200 if response:
--> 201 self.error_handler.check_response(response)
202 response['value'] = self._unwrap_value(
203 response.get('value', None))
C:\Python27\lib\site-packages\selenium\webdriver\remote\errorhandler.pyc in check_response(self, response)
179 elif exception_class == UnexpectedAlertPresentException and 'alert' in value:
180 raise exception_class(message, screen, stacktrace, value['alert'].get('text'))
--> 181 raise exception_class(message, screen, stacktrace)
182
183 def _value_or_default(self, obj, key, default):
StaleElementReferenceException: Message: {"errorMessage":"Element is no longer attached to the DOM","request":{"headers":{"Accept":"application/json","Accept-Encoding":"identity","Connection":"close","Content-Type":"application/json;charset=UTF-8","Host":"127.0.0.1:63305","User-Agent":"Python-urllib/2.7"},"httpVersion":"1.1","method":"GET","url":"/text","urlParsed":{"anchor":"","query":"","file":"text","directory":"/","path":"/text","relative":"/text","port":"","host":"","password":"","user":"","userInfo":"","authority":"","protocol":"","source":"/text","queryKey":{},"chunks":["text"]},"urlOriginal":"/session/4bb16340-a3b6-11e5-8ce5-9d0be40203a6/element/%3Awdc%3A1450243990539/text"}}
Screenshot: available via screen
Evidently the tr is gone.
PS: I need to use Selenium to select elements. Other libraries such as lxml and pyquery do not know whether an element is display:none, their .text() often returns comments or the contents of <script>, and there are other bugs of that kind. It's sad that Python does not have a perfect clone of jQuery.
Recommended answer
Use Scrapy. Once you are sure the page has loaded, grab the body using:
from scrapy.http import TextResponse
response = TextResponse(url=self.driver.current_url, body=self.driver.page_source, encoding='utf-8')
You now have a static copy of the page, so you can use Scrapy's response.xpath to pull out whatever data you need. This answer has more detail.
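The point of the static copy is that driver.page_source is read in one call, so later JS updates cannot invalidate anything you parse from it. As a rough illustration of that idea only, here is a minimal Python 3 sketch that swaps Scrapy for the standard library's html.parser so it runs without extra dependencies. The names RowTextParser and snapshot_rows are hypothetical, and the sketch assumes the rows are plain <tr> elements. Like lxml or pyquery, it cannot tell whether an element is display:none.

```python
from html.parser import HTMLParser


class RowTextParser(HTMLParser):
    """Collect the text of every <tr> in a static HTML string (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.rows = []         # finished rows, one string per <tr>
        self._cells = None     # text fragments of the row being read, or None
        self._skip_depth = 0   # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "tr":
            self._cells = []

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag == "tr" and self._cells is not None:
            self.rows.append(" ".join(self._cells))
            self._cells = None

    def handle_data(self, data):
        text = data.strip()
        # Keep text only when inside a row and outside <script>/<style>.
        if text and self._cells is not None and not self._skip_depth:
            self._cells.append(text)


def snapshot_rows(driver):
    """Read driver.page_source once, then parse only that frozen string."""
    html = driver.page_source  # one read; later JS updates cannot change this string
    parser = RowTextParser()
    parser.feed(html)
    parser.close()
    return parser.rows
```

Calling snapshot_rows(driver) in place of the loop over trs returns a list of plain strings, so no StaleElementReferenceException can occur afterwards. The driver stays open and can still be used for anything that needs the live page.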