Element not found in the cache - perhaps the page has changed since it was looked up in Selenium Ruby web driver?

Question

I am trying to write a crawler that crawls all links from the loaded page and logs all request and response headers, along with the response body, to a file (say XML or txt). I am opening every link from the first loaded page in a new browser window so that I won't get this error:

Element not found in the cache - perhaps the page has changed since it was looked up

I want to know what the alternative would be for making requests to and receiving responses from all links, and then locating input elements and submit buttons in all opened windows. I am able to do the above to some extent, except when an opened window has a common site search box, like the one in the upper right corner of http://www.testfire.net. What I want to do is omit such common boxes so that I can fill the other inputs with values using WebDriver's i.send_keys "value" method without getting this error: ERROR: Element not found in the cache - perhaps the page has changed since it was looked up.

How can I detect and distinguish the input tags in each opened window, so that values are not filled in repeatedly into the common input tags that appear on most pages of the website? My code is as follows:

require 'rubygems'
require 'selenium-webdriver'
require 'timeout'

class Clicker
  def open_new_window(url)
    @driver = Selenium::WebDriver.for :firefox
    @url = @driver.get "http://test.acunetix.com"
    @link = Array.new(@driver.find_elements(:tag_name, "a"))
    @windows = Array.new(@driver.window_handles())
    @link.each do |a|
      a = @driver.execute_script("var d=document,a=d.createElement('a');a.target='_blank';a.href=arguments[0];a.innerHTML='.';d.body.appendChild(a);return a", a)
      a.click
    end
    i = @driver.window_handles
    i[0..i.length].each do |handle|
      @driver.switch_to().window(handle)
      puts @driver.current_url()
      inputs = Array.new(@driver.find_elements(:tag_name, 'input'))
      forms = Array.new(@driver.find_elements(:tag_name, 'form'))
      inputs.each do |i|
        begin
          i.send_keys "value"
          puts i.class
          i.submit
        rescue Timeout::Error => exc
          puts "ERROR: #{exc.message}"
        rescue Errno::ETIMEDOUT => exc
          puts "ERROR: #{exc.message}"
        rescue Exception => exc
          puts "ERROR: #{exc.message}"
        end
      end
      forms.each do |j|
        begin
          j.send_keys "value"
          j.submit
        rescue Timeout::Error => exc
          puts "ERROR: #{exc.message}"
        rescue Errno::ETIMEDOUT => exc
          puts "ERROR: #{exc.message}"
        rescue Exception => exc
          puts "ERROR: #{exc.message}"
        end
      end
    end
    # Switch back to the original window
    @driver.switch_to().window(i[0])
  end
end

ol = Clicker.new
url = ""
ol.open_new_window(url)

How can I get all request and response headers, along with the response body, using Selenium WebDriver or http.set_debug_output from Ruby's net/http?

Answer

Selenium is not one of the best options for building a "web crawler". It can be too flaky at times, especially when it comes across unexpected scenarios. Selenium WebDriver is a great tool for automating and testing expected behavior and user interactions. For web crawling, good old-fashioned curl would probably be a better option. Also, I am pretty sure there are Ruby gems that can help you crawl the web; a quick search should turn some up.

But to answer the actual question, if you were to use Selenium WebDriver:

I'd work out a filtering algorithm: add the HTML of each element you interact with to an array. Then, when you go on to the next window/tab/link, check each element against that array and skip it if its HTML matches a value already stored there.
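A minimal sketch of that filter, assuming each element responds to `attribute('outerHTML')` as `Selenium::WebDriver::Element` does. The class name `SeenElementFilter` and the method `first_visit?` are hypothetical, and a `Set` stands in for the array so lookups stay cheap. It only works if the repeated widget's HTML is byte-for-byte identical on every page; generated ids or per-page hidden tokens would make each copy look new.

```ruby
require 'set'

# Hypothetical helper (not part of selenium-webdriver): remembers the outer HTML
# of every element already interacted with, so that widgets repeated on every
# page (such as a site-wide search box) are only filled in once.
class SeenElementFilter
  def initialize
    @seen = Set.new # a Set instead of an Array: same idea, faster lookups
  end

  # Returns true the first time a given chunk of HTML is seen, false afterwards.
  # `element` only needs to respond to #attribute(name), which
  # Selenium::WebDriver::Element does.
  def first_visit?(element)
    html = element.attribute('outerHTML').to_s.strip
    !@seen.add?(html).nil? # Set#add? returns nil when the value was already present
  end
end
```

In the crawler you would create one filter before visiting any window, share it across all of them, and call `next unless filter.first_visit?(input)` before `input.send_keys "value"`.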

Unfortunately, Selenium WebDriver (SWD) does not support getting request headers and responses through its API. The common workaround is to use a third-party proxy to intercept the requests.
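Since the question also mentions Ruby's net/http, here is a hedged sketch of logging a single exchange outside the browser. `log_exchange` is a hypothetical helper name. Keep in mind that these are the headers Net::HTTP sends, not the ones Firefox sends, so this complements the proxy approach rather than replacing it.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: performs one GET outside the browser and returns a
# text log of the request headers, the response status line, the response
# headers and the response body.
def log_exchange(url)
  uri = URI(url)
  request = Net::HTTP::Get.new(uri) # Net::HTTP fills in defaults such as Host and User-Agent

  lines = ["> GET #{uri}"]
  request.each_header { |name, value| lines << "> #{name}: #{value}" }

  response = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.request(request)
  end

  lines << "< HTTP/#{response.http_version} #{response.code} #{response.message}"
  response.each_header { |name, value| lines << "< #{name}: #{value}" }
  lines << '' # blank line between headers and body, as on the wire
  lines << response.body.to_s
  lines.join("\n")
end
```

Header names come back lowercased from `each_header`. If you want the raw wire traffic instead, calling `set_debug_output($stdout)` on a `Net::HTTP.new(...)` object before starting it will dump it; the Ruby documentation marks that method as a debugging aid only.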

============

Now I'd like to address a few issues with your code.

Before iterating over the links, I'd suggest adding @default_current_window = @driver.window_handle. This lets you always return to the correct window at the end of your script by calling @driver.switch_to.window(@default_current_window).

In your @link iterator, instead of iterating over all the windows that could be displayed, use @driver.switch_to.window(@driver.window_handles.last). This switches to the most recently opened window (and only needs to happen once per link click!).
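A minimal sketch of that window handling, assuming `driver` is a `Selenium::WebDriver::Driver`. The method name `each_new_window` and the `open_link` callable are hypothetical. As a slightly more defensive variant of `window_handles.last`, it compares the handle list before and after each click, because the W3C WebDriver specification leaves the order of window handles unspecified.

```ruby
# Hypothetical helper: opens each link in a new window, yields that window,
# then closes it and returns to the window the script started in.
#   driver    - expected to be a Selenium::WebDriver::Driver (duck-typed here)
#   links     - the links to visit
#   open_link - a callable that opens one link in a new window, e.g. the
#               target="_blank" execute_script trick from the question
def each_new_window(driver, links, open_link)
  default_current_window = driver.window_handle # remember where we started

  links.each do |link|
    before = driver.window_handles
    open_link.call(link)

    # Handles that appeared after the click; normally exactly one.
    new_handle = (driver.window_handles - before).first
    next if new_handle.nil? # the click did not open a window

    driver.switch_to.window(new_handle)
    begin
      yield driver, link
    ensure
      driver.close                                    # close the window we worked in
      driver.switch_to.window(default_current_window) # back to the start window
    end
  end
end
```

The input and form handling would go inside the block. Because each new window is closed before moving on, only one extra window is open at any time and the script always ends on the original page.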

You can DRY up your inputs and form code by doing something like this:

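inputs = []
inputs << @driver.find_elements(:tag_name => "input")
inputs << @driver.find_elements(:tag_name => "form")
inputs.flatten!   # flatten in place; plain `flatten` returns a new array and discards it
inputs.each do |i|
  begin
    i.send_keys "value"
    i.submit
  rescue => e     # catches StandardError, which includes Selenium's WebDriverError subclasses
    puts "ERROR: #{e.message}"
  end
end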

Please note how I just added all of the elements you wanted SWD to find into a single array variable that you iterate over. Then, when something bad happens, a single rescue is needed (I assume you don't want to automatically quit from there, which is why you just want to print the message to the screen).

Learning to DRY up your code and use external gems will help you achieve a lot of what you are trying to do, and at a faster pace.
