Execution flow in selenium browser automation


Question

I'm uncertain about how a script (an automated test) is executed in selenium. I suppose the process is as follows:

  • Execution starts.
  • A selenese command is transformed into an HTTP request.
  • The HTTP server of the browser driver receives the HTTP request.
  • The browser driver determines the steps needed to implement the command.
  • The browser driver executes them on the browser.
  • The execution status is sent back to the HTTP server of the browser driver and then to the script (IDE).

I suppose this is the process. Please correct me wherever I'm wrong.

Answer

Yes, this is it, in broad strokes.

The Theory

[Diagram: the acting parties are in bold and in boxes; the protocols used are in italic, on the arrows.]

When you want to interact with a browser,

  1. Your code uses a webdriver client (usually a library, like selenium) for the language you use (Java, Python, Ruby, etc.).
  2. That client communicates with a webdriver server, sending and receiving data according to the webdriver protocol; this protocol is encapsulated in HTTP for easier transport and control.
  3. The webdriver server translates it into actual commands for the browser, so that it (the browser) interacts with the page, or gets data from it.

The flow is always end-to-end (e.g. the browser never communicates directly with your code :)), and bi-directional. A failure/exception usually travels only upstream, to your code.


Some Details

The "browser's webdriver" in that graph is a binary (a program): "geckodriver" for Firefox (with ".exe" on Windows), "chromedriver", "safaridriver", "edgedriver.exe" (that one always comes with ".exe" :)). It acts as a proxy: on one side it accepts and understands the commands in the webdriver protocol, on the other it knows how to communicate with the browser.

The webdriver is always an HTTP server - all commands are encapsulated in HTTP, with the usual methods GET/POST/DELETE/PUT (close to, if not the same as, a regular REST API). It implements the webdriver protocol, so clients (selenium & co) have a well-defined API to communicate with it. Thus it can also be referred to as the "webdriver server" - it listens for commands, proxies them to the browser, and returns the responses to the client. (No one calls it that :), but it makes it easier to distinguish between "webdriver the executable" and "webdriver the protocol".)

Being a server, it binds to and listens on a random network port - on your local machine, or on a remote one. If you are running locally, this is the reason its binary must be on your PATH: upon initialization Selenium starts it (so it must be able to find it) and gets the network port it is listening on (for further communication). If you're using a remote connection, you must either a) know the IP:port of the remote webdriver server, or b) use a "Selenium Hub", which tracks this information under its domain, and shares it with you.
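The two local steps just described (finding the binary, and settling on a free port) can be sketched with the Python standard library alone. This is only an illustration of the idea, not how Selenium itself does it; the helper names `find_driver` and `pick_free_port` are made up for this example.

```python
import shutil
import socket


def find_driver(name):
    """Return the full path of an executable found on PATH, or None."""
    return shutil.which(name)


def pick_free_port():
    """Ask the OS for a currently unused TCP port on the loopback interface."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind(("127.0.0.1", 0))   # port 0 means "any free port"
        return sock.getsockname()[1]  # the port the OS actually assigned


if __name__ == "__main__":
    # Prints None if geckodriver is not installed / not on PATH.
    print("geckodriver found at:", find_driver("geckodriver"))
    print("a free local port:", pick_free_port())
```

If `find_driver` returns None, a client has nothing to start, which is the practical reason behind the "binary must be on your PATH" requirement.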


The communication between the webdriver server and the browser is usually an RPC-style protocol, and very much browser-specific - it uses internal APIs, and the webdriver knows the nuts and bolts of how to control this particular browser best. That is why the drivers are provided by the browser vendors. This is always local (on the same machine/OS) communication (at least to my knowledge).


If you are using a higher-level framework like Robot Framework, Cucumber, JBehave, etc., it sits in front of "your code" in that diagram, trying to shield you from some of the selenium calls.


In Practice

"A picture is worth a thousand words", so a piece of code must be worth something like 740? :) Enough theory, here's a practical example:

from selenium import webdriver       # importing selenium bindings
from selenium.webdriver.common.by import By  # locator strategies
wd = webdriver.Firefox()      # connect to the "webdriver server", a local one
element = wd.find_element(By.CSS_SELECTOR, '#my-id')  # locate an element
the_text = element.text        # get its text

assert the_text == 'My awesome text!'   # verify it's the expected one

This whole listing is the "your code" from the first part - the different instructions, flow control and checks that are executed to get the job done. On line 1 Python's selenium library is imported for further use.

Selenium is the most popular framework implementing the webdriver protocol; it has implementations (i.e. bindings) for different languages - python here, java, ruby, javascript and so on. What it strives to do is offer a uniform interface across all of them - getText() in Java is available in Python as .text, and so on. With this interface it isolates the client from the actual webdriver protocol - the user types .text and doesn't care how it is actually executed, nor do they have to change their code if the protocol changes.


On line 3 a webdriver object is instantiated; as this one is a local server, the instantiation goes through the local steps described earlier - the "webdriver server" is started, its port is now known (and stored in the object), and communication can begin.


Line 4 in the code uses the selenium method for locating a particular element in the page. Under the hood, the library sends a POST HTTP request to the webdriver server, to locate the element.
Why POST? Because once the element is successfully found, the server assigns an internal id to it, which will be used from then on; it returns that id to the client, which stores it as a property of the element object (* see the footnote).
How does the webdriver server locate that element? It doesn't - it communicates with the browser, through the proprietary protocol, saying "Hey, using your rendering and evaluation engine, find an element in the DOM that matches this CSS selector, and give me a reference we both can reuse in the future." (i.e. "the magic" :). So it is the browser that does the work; the webdriver server just proxies the communication.
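As a rough sketch of what such a "find element" request could look like on the wire: the path and the `using`/`value` body fields below follow the Find Element command of the W3C WebDriver specification, while the helper name `build_find_element_request` is invented for this illustration. Nothing is sent anywhere; the function only assembles the pieces.

```python
import json


def build_find_element_request(session_id, css_selector):
    """Return (method, path, body) for a 'find element by CSS selector' command."""
    path = f"/session/{session_id}/element"
    body = json.dumps({"using": "css selector", "value": css_selector})
    return "POST", path, body


# The session id is the sample one used in this answer.
method, path, body = build_find_element_request(
    "2cce72b7-c748-48bc-b350-6dd6730b5a69", "#my-id")
print(method, path)
print(body)
```

The server's reply carries the element id it assigned, which is what the client keeps on the element object for later commands.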


Let's get to the specifics - line 5 executes the command .text, which obviously returns the text of the element (if you don't know python, don't be alarmed that it's a command yet has no () at the end - that's a language feature, exposing methods as object properties, and quite a handy one).
What happens at this point: the selenium python binding matches this command to getElementText in its common interface; then it matches that to a webdriver protocol command (open the link, it's interesting, I promise) - it's of type GET, and the parameters for it are this and that.
It opens a network connection to "localhost:the_known_port", to this endpoint:

GET /session/2cce72b7-c748-48bc-b350-6dd6730b5a69/element/5/text

The first "random" string is the session id - a webdriver server can be used by many clients, and yours was established and stored at line 3. The second parameter (the "5") is the element's id, established at line 4. Then comes "text" - the subresource you are requesting, one of those the element supports.
And this is the infamous webdriver protocol/API - the knowledge of specific access schemes (you can get the "text" of an established element, in a session) and of the flow (you must first establish a shared session, then a reference to an element, and only then get its "text").
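To make the structure of that endpoint concrete, here is a small sketch that splits it into the three parts just described. The function name `parse_element_endpoint` and the returned dictionary keys are hypothetical, chosen only for this example.

```python
def parse_element_endpoint(path):
    """Split '/session/<sid>/element/<eid>/<subresource>' into its parts."""
    parts = path.strip("/").split("/")
    if len(parts) != 5 or parts[0] != "session" or parts[2] != "element":
        raise ValueError(f"not an element subresource endpoint: {path!r}")
    return {
        "session_id": parts[1],   # established when the driver object is created
        "element_id": parts[3],   # established when the element is located
        "subresource": parts[4],  # what is being requested, e.g. "text"
    }


print(parse_element_endpoint(
    "/session/2cce72b7-c748-48bc-b350-6dd6730b5a69/element/5/text"))
```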

After that the webdriver server makes the browser get the info from its DOM ("the magic"), and sends it back to the client (the selenium instance) over the wire:

{"sessionId":"2cce72b7-c748-48bc-b350-6dd6730b5a69","status":0,"value":"My awesome text!"}

Your selenium instance, which was waiting for the response, receives and parses the payload, and returns the value to your code - the variable the_text now has the value "My awesome text!".
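The parsing step can be approximated like this. It is a simplified sketch based only on the sample payload shown above; the function name `extract_value` is made up, and treating a missing or zero "status" as success is an assumption of this example rather than a statement about every protocol version.

```python
import json


def extract_value(raw_response):
    """Parse a response payload and return its "value" field."""
    payload = json.loads(raw_response)
    # Assumption for this sketch: status 0 (or no status at all) means success.
    status = payload.get("status", 0)
    if status != 0:
        raise RuntimeError(f"command failed with status {status}: {payload.get('value')}")
    return payload["value"]


raw = ('{"sessionId":"2cce72b7-c748-48bc-b350-6dd6730b5a69",'
       '"status":0,"value":"My awesome text!"}')
the_text = extract_value(raw)
print(the_text)
```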


And - done, the cycle code -> webdriver client -> webdriver server -> browser -> webdriver server -> webdriver client -> code is now complete.
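The whole round trip can be imitated with a toy stand-in built from the standard library. Everything here is hypothetical: `FakeDriverHandler` plays the webdriver server, the `FAKE_DOM` dictionary plays the browser, and `get_element_text` plays the client. It understands only the single GET endpoint discussed above and is in no way a real driver.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

SESSION_ID = "2cce72b7-c748-48bc-b350-6dd6730b5a69"
FAKE_DOM = {"5": "My awesome text!"}  # stand-in for the browser: element id -> text


class FakeDriverHandler(BaseHTTPRequestHandler):
    """Toy 'webdriver server': answers only GET /session/<sid>/element/<eid>/text."""

    def do_GET(self):
        parts = self.path.strip("/").split("/")
        known = (len(parts) == 5 and parts[0] == "session"
                 and parts[1] == SESSION_ID and parts[2] == "element"
                 and parts[3] in FAKE_DOM and parts[4] == "text")
        if known:
            code = 200
            payload = {"sessionId": SESSION_ID, "status": 0,
                       "value": FAKE_DOM[parts[3]]}
        else:
            code = 404
            payload = {"status": 1, "value": "unknown command or element"}  # toy status
        body = json.dumps(payload).encode("utf-8")
        self.send_response(code)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def get_element_text(port, session_id, element_id):
    """Toy 'webdriver client': one HTTP GET, then unwrap the JSON payload."""
    url = f"http://127.0.0.1:{port}/session/{session_id}/element/{element_id}/text"
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))  # bypass proxies
    with opener.open(url, timeout=5) as response:
        return json.loads(response.read().decode("utf-8"))["value"]


def run_demo():
    """Start the toy server on a random free port, ask for the text, shut down."""
    server = HTTPServer(("127.0.0.1", 0), FakeDriverHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return get_element_text(server.server_port, SESSION_ID, "5")
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(run_demo())
```

Binding to port 0 mirrors the "random network port" behaviour described earlier: the client only learns the port after the server has started.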


Footnotes:

(*) - this is the actual reason for the dreaded StaleElementReferenceException: all three - the client, the webdriver server, and the browser - hold a reference to an element in the DOM.
But at a particular moment in time a 3rd party - javascript code running in the browser - changes or removes the element, blissfully unaware that something holds a reference it has now invalidated (come to think of it, quite an evil act :D).
The next time the client tries to interact with the reference, through the webdriver server, in the browser - the element is no longer there. Naturally, the interaction fails, the failure travels back upstream to the client and surfaces as the exception; its message is "Element is no longer attached to the DOM" - which, though a bit cryptic, hopefully makes perfect sense now.
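The mechanism in this footnote can be modelled with a few lines of plain Python. This is a toy, not Selenium: `FakeBrowser` and `StaleElementError` are invented stand-ins, and `render` imitates page javascript replacing a node.

```python
class StaleElementError(Exception):
    """Toy stand-in for Selenium's StaleElementReferenceException."""


class FakeBrowser:
    def __init__(self):
        self._page = {}  # CSS selector -> node currently in the "DOM"
        self._refs = {}  # reference id -> node handed out earlier

    def render(self, selector, text):
        """Create or re-create a node, as page javascript might."""
        old = self._page.get(selector)
        if old is not None:
            old["attached"] = False  # the old node leaves the DOM
        self._page[selector] = {"text": text, "attached": True}

    def find(self, selector):
        """Hand out a reusable reference to the node matching the selector."""
        ref = str(len(self._refs))
        self._refs[ref] = self._page[selector]
        return ref

    def get_text(self, ref):
        node = self._refs[ref]
        if not node["attached"]:
            raise StaleElementError("Element is no longer attached to the DOM")
        return node["text"]


browser = FakeBrowser()
browser.render("#my-id", "My awesome text!")
ref = browser.find("#my-id")
browser.render("#my-id", "My awesome text!")  # javascript re-renders the element

try:
    browser.get_text(ref)
except StaleElementError as exc:
    print("stale:", exc)
    ref = browser.find("#my-id")  # the usual remedy: locate the element again

print(browser.get_text(ref))
```

Note that the re-rendered element has the same selector and the same text, yet the old reference is still invalid, because it pointed at a node that no longer exists.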
