Python parallel execution with Selenium


Question

I'm confused about parallel execution in Python using Selenium. There seem to be a few ways to go about it, but some of them look out of date.

What is the current way to run Selenium in parallel?

There's a Python module called python-wd-parallel that seems to have some functionality for this, but it dates from 2013. Is it still useful now?

For example: https://saucelabs.com/blog/parallel-testing-with-python-and-selenium-on-sauce-online-workshop-recap

There is also concurrent.futures, which seems a lot newer but is not so easy to implement. Does anyone have a working example of parallel execution with Selenium?

Another option is to use plain threading and executors, but I suspect this will be slower, because it doesn't use all the cores and still effectively runs serially.

Recommended answer

Use joblib's Parallel class for this; joblib is a great library for parallel execution.

Let's say we have a list of URLs named urls and we want to take a screenshot of each one in parallel.

First, let's import the necessary libraries:

from selenium import webdriver
from joblib import Parallel, delayed

Now let's define a function that takes a screenshot and returns it as base64:

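def take_screenshot(url):
    # Each call starts its own browser, so workers never share a driver.
    # Note: PhantomJS support was removed in Selenium 4; on newer versions
    # a headless Chrome or Firefox driver would have to be used instead.
    phantom = webdriver.PhantomJS('/path/to/phantomjs')
    try:
        phantom.get(url)
        return phantom.get_screenshot_as_base64()
    finally:
        # quit() shuts down the browser process; close() only closes the window.
        phantom.quit()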

Now, to execute that in parallel, you would write:

screenshots = Parallel(n_jobs=-1)(delayed(take_screenshot)(url) for url in urls)

When this line finishes executing, screenshots holds the results from all of the workers that ran, one entry per URL.

Explanation of Parallel

  • Parallel(n_jobs=-1) tells joblib to use all available CPU cores.
  • delayed(function)(input) is joblib's way of packaging the function and its input into a task that Parallel runs later, instead of calling the function immediately.
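
To see these two pieces in isolation, here is a minimal sketch that swaps the browser work for a hypothetical square function (not part of the original answer). It assumes joblib is installed, and it passes prefer="threads", a soft hint asking joblib for its thread-based backend, so the example runs without spawning worker processes.

```python
from joblib import Parallel, delayed


def square(x):
    # Hypothetical stand-in for take_screenshot(url); any callable works.
    return x * x


# delayed(square)(x) records the call without running it;
# Parallel then runs the recorded calls on up to 2 workers.
results = Parallel(n_jobs=2, prefer="threads")(
    delayed(square)(x) for x in range(5)
)

# Results come back as a list in the same order as the inputs.
print(results)
```

Replacing square with take_screenshot and range(5) with urls gives the one-liner from the answer.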

For more information, see the joblib documentation.
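
The question also mentions concurrent.futures. As a rough sketch of the same fan-out using only the standard library, the block below uses ThreadPoolExecutor.map. The fake_screenshot function, the run_in_parallel helper, and the example URLs are placeholders invented for illustration; in real use fake_screenshot would be replaced by take_screenshot. Threads may be sufficient here because each browser runs in its own operating-system process and the Python side mostly waits on it, though that is an assumption worth measuring.

```python
import base64
from concurrent.futures import ThreadPoolExecutor


def fake_screenshot(url):
    # Placeholder for take_screenshot(url): returns base64 text
    # derived from the URL instead of driving a real browser.
    return base64.b64encode(url.encode("utf-8")).decode("ascii")


def run_in_parallel(func, items, max_workers=4):
    # executor.map schedules func(item) on the pool and yields
    # results in the same order as items.
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(func, items))


urls = ["https://example.com/a", "https://example.com/b"]  # hypothetical
screenshots = run_in_parallel(fake_screenshot, urls)
print(len(screenshots))
```

Because every call builds and tears down its own driver, max_workers also caps how many browsers are open at once.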
