Python parallel execution with selenium
Problem description
I'm confused about parallel execution in Python using Selenium. There seem to be a few ways to go about it, but some of them look out of date.
There's a Python module called python-wd-parallel, which seems to have some functionality for this, but it's from 2013. Is it still useful now? I also found this example.
There's also concurrent.futures, which seems a lot newer but not so easy to implement. Does anyone have a working example of parallel execution with Selenium?
Another option is to use plain threads and executors to get the job done, but I suspect this will be slower, because it doesn't use all the cores and the work still runs serially.
What is the current way to do parallel execution with Selenium?
Recommended answer
Use joblib's Parallel class to do that; joblib is a great library for parallel execution.
Let's say we have a list of URLs named urls and we want to take a screenshot of each one in parallel.
First, let's import the necessary libraries:
from selenium import webdriver
from joblib import Parallel, delayed
Now let's define a function that takes a screenshot and returns it as a base64 string:
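def take_screenshot(url):
    # Each call starts its own PhantomJS instance, so workers never share a driver.
    phantom = webdriver.PhantomJS('/path/to/phantomjs')
    try:
        phantom.get(url)
        return phantom.get_screenshot_as_base64()
    finally:
        # quit() ends the browser process; close() only closes the current window.
        phantom.quit()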
Now, to execute that in parallel, you would write:
screenshots = Parallel(n_jobs=-1)(delayed(take_screenshot)(url) for url in urls)
When this line finishes executing, screenshots will hold the return value of every call, in the same order as urls.
Notes on Parallel
Parallel(n_jobs=-1) means use all of the CPU cores available to you.
delayed(function)(input) is joblib's way of capturing the function and its input so that the call can be run later in parallel instead of immediately.
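To make the role of delayed more concrete, here is a simplified model of the idea in plain Python. It is not joblib's actual code: delayed_sketch, run_all and shout are hypothetical names used only for illustration, and run_all executes the captured calls one after another, where joblib would dispatch them to workers.

```python
def delayed_sketch(function):
    # Hypothetical stand-in for joblib.delayed: calling the wrapper does not
    # run `function`; it only records the call as (function, args, kwargs).
    def capture(*args, **kwargs):
        return function, args, kwargs
    return capture


def run_all(tasks):
    # Hypothetical stand-in for Parallel(...): executes each captured call.
    # This version is sequential; joblib would dispatch the calls to workers.
    return [function(*args, **kwargs) for function, args, kwargs in tasks]


def shout(text):
    # Trivial example function so the sketch runs without a browser.
    return text.upper()


# Building the task list runs nothing yet; run_all performs the calls.
tasks = [delayed_sketch(shout)(word) for word in ["a", "b"]]
results = run_all(tasks)
```

The point of the two-step call is that the generator passed to Parallel produces descriptions of work rather than results, which is what lets the library decide where and when each call runs.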
More information can be found in the joblib docs.
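The question also mentions concurrent.futures. A minimal sketch of the same fan-out with ThreadPoolExecutor might look like the following. Here fake_screenshot is a hypothetical stand-in for take_screenshot so that the example runs without a browser, and max_workers=4 is an arbitrary choice. Threads are usually adequate for this kind of job, assuming that each WebDriver controls a separate browser process and the Python side spends most of its time waiting on it.

```python
import base64
from concurrent.futures import ThreadPoolExecutor


def fake_screenshot(url):
    # Hypothetical stand-in for take_screenshot(url): returns a base64 string
    # derived from the URL so the sketch runs without a browser installed.
    return base64.b64encode(url.encode("utf-8")).decode("ascii")


def screenshots_in_threads(urls, max_workers=4):
    # executor.map runs the calls concurrently and yields results in input order.
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(fake_screenshot, urls))


urls = ["http://example.com/a", "http://example.com/b"]
screenshots = screenshots_in_threads(urls)
```

To use it with a real browser, pass take_screenshot to executor.map in place of fake_screenshot; each call still creates and quits its own driver, so no driver is shared between threads.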