Python parallel execution with selenium


Question

I'm confused about parallel execution in Python using selenium. There seem to be a few ways to go about it, but some look out of date.

  1. There's a Python module called python-wd-parallel that seems to offer this, but it dates from 2013. Is it still useful? I also found this example.

  2. There's concurrent.futures, which seems much newer but is not so easy to implement. Does anyone have a working example of parallel execution with selenium?

  3. There's also using plain threads and executors to get the job done, but I feel this will be slower, because it doesn't use all the cores and still runs serially.

What is the current way to do parallel execution with selenium?

Recommended answer

You can use joblib's Parallel class for this; joblib is a great library for parallel execution.

Let's say we have a list of URLs named urls and we want to take a screenshot of each one in parallel.

First, let's import the necessary libraries:

from selenium import webdriver
from joblib import Parallel, delayed

Now let's define a function that takes a screenshot and returns it as base64:

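def take_screenshot(url):
    # Each call starts its own browser, so parallel workers share no state.
    # Note: webdriver.PhantomJS is deprecated and was removed in Selenium 4.
    phantom = webdriver.PhantomJS('/path/to/phantomjs')
    try:
        phantom.get(url)
        return phantom.get_screenshot_as_base64()
    finally:
        # quit() shuts down the browser process; close() only closes the window.
        phantom.quit()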

To execute that in parallel, you would write:

screenshots = Parallel(n_jobs=-1)(delayed(take_screenshot)(url) for url in urls)

When this line finishes executing, screenshots will hold the results from all of the processes that ran, one per URL and in the same order as urls.

Notes on Parallel

  • Parallel(n_jobs=-1) means use all available CPU cores
  • delayed(function)(input) is joblib's way of recording the function and its input so that the call can be run later in parallel
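
To make these two notes concrete, here is a minimal sketch that runs without a browser. It assumes joblib is installed; fake_screenshot is a hypothetical stand-in for take_screenshot, the URLs are placeholders, and n_jobs=2 is an arbitrary choice.

```python
import base64

from joblib import Parallel, delayed


def fake_screenshot(url):
    # Hypothetical stand-in for take_screenshot: no browser is started,
    # the URL itself is base64-encoded so the sketch runs anywhere.
    return base64.b64encode(url.encode("utf-8")).decode("ascii")


# Placeholder URLs, for illustration only.
urls = ["http://example.com/a", "http://example.com/b", "http://example.com/c"]

# delayed(f)(x) does not call f; it records the call as (function, args, kwargs).
task = delayed(fake_screenshot)(urls[0])

# Parallel consumes the recorded calls and runs them in worker processes.
# n_jobs=2 is an arbitrary choice for this sketch; -1 would use all cores.
screenshots = Parallel(n_jobs=2)(delayed(fake_screenshot)(url) for url in urls)
```

The results come back as a list in the same order as the inputs, so screenshots[i] belongs to urls[i].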

More information can be found in the joblib docs.
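
For comparison, the concurrent.futures route mentioned in the question has much the same shape. The following is only a sketch under stated assumptions: fetch_title is a hypothetical stand-in for a function that drives one browser, the URLs are placeholders, and max_workers=3 is arbitrary.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_title(url):
    # Hypothetical stand-in for browser work. In real use this would
    # create a driver, load the URL, read something, and quit the driver.
    return "title of " + url


# Placeholder URLs, for illustration only.
urls = ["http://example.com/a", "http://example.com/b", "http://example.com/c"]

# Each worker thread would own one browser; max_workers=3 is arbitrary.
with ThreadPoolExecutor(max_workers=3) as executor:
    # map() yields results in the same order as the inputs.
    titles = list(executor.map(fetch_title, urls))
```

Threads are often sufficient here, since each browser runs in its own process and the Python thread spends most of its time waiting on it. If the per-page work in Python is CPU-heavy, ProcessPoolExecutor offers the same interface.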
