How can I run multiple webdriver processes concurrently in Selenium using Python?
Question
I have a list of thousands of URLs. I want to use Python/Selenium to:
- load each URL,
- select an element,
- close the page.
To make it run faster, I want to run lots of these processes in parallel, but I can only work out how to do it one at a time.
from selenium import webdriver

driver = webdriver.Chrome()

url_list = [
    'https://www.instagram.com/p/Bj7NmpqBuSw/?tagged=style',
    'https://www.instagram.com/p/Bj7Nic3Au85/?tagged=style'
]

for url in url_list:
    driver.get(url)
    driver.find_elements_by_class_name("class-name-for-profile-link")

driver.close()
I tried using lots of browser tabs
driver.switch_to.window(driver.window_handles[1])
but the handles are a bit tricky to manage.
How can I run this process in parallel?
Answer
tl;dr I created this gist to give an easy example of how to run simple Selenium tasks in parallel. You can adapt it to your own purposes.
The issue with parallelising Selenium scripts is that the Selenium workers are themselves processes. The script in the gist uses two FIFO queues: one stores the IDs of idle Selenium workers, and the other stores the data to pass to the workers. A background master thread listens across both queues and assigns incoming data to idle workers, taking a worker's ID off the idle queue while that worker does its work and putting it back when the work is done.
All you would need to do to adapt the code to your purposes is to change the code in the function selenium_task. Hope this helps!
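As a rough illustration of the two-queue pattern described above, here is a minimal, self-contained sketch (the names run_parallel and NUM_WORKERS are my own, and selenium_task is stubbed out so the skeleton runs without a browser; in the real gist it would drive an actual webdriver instance):

```python
import queue
import threading

NUM_WORKERS = 3  # number of Selenium workers you would spawn

def selenium_task(worker_id, url):
    # Stub standing in for the real browser work, e.g.:
    #   driver = drivers[worker_id]
    #   driver.get(url)
    #   driver.find_elements_by_class_name("class-name-for-profile-link")
    return (worker_id, url)

def run_parallel(urls, task=selenium_task, num_workers=NUM_WORKERS):
    idle = queue.Queue()   # FIFO of idle worker IDs
    data = queue.Queue()   # FIFO of data to hand to workers
    results = []
    lock = threading.Lock()

    for wid in range(num_workers):
        idle.put(wid)
    for url in urls:
        data.put(url)

    def master():
        threads = []
        while not data.empty():
            url = data.get()
            wid = idle.get()  # blocks until some worker is free
            def job(wid=wid, url=url):
                out = task(wid, url)
                with lock:
                    results.append(out)
                idle.put(wid)  # return the worker to the idle queue
            t = threading.Thread(target=job)
            t.start()
            threads.append(t)
        for t in threads:
            t.join()

    m = threading.Thread(target=master)
    m.start()
    m.join()
    return results
```

Because the idle queue only ever holds num_workers IDs, at most that many tasks run at once; everything else waits on idle.get().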