Docker: using container with headless Selenium Chromedriver
Question
I'm trying to link peroumal1's "docker-chrome-selenium" container to another container with scraping code that uses Selenium.
His container exposes port 4444 (the Selenium default), but I'm having trouble accessing it from my scraper container. Here's my docker-compose file:
chromedriver:
  image: eperoumalnaik/docker-chrome-selenium:latest
scraper:
  build: .
  command: python manage.py scrapy crawl general_course_content
  volumes:
    - .:/code
  ports:
    - "8000:8000"
  links:
    - chromedriver
And here's my scraper Dockerfile:
FROM python:2.7
RUN mkdir /code
WORKDIR /code
ADD requirements.txt /code/
RUN pip install --upgrade pip
RUN pip install -r requirements.txt
ADD . /code/
When I try to use Selenium from my code (see below), however, I get the following error message: selenium.common.exceptions.WebDriverException: Message: 'chromedriver' executable needs to be available in the path. Please look at http://docs.seleniumhq.org/download/#thirdPartyDrivers and read up at http://code.google.com/p/selenium/wiki/ChromeDriver. On Mac OS X, when I wasn't using Docker, I fixed this by downloading the chromedriver binary and adding it to the path, but I don't know what to do here.
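from selenium import webdriver

# Starts Chrome through a local chromedriver binary, which must be on the PATH
driver = webdriver.Chrome()
driver.maximize_window()
driver.get('http://google.com')
driver.close()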
I'm also trying to do this with Selenium's official images and, unfortunately, it's not working either (the same error message asking for the chromedriver binary appears).
Is there something that needs to be done in the Python code?
Thanks!
Update: As @peroumal1 said, the problem was that I wasn't connecting to a remote driver using Selenium. After I did, however, I had connectivity problems (urllib2.URLError: <urlopen error [Errno 111] Connection refused>) until I modified the IP address that the Selenium driver connects to (when using boot2docker, you have to connect to the virtual machine's IP instead of your computer's localhost, which you can find by typing boot2docker ip) and changed the docker-compose file. This is what I ended up with:
chromedriver:
  image: selenium/standalone-chrome
  ports:
    - "4444:4444"
scraper:
  build: .
  command: python manage.py scrapy crawl general_course_content
  volumes:
    - .:/code
  ports:
    - "8000:8000"
  links:
    - chromedriver
And the Python code (boot2docker's IP address on my computer is 192.168.59.103):
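from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

# Connect to the Selenium Server in the chromedriver container
driver = webdriver.Remote(
    command_executor='http://192.168.59.103:4444/wd/hub',
    desired_capabilities=DesiredCapabilities.CHROME)
driver.maximize_window()
driver.get('http://google.com')
driver.close()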
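A possible refinement, sketched under assumptions rather than taken from the original post: because the scraper service declares a link to chromedriver, that name should also resolve as a hostname inside the scraper container, so the hub URL need not hard-code the boot2docker IP. A "Connection refused" error can also occur when the scraper starts before the Selenium Server is listening, so the sketch polls the port first. The names hub_url, wait_for_port and the SELENIUM_HOST environment variable are hypothetical helpers introduced for illustration.

```python
import os
import socket
import time


def hub_url(host=None, port=4444):
    """Build the Selenium hub URL.

    SELENIUM_HOST is a hypothetical environment variable; the default
    "chromedriver" assumes the service name used in the compose file.
    """
    host = host or os.environ.get("SELENIUM_HOST", "chromedriver")
    return "http://{}:{}/wd/hub".format(host, port)


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.time() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused, unreachable, or timed out: retry until the deadline.
            if time.time() >= deadline:
                return False
            time.sleep(interval)
```

Calling wait_for_port("chromedriver", 4444) before creating the remote driver, and passing hub_url() as command_executor, would keep the code independent of the host machine's IP address.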
Recommended answer
I think the issue here might not be Docker, but the code. The Selenium images provide an interface to a Selenium Server through Remote WebDriver, whereas the code provided tries to instantiate a Chrome browser directly using chromedriver. The Selenium Python bindings allow that, but only if chromedriver is accessible from the environment where the code runs.
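To illustrate the distinction, here is a minimal sketch of the remote approach. It is not the answerer's code: fetch_title is a hypothetical helper, the default hub URL assumes the Selenium container is reachable under the hostname chromedriver on port 4444, and it assumes a Selenium version whose webdriver.Remote accepts an options argument (older releases take desired_capabilities instead).

```python
def fetch_title(url, hub_url="http://chromedriver:4444/wd/hub"):
    """Open url in the remote Chrome session and return the page title."""
    # Imported lazily so this module loads even where selenium is not installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    # Remote sends commands to the Selenium Server over HTTP, so no
    # chromedriver binary is needed in the scraper container itself.
    driver = webdriver.Remote(command_executor=hub_url, options=options)
    try:
        driver.get(url)
        return driver.title
    finally:
        # quit() ends the browser session on the server even if get() fails.
        driver.quit()
```

With webdriver.Chrome(), by contrast, the bindings look for a chromedriver executable on the local PATH, which is what produces the error quoted in the question.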