How to set up a Python application with Selenium in a Docker container


Question

I am currently working on a project to build a web scraper in Python and then dockerize it so that the application can run on any machine. I have already built the Python app, which uses Selenium to load the webpage I am scraping. I am unsure how to package the project in Docker along with a web driver (like geckodriver) so that it can be run. Do I need to create a container with the application and link it to another Selenium container? Thanks for any help!

My code takes in a list of zip codes from a text file I have compiled and uses each code to scrape data for a particular location on a map. Once it has scraped the data, it appends the data to a CSV file. I need to be able to run the application and then get the CSV file onto the host machine.

Edit: I have never used Docker before, but I have done some research on how it works. Please ELI5.

Recommended answer

First of all, you need a Docker image with all packages installed. Let's create a Dockerfile for this.

FROM ubuntu:bionic

RUN apt-get update && apt-get install -y \
    python3 python3-pip \
    fonts-liberation libappindicator3-1 libasound2 libatk-bridge2.0-0 \
    libnspr4 libnss3 lsb-release xdg-utils libxss1 libdbus-glib-1-2 \
    curl unzip wget \
    xvfb


# install geckodriver and firefox

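# Follow the /releases/latest redirect and read the version tag from the final URL.
RUN GECKODRIVER_VERSION=$(curl -sL -o /dev/null -w '%{url_effective}' https://github.com/mozilla/geckodriver/releases/latest | grep -Po 'v[0-9]+\.[0-9]+\.[0-9]+') && \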
    wget https://github.com/mozilla/geckodriver/releases/download/$GECKODRIVER_VERSION/geckodriver-$GECKODRIVER_VERSION-linux64.tar.gz && \
    tar -zxf geckodriver-$GECKODRIVER_VERSION-linux64.tar.gz -C /usr/local/bin && \
    chmod +x /usr/local/bin/geckodriver && \
    rm geckodriver-$GECKODRIVER_VERSION-linux64.tar.gz

RUN FIREFOX_SETUP=firefox-setup.tar.bz2 && \
    apt-get purge -y firefox && \
    wget -O $FIREFOX_SETUP "https://download.mozilla.org/?product=firefox-latest&os=linux64" && \
    tar xjf $FIREFOX_SETUP -C /opt/ && \
    ln -s /opt/firefox/firefox /usr/bin/firefox && \
    rm $FIREFOX_SETUP


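# install chromedriver and google-chrome
# Note: the chromedriver.storage.googleapis.com bucket stops at ChromeDriver 114, so the
# driver fetched here may not match a newer google-chrome-stable; pin matching versions if needed.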

RUN CHROMEDRIVER_VERSION=`curl -sS chromedriver.storage.googleapis.com/LATEST_RELEASE` && \
    wget https://chromedriver.storage.googleapis.com/$CHROMEDRIVER_VERSION/chromedriver_linux64.zip && \
    unzip chromedriver_linux64.zip -d /usr/bin && \
    chmod +x /usr/bin/chromedriver && \
    rm chromedriver_linux64.zip

RUN CHROME_SETUP=google-chrome.deb && \
    wget -O $CHROME_SETUP "https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb" && \
    dpkg -i $CHROME_SETUP && \
    apt-get install -y -f && \
    rm $CHROME_SETUP


# install phantomjs

RUN wget https://bitbucket.org/ariya/phantomjs/downloads/phantomjs-2.1.1-linux-x86_64.tar.bz2 && \
    tar -jxf phantomjs-2.1.1-linux-x86_64.tar.bz2 && \
    cp phantomjs-2.1.1-linux-x86_64/bin/phantomjs /usr/local/bin/phantomjs && \
    rm phantomjs-2.1.1-linux-x86_64.tar.bz2


RUN pip3 install selenium
RUN pip3 install pyvirtualdisplay
RUN pip3 install Selenium-Screenshot

ENV LANG=C.UTF-8
ENV LC_ALL=C.UTF-8
ENV PYTHONUNBUFFERED=1

ENV APP_HOME=/usr/src/app
WORKDIR $APP_HOME

COPY . $APP_HOME/

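# Only the last CMD in a Dockerfile takes effect, so the scraper is started directly.
# Use `CMD tail -f /dev/null` instead if you want an idle container for debugging.
CMD python3 example.py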

The final CMD runs your program when the container starts; in my case it is example.py.
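
Before wiring up the scraper, it can help to confirm that the image really contains the binaries the Dockerfile installs. The following is a minimal sketch and not part of the original answer: `missing_binaries` and `report` are hypothetical helpers, and the names in `EXPECTED` are assumptions based on the install steps above (`google-chrome` in particular is assumed to be the launcher name that the .deb package puts on PATH).

```python
import shutil

# Executable names assumed from the Dockerfile above; adjust them to your image.
EXPECTED = ["geckodriver", "firefox", "chromedriver", "google-chrome", "phantomjs"]


def missing_binaries(names):
    """Return the names that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


def report(names):
    """Build a one-line, human-readable summary of the PATH check."""
    missing = missing_binaries(names)
    if missing:
        return "Missing from PATH: " + ", ".join(missing)
    return "All expected binaries found."
```

Calling `print(report(EXPECTED))` at the top of example.py would show in the container logs which driver or browser, if any, failed to install.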

Now place example.py in the same directory as the Dockerfile. An example for Firefox, Chrome and PhantomJS is given below.

import os
import logging

from pyvirtualdisplay import Display
from selenium import webdriver

logging.getLogger().setLevel(logging.INFO)

BASE_URL = 'http://www.example.com/'


def chrome_example():
    display = Display(visible=0, size=(800, 600))
    display.start()
    logging.info('Initialized virtual display..')

    chrome_options = webdriver.ChromeOptions()
    chrome_options.add_argument('--no-sandbox')

    chrome_options.add_experimental_option('prefs', {
        'download.default_directory': os.getcwd(),
        'download.prompt_for_download': False,
    })
    logging.info('Prepared chrome options..')

    # `options=` replaces the deprecated `chrome_options=` keyword.
    browser = webdriver.Chrome(options=chrome_options)
    logging.info('Initialized chrome browser..')

    browser.get(BASE_URL)
    logging.info('Accessed %s ..', BASE_URL)

    logging.info('Page title: %s', browser.title)

    browser.quit()
    display.stop()


def firefox_example():
    display = Display(visible=0, size=(800, 600))
    display.start()
    logging.info('Initialized virtual display..')

    # Set download preferences on FirefoxOptions; newer Selenium releases
    # deprecate the `firefox_profile=` keyword.
    firefox_options = webdriver.FirefoxOptions()
    firefox_options.set_preference('browser.download.folderList', 2)
    firefox_options.set_preference('browser.download.manager.showWhenStarting', False)
    firefox_options.set_preference('browser.download.dir', os.getcwd())
    firefox_options.set_preference('browser.helperApps.neverAsk.saveToDisk', 'text/csv')

    logging.info('Prepared firefox options..')

    browser = webdriver.Firefox(options=firefox_options)
    logging.info('Initialized firefox browser..')

    browser.get(BASE_URL)
    logging.info('Accessed %s ..', BASE_URL)

    logging.info('Page title: %s', browser.title)

    browser.quit()
    display.stop()


def phantomjs_example():
    display = Display(visible=0, size=(800, 600))
    display.start()
    logging.info('Initialized virtual display..')

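    # PhantomJS support is deprecated in Selenium 3 and removed in Selenium 4,
    # so this call only works with a Selenium 3 install.
    browser = webdriver.PhantomJS()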
    logging.info('Initialized phantomjs browser..')

    browser.get(BASE_URL)
    logging.info('Accessed %s ..', BASE_URL)

    logging.info('Page title: %s', browser.title)

    browser.quit()
    display.stop()




if __name__ == '__main__':
    chrome_example()
    firefox_example()
    phantomjs_example()
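
The example functions above call `browser.quit()` and `display.stop()` only when everything succeeds, so an exception during scraping can leave a browser or Xvfb process running in the container. One possible way to guarantee cleanup is a small context manager. This is a sketch, not part of the original answer: `managed_browser` is a hypothetical helper, and `make_display` / `make_browser` are assumed to be zero-argument factories that you supply.

```python
from contextlib import contextmanager


@contextmanager
def managed_browser(make_display, make_browser):
    """Start a display and a browser, and always shut both down.

    make_display and make_browser are hypothetical zero-argument factories,
    for example ``lambda: Display(visible=0, size=(800, 600))`` and
    ``lambda: webdriver.Firefox()``.
    """
    display = make_display()
    display.start()
    try:
        browser = make_browser()
        try:
            # Hand the browser to the body of the `with` block.
            yield browser
        finally:
            # Runs even if the `with` body raised.
            browser.quit()
    finally:
        # Runs even if creating the browser failed.
        display.stop()
```

Inside example.py this could be used as `with managed_browser(lambda: Display(visible=0, size=(800, 600)), webdriver.Firefox) as browser: browser.get(BASE_URL)`.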

Finally, create a docker-compose.yml to simplify things.

services:
    selenium:
        build: .
        ports:
            - "4000:4000"
        volumes:
            - ./data/:/data/
        privileged: true
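
The `./data/:/data/` volume is what gets the CSV file onto the host: anything the scraper writes under `/data` inside the container appears in `./data` next to the compose file. The following is a minimal sketch of that output step, not code from the original answer. `load_zip_codes` and `append_rows` are hypothetical helpers, and the input format (one zip code per line) is an assumption about the question's text file.

```python
import csv
from pathlib import Path

# Assumed container-side mount point, taken from the "./data/:/data/" volume above.
OUTPUT_DIR = Path("/data")


def load_zip_codes(path):
    """Read one zip code per line, skipping blank lines."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


def append_rows(csv_path, rows, header):
    """Append rows to csv_path, writing the header only if the file is new or empty."""
    csv_path = Path(csv_path)
    csv_path.parent.mkdir(parents=True, exist_ok=True)
    write_header = not csv_path.exists() or csv_path.stat().st_size == 0
    with open(csv_path, "a", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        if write_header:
            writer.writerow(header)
        writer.writerows(rows)
```

For example, `append_rows(OUTPUT_DIR / "results.csv", [[code, value]], header=["zip_code", "value"])` after each scraped zip code would grow `./data/results.csv` on the host; the file and column names here are placeholders.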

Build and run with the following command.

docker-compose build && docker-compose up -d

You can also run it with plain docker commands, without docker-compose:

docker build -t selenium_docker .
docker run --privileged -p 4000:4000 -d -it selenium_docker 

Source:

https://github.com/dimmg/dockselpy
