PhantomJS returning empty web page (Python, Selenium)

Views: 178 Published: 2020/5/26 19:49:05 Tags: python selenium selenium-webdriver phantomjs

Problem description

I'm trying to screen-scrape a web site from a Python script (using Selenium) without having to launch an actual browser instance. I can do this with Chrome or Firefox (I've tried both and they work), but I want to use PhantomJS so that it's headless.

The code is as follows:

import sys
import traceback
import time

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

dcap = dict(DesiredCapabilities.PHANTOMJS)
dcap["phantomjs.page.settings.userAgent"] = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/53 "
    "(KHTML, like Gecko) Chrome/15.0.87"
)

try:
    # Choose our browser
    browser = webdriver.PhantomJS(desired_capabilities=dcap)
    #browser = webdriver.PhantomJS()
    #browser = webdriver.Firefox()
    #browser = webdriver.Chrome(executable_path="/usr/local/bin/chromedriver")

    # Go to the login page
    browser.get("https://www.whatever.com")

    # For debug, see what we got back
    html_source = browser.page_source
    with open('out.html', 'w') as f:
        f.write(html_source)

    # PROCESS THE PAGE (code removed)

except Exception:
    browser.save_screenshot('screenshot.png')
    traceback.print_exc(file=sys.stdout)

finally:
    # quit() also shuts down the PhantomJS process; close() only closes the window
    browser.quit()

The output is just:

<html><head></head><body></body></html>

But when I use the Chrome or Firefox options, it works fine. I thought maybe the web site was returning junk based on the user agent, so I tried faking the user agent. It made no difference.
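For the debug step above, a small check can flag the empty skeleton automatically instead of relying on opening out.html by hand. This is only a sketch, assuming Python 3 and the standard-library `html.parser` module; the names `looks_empty` and `_BodyContentProbe` are hypothetical and not part of Selenium or PhantomJS.

```python
from html.parser import HTMLParser


class _BodyContentProbe(HTMLParser):
    """Hypothetical helper: records whether <body> holds any element or visible text."""

    def __init__(self):
        super().__init__()
        self.in_body = False
        self.has_content = False

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True
        elif self.in_body:
            # Any element nested inside <body> counts as content
            self.has_content = True

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False

    def handle_data(self, data):
        # Whitespace-only text between tags does not count as content
        if self.in_body and data.strip():
            self.has_content = True


def looks_empty(html_source):
    """Return True when the page has no <body> content at all."""
    probe = _BodyContentProbe()
    probe.feed(html_source)
    probe.close()
    return not probe.has_content
```

Called as `looks_empty(html_source)` right after reading `browser.page_source`, it would return True for the output shown above, which makes it easy to log the failure or retry with different PhantomJS settings.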

What am I missing?

UPDATED: I will try to keep the snippet below updated until it works. What's below is what I'm currently trying.

import sys
import traceback
import time
import re

from selenium import webdriver
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
from selenium.webdriver.support import expected_conditions as EC

dcap = dict(DesiredCapabilities.PHANTOMJS)
dcap["phantomjs.page.settings.userAgent"] = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/53 (KHTML, like Gecko) Chrome/15.0.87")

try:
    # Set up our browser
    browser = webdriver.PhantomJS(desired_capabilities=dcap, service_args=['--ignore-ssl-errors=true'])
    #browser = webdriver.Chrome(executable_path="/usr/local/bin/chromedriver")

    # Go to the login page
    print("getting web page...")
    browser.get("https://www.website.com")

    # Need to wait for the page to load
    timeout = 10
    print("waiting %s seconds..." % timeout)
    wait = WebDriverWait(browser, timeout)
    element = wait.until(EC.element_to_be_clickable((By.ID,'the_id')))
    print("done waiting. Response:")

    # Rest of code snipped. Fails as "wait" above.
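To make clear what the failing wait implies, here is a rough sketch of the polling idea behind an explicit wait: call a condition repeatedly until it returns something truthy or the timeout expires. This is not Selenium's implementation. The function `wait_until` and its parameters are hypothetical, and it raises the built-in `TimeoutError` (Python 3) where Selenium raises its own `TimeoutException`.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Poll condition() until it returns a truthy value or timeout seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            # The condition's truthy value is handed back to the caller
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %s seconds" % timeout)
        time.sleep(poll_interval)
```

If the page body really is empty, as in the first output above, no element lookup can ever succeed, so a wait of this kind will always run to its timeout however long it is set. A longer timeout only helps when the content eventually arrives.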
