Scraping JavaScript with Python and Selenium WebDriver

Views: 69 | Published: 2021/6/26 19:59:09 | Tags: javascript python python-2.7 selenium web-scraping

Problem description

I'm trying to scrape the ads from Ask, which are generated in an iframe by a JS hosted by Google.

When I manually navigate my way through, and view source, there they are (I'm specifically looking for a div with the id "adBlock", which is in an iframe).

But when I try using Firefox, Chromedriver or FirefoxPortable, the source returned to me is missing all of the elements I'm looking for.

I tried scraping with urllib2 and had the same results, even when adding in the necessary headers. I thought for sure that a physical browser instance like the one WebDriver creates would have fixed that problem.

Here's the code I'm working off of, which had to be cobbled together from a few different sources:

from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import pprint

# Create a new instance of the Chrome driver
# (raw string so the backslashes in the Windows path are never read as escapes)
driver = webdriver.Chrome(r'C:\Python27\Chromedriver\chromedriver.exe')
driver.get("http://www.ask.com")

print driver.title
inputElement = driver.find_element_by_name("q")

# type in the search
inputElement.send_keys("baseball hats")
# submit the search form
inputElement.submit()

try:
    WebDriverWait(driver, 10).until(EC.title_contains("baseball"))
    print driver.title
    output = driver.page_source
    print(output)
finally:
    driver.quit()

I know I cycle through a few different attempts at viewing the source; that's not what I'm concerned about.

Any thoughts as to why I'm getting one result from this script (ads omitted) and a totally different result (ads present) from the browser it opened in? I've tried Scrapy, Selenium, Urllib2, etc. No joy.
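
One thing that may be worth checking, since the target div sits inside an iframe: as far as I understand, `driver.page_source` only returns the document of the frame the driver is currently focused on, so the top-level source would show the `<iframe>` tag but not the markup inside it. Below is a minimal sketch of how the source of each iframe could be collected as well. The helper name `page_sources_by_frame` is hypothetical (it is not part of Selenium), it assumes `driver` is a Selenium WebDriver instance, and it does not descend into nested iframes. Whether this actually explains the missing ads on Ask is an assumption, not something confirmed here.

```python
def page_sources_by_frame(driver):
    """Return the top-level page source followed by the source of each iframe.

    `driver` is assumed to be a Selenium WebDriver instance. This helper is
    illustrative only and is not part of the Selenium API.
    """
    # Source of the top-level document (iframe contents are not included).
    sources = [driver.page_source]

    # "tag name" is the string value behind Selenium's By.TAG_NAME locator.
    frames = driver.find_elements("tag name", "iframe")

    for frame in frames:
        # Move the driver's focus into this iframe's document.
        driver.switch_to.frame(frame)
        try:
            sources.append(driver.page_source)
        finally:
            # Always return to the top-level document before the next frame.
            driver.switch_to.default_content()

    return sources
```

With the script above, this could be called in place of `output = driver.page_source`, for example `matches = [s for s in page_sources_by_frame(driver) if 'id="adBlock"' in s]`. If the iframe is injected some time after the title changes, the call may also need to happen after an additional wait.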
