Scrapy 如何处理 Javascript [英] How can Scrapy deal with Javascript

查看:58
本文介绍了Scrapy 如何处理 Javascript的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

蜘蛛供参考:

import scrapy
from scrapy.spiders import Spider
from scrapy.selector import Selector
from script.items import ScriptItem



    class RunSpider(scrapy.Spider):
        name = "run"
        allowed_domains = ["stopitrightnow.com"]
        start_urls = (
            'http://www.stopitrightnow.com/',
        )



        def parse(self, response):


            for widget in response.xpath('//div[@class="shopthepost-widget"]'):
                #print widget.extract()
                item = ScriptItem()
                item['url'] = widget.xpath('.//a/@href').extract()
                url = item['url']
                #print url
                yield item

当我运行它时,终端中的输出如下:

When I run this the output in terminal is as follows:

2015-08-21 14:23:51 [scrapy] DEBUG: Scraped from <200 http://www.stopitrightnow.com/>
{'url': []}
<div class="shopthepost-widget" data-widget-id="708473">
<script type="text/javascript">!function(d,s,id){var e, p = /^http:/.test(d.location) ? 'http' : 'https';if(!d.getElementById(id)) {e = d.createElement(s);e.id = id;e.src = p + '://' + 'widgets.rewardstyle.com' + '/js/shopthepost.js';d.body.appendChild(e);}if(typeof window.__stp === 'object') if(d.readyState === 'complete') {window.__stp.init();}}(document, 'script', 'shopthepost-script');</script><br>

这是html:

<div class="shopthepost-widget" data-widget-id="708473" data-widget-uid="1"><div id="stp-55d44feabd0eb" class="stp-outer stp-no-controls">
    <a class="stp-control stp-left stp-hidden">&lt;</a>
    <div class="stp-inner" style="width: auto">
        <div class="stp-slide" style="left: -0%">
                        <a href="http://rstyle.me/iA-n/zzhv34c_" target="_blank" rel="nofollow" class="stp-product " data-index="0" style="margin: 0 0px 0 0px">
                <span class="stp-help"></span>
                <img src="//images.rewardstyle.com/img?v=2.13&amp;p=n_24878713">
                            </a>
                        <a href="http://rstyle.me/iA-n/zzhvw4c_" target="_blank" rel="nofollow" class="stp-product " data-index="1" style="margin: 0 0px 0 0px">
                <span class="stp-help"></span>
                <img src="//images.rewardstyle.com/img?v=2.13&amp;p=n_24878708">

对我来说,它在尝试激活 Javascript 时似乎遇到了障碍.我知道javascript不能在scrapy中运行,但必须有一种方法可以访问这些链接.我看过硒,但无法处理它.

To me it seems to hit a block when trying to activate the Javascript. I am aware that javascript can not run in scrapy but there must be a way of getting to those links. I have looked at selenium but can not get a handle on it.

欢迎任何帮助.

推荐答案

我已经用 解决了ScrapyJS.

按照官方文档和这个答案中的设置说明进行操作.

Follow the setup instructions in the official documentation and this answer.

这是我用过的测试蜘蛛:

Here is the test spider I've used:

# -*- coding: utf-8 -*-
import scrapy


class TestSpider(scrapy.Spider):
    name = "run"
    allowed_domains = ["stopitrightnow.com"]
    start_urls = (
        'http://www.stopitrightnow.com/',
    )

    def start_requests(self):
        for url in self.start_urls:
            yield scrapy.Request(url, meta={
                'splash': {
                    'endpoint': 'render.html',
                    'args': {'wait': 0.5}
                }
            })

    def parse(self, response):
        for widget in response.xpath('//div[@class="shopthepost-widget"]'):
            print widget.xpath('.//a/@href').extract()

这是我在控制台上得到的:

And here is what I've got on the console:

[u'http://rstyle.me/iA-n/7bk8r4c_', u'http://rstyle.me/iA-n/7bk754c_', u'http://rstyle.me/iA-n/6th5d4c_', u'http://rstyle.me/iA-n/7bm3s4c_', u'http://rstyle.me/iA-n/2xeat4c_', u'http://rstyle.me/iA-n/7bi7f4c_', u'http://rstyle.me/iA-n/66abw4c_', u'http://rstyle.me/iA-n/7bm4j4c_']
[u'http://rstyle.me/iA-n/zzhv34c_', u'http://rstyle.me/iA-n/zzhvw4c_', u'http://rstyle.me/iA-n/zwuvk4c_', u'http://rstyle.me/iA-n/zzhvr4c_', u'http://rstyle.me/iA-n/zzh9g4c_', u'http://rstyle.me/iA-n/zzhz54c_', u'http://rstyle.me/iA-n/zwuuy4c_', u'http://rstyle.me/iA-n/zzhx94c_']

这篇关于Scrapy 如何处理 Javascript的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆