How to deal with escaped_fragment using Scrapy

Views: 58 | Published: 2021/4/2 19:38:36 | Tags: python, ajax, scrapy


Problem description

I recently used Scrapy to scrape ZoomInfo and tested it with the URL below:

http://subscriber.zoominfo.com/zoominfo/#!search/profile/person?personId=521850874&targetid=profile

Somehow, in the terminal, the URL was changed to this:

[scrapy] DEBUG: Crawled (200) <GET http://subscriber.zoominfo.com/zoominfo/?_escaped_fragment_=search%2Fprofile%2Fperson%3FpersonId%3D521850874%26targetid%3Dprofile>
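
For reference, the rewritten URL in the log matches the `#!` to `_escaped_fragment_` mapping described by Google's (since deprecated) AJAX crawling scheme. Below is a minimal sketch of that mapping using only the standard library. The helper name `to_escaped_fragment` is hypothetical, and this is an illustration of the scheme, not Scrapy's actual implementation.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def to_escaped_fragment(url):
    """Rewrite a '#!' URL into its '_escaped_fragment_' form.

    Hypothetical helper for illustration only.
    """
    scheme, netloc, path, query, fragment = urlsplit(url)
    # Only "hashbang" fragments (those starting with '!') are rewritten.
    if not fragment.startswith("!"):
        return url
    # Percent-encode everything after the '!', including '/', '?', '=' and '&'.
    escaped = quote(fragment[1:], safe="")
    param = "_escaped_fragment_=" + escaped
    # Append to an existing query string if there is one.
    new_query = query + "&" + param if query else param
    # The fragment is dropped, since it has been moved into the query string.
    return urlunsplit((scheme, netloc, path, new_query, ""))


if __name__ == "__main__":
    print(to_escaped_fragment(
        "http://subscriber.zoominfo.com/zoominfo/"
        "#!search/profile/person?personId=521850874&targetid=profile"
    ))
```

Under this mapping the fragment never reaches the server as a fragment; it is sent as an ordinary query parameter, which is why the crawled URL looks different from the one in start_urls.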

I added AJAXCRAWL_ENABLED = True to settings.py, but the URL still contains _escaped_fragment_. I suspect I am not reaching the page I actually want.
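
For context, the Scrapy documentation describes AJAXCRAWL_ENABLED as switching on AjaxCrawlMiddleware, which looks for pages that advertise an AJAX-crawlable variant through a `<meta name="fragment" content="!">` tag. It also notes that URLs which already contain `#!` are handled even without the middleware, so the setting would not be expected to remove `_escaped_fragment_` from the request URL. The sketch below only illustrates what detecting that meta tag could look like. The names `MetaFragmentFinder` and `has_meta_fragment` are hypothetical, and this is not the middleware's real code.

```python
from html.parser import HTMLParser


class MetaFragmentFinder(HTMLParser):
    """Hypothetical helper: records whether <meta name="fragment" content="!"> appears."""

    def __init__(self):
        super().__init__()
        self.found = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if attributes.get("name") == "fragment" and attributes.get("content") == "!":
            self.found = True


def has_meta_fragment(html):
    """Return True if the page declares an AJAX-crawlable variant via the meta tag."""
    finder = MetaFragmentFinder()
    finder.feed(html)
    return finder.found


if __name__ == "__main__":
    page = '<html><head><meta name="fragment" content="!"></head><body></body></html>'
    print(has_meta_fragment(page))
```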

The spider.py code is below:

#!/usr/bin/env python
# -*- coding:utf-8 -*-
import scrapy
from scrapy.selector import Selector
from scrapy.http import Request, FormRequest
from tutorial.items import TutorialItem
from scrapy.spiders.init import InitSpider


class LoginSpider(InitSpider):
    name = 'zoominfo'
    login_page = 'https://www.zoominfo.com/login'
    start_urls = [
    'http://subscriber.zoominfo.com/zoominfo/#!search/profile/person?personId=521850874&targetid=profile',
    ]
    headers = {
        "Accept":"text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        "Accept-Encoding":"gzip, deflate, br",
        "Accept-Language":"en-US,en;q=0.5",
        "Connection":"keep-alive",
        "User-Agent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:50.0) Gecko/20100101 Firefox/50.0",
    }   
    def init_request(self):
        return Request(url=self.login_page, callback=self.login)

    def login(self, response):
        print("Preparing Login")
        return FormRequest.from_response(
            response,
            headers=self.headers,
            formdata={
            'task':'save',
            'redirect':'http://subscriber.zoominfo.com/zoominfo/#!search/profile/person?personId=521850874&targetid=profile',
            'username': username, 
            'password': password
        },
            callback=self.after_login,
            dont_filter = True,
        )

    def after_login(self, response):
        if "authentication failed" in response.text:
            self.log("Login unsuccessful")
        else:
            self.log(":Login Successfully")
            self.initialized()
            return Request(url='http://subscriber.zoominfo.com/zoominfo/', callback=self.parse)

    def parse(self, response):
        base_url = 'http://subscriber.zoominfo.com/zoominfo/#!search/profile/person?personId=521850874&targetid=profile'
        sel = Selector(response)
        item = TutorialItem()
        divs = sel.xpath("//div[3]//p").extract()
        item['title'] = sel.xpath("//div[3]")
        print(divs)
        request = Request(base_url, callback=self.parse)
        yield request

Thanks to anyone who can give me a hint.
