FormRequest that renders JS content in scrapy shell


Problem description


I'm trying to scrape content from this page with the following form data:

I need County set to Prince George's and DateOfFilingFrom set to 01-01-2000, so I do the following:

% scrapy shell
In [1]: from scrapy.http import FormRequest

In [2]: request = FormRequest(url='https://registers.maryland.gov/RowNetWeb/Estates/frmEstateSearch2.aspx', formdata={'DateOfFilingFrom': '01-01-2000', 'County:': "Prince George's"})

In [3]: response

In [4]:

But it's not working (response is None). In addition, the next page, which looks like the following, is loaded dynamically, and I need to know how to access each of the links shown below with the following inspection. As far as I know, this might be done using Splash; however, I'm not sure how to combine a SplashRequest with a FormRequest and do it all from within scrapy shell for testing purposes. I need to know what I'm doing wrong and how to render the next page (the one that results from the FormRequest shown below).

Solution

The request you're sending is missing a couple of fields, which is probably why you don't get a response back. The field names you fill in also don't match the ones the server expects in the request. A good way to deal with this is to use Scrapy's FormRequest.from_response (doc), which pre-populates fields based on the information already in the form.

For this website the following worked for me (using scrapy shell):

>>> url = "https://registers.maryland.gov/RowNetWeb/Estates/frmEstateSearch2.aspx"
>>> fetch(url)
>>> from scrapy import FormRequest
>>> req = FormRequest.from_response(
...             response,
...             formxpath="//form[@id='form1']", # specify the form on the current page
...             formdata={
...               'cboCountyId': '16',  # the county you select is converted to a number
...               'DateOfFilingFrom': '01-01-2001',
...               'cboPartyType': 'Decedent',
...               'cmdSearch': 'Search'
...             },
...             clickdata={'type': 'submit'},
...       )
>>> fetch(req)
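
To illustrate why the pre-populated fields matter, here is a minimal standard-library sketch of the general idea: collect the inputs that already exist in the form, then overlay your own values before encoding the POST body. This is not Scrapy's implementation. The sample HTML is invented for the example, and the hidden field names __VIEWSTATE and __EVENTVALIDATION are an assumption about what an ASP.NET page typically contains, not something confirmed for this site. FormFieldCollector and build_form_body are hypothetical helper names.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode

# Made-up form for illustration; the real page's markup will differ.
SAMPLE_HTML = """
<form id="form1" method="post" action="frmEstateSearch2.aspx">
  <input type="hidden" name="__VIEWSTATE" value="abc123">
  <input type="hidden" name="__EVENTVALIDATION" value="xyz789">
  <input type="text" name="DateOfFilingFrom" value="">
  <input type="submit" name="cmdSearch" value="Search">
</form>
"""


class FormFieldCollector(HTMLParser):
    """Collect name/value pairs of <input> elements inside one form."""

    def __init__(self, form_id):
        super().__init__()
        self.form_id = form_id
        self.in_form = False
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and attrs.get("id") == self.form_id:
            self.in_form = True
        elif tag == "input" and self.in_form and attrs.get("name"):
            # Submit buttons are only sent when clicked, so skip them here.
            if attrs.get("type") != "submit":
                self.fields[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form":
            self.in_form = False


def build_form_body(html, form_id, overrides):
    """Merge the form's existing fields with caller-supplied values."""
    collector = FormFieldCollector(form_id)
    collector.feed(html)
    merged = {**collector.fields, **overrides}  # overrides win on conflict
    return merged, urlencode(merged)


fields, body = build_form_body(
    SAMPLE_HTML,
    "form1",
    {
        "cboCountyId": "16",
        "DateOfFilingFrom": "01-01-2001",
        "cboPartyType": "Decedent",
        "cmdSearch": "Search",
    },
)
print(sorted(fields))
print(body)
```

The hidden fields end up in the encoded body even though the caller never mentioned them. A request built by hand with only DateOfFilingFrom and a county label would lack them, which is consistent with the explanation above.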
