Scrapy FormRequest: trying to send a POST request (FormRequest) with currency-change form data

Problem description

I've been trying to scrape the following website with the currency changed to 'SAR' via the settings form in the upper left. I tried sending a Scrapy request like this:

from scrapy import Request

r = Request(url='https://www.mooda.com/en/', dont_filter=True, cookies=[
    {'name': 'currency', 'value': 'SAR', 'domain': '.www.mooda.com', 'path': '/'},
    {'name': 'country', 'value': 'SA', 'domain': '.www.mooda.com', 'path': '/'}])

and I still get the prices in EG£:

In [10]: response.css('.price').xpath('text()').extract()
Out[10]: 
[u'1,957 EG\xa3',
 u'3,736 EG\xa3',
 u'2,802 EG\xa3',
 u'10,380 EG\xa3',
 u'1,823 EG\xa3']

I have also tried sending a POST request with the specified form data, like this:

from scrapy.http.request.form import FormRequest
url = 'https://www.mooda.com/en/'
r = FormRequest(url=url,formdata={'selectCurrency':'https://www.mooda.com/en/directory/currency/switch/currency/SAR/uenc/aHR0cHM6Ly93d3cubW9vZGEuY29tL2VuLw,,/'})
fetch(r)

That did not work either, and neither did FormRequest.from_response(). I'd really appreciate some advice; I'm new to Scrapy form requests, and I'd be thankful if anyone could help.

Recommended answer

It is all about the frontend cookie. I will show you how to do it with requests first; the logic is exactly the same with Scrapy:

import requests
from bs4 import BeautifulSoup

head = {"User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:47.0) Gecko/20100101 Firefox/47.0"}

with requests.Session() as s:
    soup = BeautifulSoup(s.get("https://www.mooda.com/en/").content, "lxml")
    # Requesting the SAR option's URL sets the frontend and currency cookies.
    r2 = s.get(soup.select_one("#selectCurrency option[value*=SAR]")["value"])
    r = s.get("https://www.mooda.com/en/", params={"currency": "sar"}, headers=head, cookies=dict(r2.cookies.items()))
    soup2 = BeautifulSoup(r.content, "lxml")
    print(soup2.select_one(".price").text)

You need to make a request to the URL in the SAR option under the element with the id selectCurrency, then pass the cookies returned by that request to https://www.mooda.com/en?currency=sar. There are no POSTs; it is all GET requests, but the frontend cookie returned by that GET is essential.

If we run the code, you can see it does give us the correct data:

In [9]: with requests.Session() as s:
   ...:         soup = BeautifulSoup(s.get("https://www.mooda.com/en/").content,"lxml")
   ...:         r2 = s.get(soup.select_one("#selectCurrency option[value*=SAR]")["value"])
   ...:         r = s.get("https://www.mooda.com/en/", params={"currency": "sar"}, headers=head, cookies=dict(r2.cookies.items()))
   ...:         soup2 = BeautifulSoup(r.content,"lxml")
   ...:         print(soup2.select_one(".price").text)
   ...:     

825 SR
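
As a side note on why a plain GET is enough: the option value quoted in the question is itself a URL, and its uenc segment looks like a base64-encoded return address with "," standing in for the "=" padding. That reading is an assumption based only on the value shown in the question, and the helper name decode_uenc is made up for illustration. A minimal sketch:

```python
import base64


def decode_uenc(uenc):
    """Decode a uenc path segment, assuming URL-safe base64 with ',' used as padding."""
    padded = uenc.replace(",", "=")
    return base64.urlsafe_b64decode(padded).decode("utf-8")


# uenc segment copied from the selectCurrency value in the question
print(decode_uenc("aHR0cHM6Ly93d3cubW9vZGEuY29tL2VuLw,,"))
```

If that assumption holds, the decoded value is simply the page to return to after the currency switch, which fits the answer's point that nothing needs to be posted.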

Using Scrapy:

from scrapy import Request, Spider


class S(Spider):
    name = "foo"
    allowed_domains = ["www.mooda.com"]
    start_urls = ["https://www.mooda.com/en"]

    def parse(self, resp):
        # URL of the SAR option; requesting it sets the frontend and currency cookies.
        curr = resp.css("#selectCurrency option[value*='SAR']::attr(value)").extract_first()
        return Request(curr, callback=self.parse2)

    def parse2(self, resp):
        print(resp.headers.getlist('Set-Cookie'))
        # Scrapy's cookies middleware (on by default) resends the cookies set above,
        # so no explicit cookies argument is needed here.
        return Request("https://www.mooda.com/en?currency=sar", callback=self.parse3)

    def parse3(self, resp):
        print(resp.css('.price').xpath('text()').extract())

Running it gives you:

['frontend=c95er9h1at2srhtqu5rkfo13g0; expires=Wed, 28-Jun-2017 08:56:08 GMT; path=/; domain=www.mooda.com', 'currency=SAR; expires=Wed, 28-Jun-2017 08:56:08 GMT; path=/; domain=www.mooda.com']


[u'825 SR', u'1,575 SR', u'1,181 SR', u'4,377 SR', u'769 SR']

The GET to curr returns nothing; it just sets the cookies.
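
To hand the cookies to a later request explicitly, for example through the cookies argument of Request, the Set-Cookie values printed above can be parsed into a dict. The following is a minimal sketch using the standard library's http.cookies.SimpleCookie; the helper name cookies_from_headers is made up for illustration, and the sample values are the ones from the output above.

```python
from http.cookies import SimpleCookie


def cookies_from_headers(set_cookie_values):
    """Collect name -> value pairs from raw Set-Cookie header values."""
    cookies = {}
    for raw in set_cookie_values:
        if isinstance(raw, bytes):  # Scrapy returns header values as bytes
            raw = raw.decode("latin-1")
        parsed = SimpleCookie()
        parsed.load(raw)
        for name, morsel in parsed.items():
            cookies[name] = morsel.value
    return cookies


# Sample Set-Cookie values copied from the spider output above
sample = [
    "frontend=c95er9h1at2srhtqu5rkfo13g0; expires=Wed, 28-Jun-2017 08:56:08 GMT; path=/; domain=www.mooda.com",
    "currency=SAR; expires=Wed, 28-Jun-2017 08:56:08 GMT; path=/; domain=www.mooda.com",
]
print(cookies_from_headers(sample))
```

Inside a spider callback, something like cookies_from_headers(resp.headers.getlist('Set-Cookie')) would give a dict that can be passed as cookies= on the next Request.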
