Needing help to simulate an XHR request
Question
I need to scrape a website that has a "load more" button. This is my spider code, written in Python:
import scrapy
import json
import requests
import re
from parsel import Selector
from scrapy.selector import Selector
from scrapy.http import HtmlResponse
headers = {
    'origin': 'https://www.tayara.tn',
    'accept-encoding': 'gzip, deflate, br',
    'accept-language': 'en-US,en;q=0.9',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.110 Safari/537.36',
    'content-type': 'application/json',
    'accept': '*/*',
    'referer': 'https://www.tayara.tn/sc/immobilier/bureaux-et-plateaux',
    'authority': 'www.tayara.tn',
    'dnt': '1',
}
data = '{"query":"query ListingsPage($page: Page, $filter: SearchFilter, $sortBy: SortOrder) {\\n listings: searchAds(page: $page, filter: $filter, sortBy: $sortBy) {\\n items {\\n uuid\\n title\\n price\\n currency\\n thumbnail\\n createdAt\\n category {\\n id\\n name\\n engName\\n __typename\\n }\\n user {\\n uuid\\n displayName\\n avatar(width: 96, height: 96) {\\n url\\n __typename\\n }\\n __typename\\n }\\n __typename\\n }\\n trackingInfo {\\n transactionId\\n listName\\n recommenderId\\n experimentId\\n variantId\\n __typename\\n }\\n totalCount\\n pageInfo {\\n startCursor\\n hasPreviousPage\\n endCursor\\n hasNextPage\\n __typename\\n }\\n __typename\\n }\\n}\\n","variables":{"page":{"count":36,"offset":"cDEwbg==.MjAxOC0xMi0wMlQxMzo1MDoxMlo=.MzY="},"filter":{"queryString":null,"category":"140","regionId":null,"attributeFilters":[]},"sortBy":"CREATED_DESC"},"operationName":"ListingsPage"}'
class Tun(scrapy.Spider):
    name="tayaracommercial"
    start_urls = [
        'https://www.tayara.tn/sc/immobilier/bureaux-et-plateaux'
    ]
    def parse(self, response):
        yield Request('https://www.tayara.tn/graphql', method='post', headers=headers, body=data, self.parse_item)
    def parse_item(self, response):
        source = 'Tayara'
        reference = response.url.split('//')[1].split('/')[3]
        titre = response.xpath('//h1[@data-name="adview_title"]/text()').extract()
        yield{'Source':source, 'Reference':reference, 'Titre':titre}
This is my modest attempt. I know it is wrong. Can you correct me, please?
Recommended answer
You can scrape the data by calling the site's AJAX endpoint directly, as in the following example:
# Importing the dependencies
# This is needed to create an lxml object that uses the CSS selector
from lxml.etree import fromstring
# The requests library
import requests

class WholeFoodsScraper:
    API_url = 'http://www.wholefoodsmarket.com/views/ajax'
    scraped_stores = []

    def get_stores_info(self, page):
        # This is the only data required by the API
        # to send back the stores info
        data = {
            'view_name': 'store_locations_by_state',
            'view_display_id': 'state',
            'page': page
        }
        # Making the POST request
        response = requests.post(self.API_url, data=data)
        # The data that we are looking for is in the second
        # element of the response and has the key 'data',
        # so that is what's returned
        return response.json()[1]['data']
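The same idea can be applied to the GraphQL endpoint from the question. The sketch below is only an illustration, not a tested scraper for that site. It makes several assumptions: the helper names `build_payload` and `parse_listings` are made up here, the query is a trimmed-down version of the one in the question, the response is assumed to follow the usual GraphQL shape (`data` → `listings`), and `pageInfo.endCursor` is assumed to be the `offset` value for the next page. Using `uuid` as the reference is also a guess.

```python
import json

GRAPHQL_URL = "https://www.tayara.tn/graphql"  # endpoint taken from the question

# Trimmed-down version of the question's query: only the fields read below.
LISTINGS_QUERY = """
query ListingsPage($page: Page, $filter: SearchFilter, $sortBy: SortOrder) {
  listings: searchAds(page: $page, filter: $filter, sortBy: $sortBy) {
    items { uuid title price currency createdAt }
    pageInfo { endCursor hasNextPage }
  }
}
"""


def build_payload(offset, category="140", count=36):
    """Build the JSON body for one page of listings (hypothetical helper)."""
    return {
        "operationName": "ListingsPage",
        "query": LISTINGS_QUERY,
        "variables": {
            "page": {"count": count, "offset": offset},
            "filter": {
                "queryString": None,
                "category": category,
                "regionId": None,
                "attributeFilters": [],
            },
            "sortBy": "CREATED_DESC",
        },
    }


def parse_listings(body_text):
    """Return (rows, next_offset) from a JSON response body (hypothetical helper).

    The response is JSON, not HTML, so it is decoded with json.loads
    instead of being queried with XPath.
    """
    listings = json.loads(body_text)["data"]["listings"]
    rows = [
        # Assumption: the ad's uuid is a usable reference.
        {"Source": "Tayara", "Reference": item["uuid"], "Titre": item["title"]}
        for item in listings["items"]
    ]
    page_info = listings["pageInfo"]
    # Assumption: endCursor is what the next request expects as "offset".
    next_offset = page_info["endCursor"] if page_info["hasNextPage"] else None
    return rows, next_offset
```

With requests, one page could then be fetched with `requests.post(GRAPHQL_URL, json=build_payload(offset), headers=headers)` and its `.text` passed to `parse_listings`. In a Scrapy spider the equivalent would be `yield scrapy.Request(GRAPHQL_URL, method="POST", body=json.dumps(build_payload(offset)), headers=headers, callback=self.parse_item)`. Note that the callback has to be passed as `callback=...`, because a positional argument cannot follow keyword arguments, and that `parse_item` would call `parse_listings(response.text)` and yield a new request while `next_offset` is not `None`.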