Web scraping: making a POST request with cookies

Problem description

I would like to retrieve the timetable for this [booking site][1]. I need to retrieve/refresh the cookie before making the POST request to their timetable JSON endpoint; otherwise I get a session ID error.

The response reports `'sessionId': None` together with `'errorCode': '620'` and `'errorDescription': 'Invalid Session Number'`.

This is the request I make:

import requests

url = 'https://alilauro-tickets.certusonline.com/php/proxy.php'
s = requests.session()

# Request the timetable website first so the session can pick up its cookies
s.get('https://alilauro-tickets.certusonline.com/')
s.headers.update({'user-agent':'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.3'})

data = {
    'msg': 'TimeTable',
    'req': '{"getAvailability":"Y","getBasicPrice":"Y","getRouteAnalysis":"Y","directOnly":"Y","legs":1,"pax":1,"origin":"BEV","destination":"FOR","tripRequest":[{"tripfrom":"BEV","tripto":"FOR","tripdate":"2020-03-18","tripleg":0}]}'
}

# Request the JSON with timetable data
r = s.post(url, data=data, timeout=20, cookies=s.cookies)

This is the full response I get:

{'SWS_LoginInfo': {'agencyCode': None, 'userCode': None, 'password': None, 'language': 'EN', 'sessionId': None, 'payByCreditCard': None, 'hasCreditLimit': None, 'creditLimit': None, 'ticketCount': None, 'errorCode': '620', 'errorDescription': 'Invalid Session Number'}, 'SWS_TripInfo': {'getAvailability': None, 'getPrices': None, 'getRouteAnalysis': None, 'getRequiredFields': None}, 'VWS_Trips_Trip': [], 'SWS_Parameters': {'VCompanies_companyDetails': [], 'VPorts_portDetails': [], 'VCountries_countryDetails': [], 'VVessels_vesselDetails': [], 'VPassengerTypes_passengerType': [], 'VPassengerClasses_passengerClass': [], 'VPassengerDiscounts_passengerDiscount': [], 'VVehicleTypes_vehicleType': [], 'VVehicleDiscounts_vehicleDiscount': [], 'VServiceTypes_serviceType': [], 'VServiceDiscounts_serviceDiscount': [], 'VPortCombinations_portCombination': [], 'VDeliveryMethods_deliveryMethod': [], 'VDocumentTypes_documentType': [], 'VLoyaltyCardTypes_loyaltyCardType': []}, 'SWS_PriceTotals': {'totalNetFare': None, 'totalTaxes': None, 'totalVat': None, 'totalPrice': None, 'totalFees': None, 'totalFeesTax': None, 'totalPayable': None}, 'SWS_Reservation': {'salesChannel': None, 'bookingReference': None, 'companyReference': None, 'acceptFees': None, 'reservationStatus': None, 'optionDateTime': None, 'issuePrepaid': None, 'leaderFullName': None, 'leaderEmail': None, 'leaderPhone': None, 'totalNetFare': None, 'totalTaxes': None, 'totalVat': None, 'totalPrice': None, 'totalFees': None, 'totalFeesTax': None, 'totalPayable': None, 'refundAmount': None, 'acceptTerms': None, 'deliveryMethod': None, 'deliveryAmount': None, 'deliveryAddress': None, 'deliveryCountry': None, 'zipCode': None, 'acceptShareData': None, 'settled': None}, 'VWS_CancelledTickets_Ticket': [], 'VWS_Tickets_Ticket': [], 'ScardMember': {'id': None, 'surname': None, 'firstname': None, 'languageId': None, 'language': None, 'gender': None, 'documentTypeId': None, 'documentTypeCode': None, 'documentType': None, 
'documentNumber': None, 'nationalityId': None, 'nationality': None, 'dateRegistered': None, 'active': None, 'mobile': None, 'phone': None, 'fax': None, 'email': None, 'address': None, 'zipCode': None, 'countryId': None, 'country': None, 'birthDate': None, 'birthPlace': None, 'VCards_loyaltyCard': []}, 'SloyaltyCard': {'id': None, 'loyaltyCardTypeId': None, 'loyaltyCardTypeCode': None, 'loyaltyCardType': None, 'cardNumber': None, 'active': None, 'points': None, 'dateFrom': None, 'dateTo': None}, 'VcardTransactions_cardTransaction': []}

Recommended answer

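You simply need to read the cookies of the current session from `s.cookies`. It is an attribute holding the session's cookie jar, not a method you call.

You can then reuse them in subsequent requests. A `requests` session also sends the cookies it holds automatically on later requests made through the same session object.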
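To see that behaviour in isolation, here is a minimal sketch that runs against a throwaway local server rather than the booking site. Everything server-side is hypothetical: `DemoHandler`, the `PHPSESSID=demo123` cookie, the `/proxy` path, and the simplified `errorCode` reply are stand-ins invented for illustration, and the real site's session handling has not been verified here. It only demonstrates that a cookie set by a `GET` lands in `s.cookies` and is attached to a later `POST` without passing `cookies=` explicitly.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for a site that ties POSTs to a session cookie."""

    def do_GET(self):
        # The landing page hands out a session cookie.
        self.send_response(200)
        self.send_header("Set-Cookie", "PHPSESSID=demo123; Path=/")
        self.send_header("Content-Length", "0")
        self.end_headers()

    def do_POST(self):
        # Drain the request body, then check whether the cookie came back.
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)
        has_session = "PHPSESSID=demo123" in self.headers.get("Cookie", "")
        body = json.dumps({"errorCode": None if has_session else "620"}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def new_session():
    s = requests.Session()
    s.trust_env = False  # ignore proxy environment variables for the loopback demo
    return s


def run_demo():
    server = HTTPServer(("127.0.0.1", 0), DemoHandler)  # port 0: pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base = f"http://127.0.0.1:{server.server_port}"
    form = {"msg": "TimeTable"}
    try:
        # A fresh session that never visited the landing page has no cookie to send.
        with new_session() as fresh:
            cold = fresh.post(base + "/proxy", data=form, timeout=5).json()

        with new_session() as s:
            s.get(base + "/", timeout=5)      # the cookie is stored in s.cookies
            stored = s.cookies.get_dict()
            # No cookies= argument: the session attaches its jar automatically.
            warm = s.post(base + "/proxy", data=form, timeout=5).json()
    finally:
        server.shutdown()
        server.server_close()
    return cold, stored, warm


cold, stored, warm = run_demo()
print(cold, stored, warm)
```

If `s.cookies.get_dict()` comes back empty after the initial `GET` against a real site, the session identifier is probably not delivered as a cookie on that page (it may be issued by another request or set by JavaScript), and passing `cookies=s.cookies` will not help.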
