requests - fetch data from an API-based website

Question

I want to get all the reviews from this site.

At first, I used this code:

import requests
from bs4 import BeautifulSoup

r = requests.get(
    "https://www.traveloka.com/hotel/singapore/mandarin-orchard-singapore-10602")

data = r.content
soup = BeautifulSoup(data, "html.parser")
reviews = soup.find_all("div", {"class": "reviewText"})

# Print the text of every review block found on the page
for review in reviews:
    print(review.get_text())

But this way, I only get the reviews from the first page.

Some people said I could use the API for this with the same requests module. I've found the API endpoint, https://api.traveloka.com/v1/hotel/hotelReviewAggregate, but I can't work out its parameters because I don't know how to call an API that takes its input as a request payload.

So I'm hoping for Python code that gets all the reviews, or for the API parameters needed to get the reviews of a specific hotel from all pages or from a specific page.

Recommended answer

Look at the request payload in the Network tab. It contains skip:8 and top:8, and you will see those numbers increase by 8 when you click the right arrow to get the next page of reviews.
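The paging described above can be sketched as a small helper. This is only an illustration: it assumes skip is the offset of the first review and top is the page size, which is the usual meaning of these names but is not confirmed by the answer, so check the values in the Network tab. The function name page_window is hypothetical.

```python
def page_window(page, page_size=8):
    """Return the skip/top values for a zero-based page number.

    Assumption: skip is the offset and top is the page size (8 in the answer).
    """
    return {"skip": page * page_size, "top": page_size}


# Show the windows for the first three pages
for page in range(3):
    print(page, page_window(page))
```

If the values you observe in the browser change differently, adjust the arithmetic to match what the site actually sends.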

You can duplicate that request and scrape the results the same way.

Open your page in Chrome and press F12. Go to the Network tab, then scroll to the bottom of the page, where you can advance to the next batch of reviews. As soon as you click the right arrow, the Network tab will be populated. Find the second hotelReviewAggregate entry and click on it. Under the Headers tab you will find Request Payload. Open the data dict and find skip and top. Advance to the next batch of reviews and see how those numbers change. You can simulate this behavior to get to the other pages.

Then prepare your payload, increment those values for each page, send the requests, and use the response objects to scrape the data with BeautifulSoup.
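A rough sketch of that loop follows. Several parts are assumptions rather than facts about the Traveloka API: the "hotelId" key is a made-up name, the payload is assumed to be a JSON body sent with POST (a Request Payload in DevTools normally means a request body), and the shape of the response is unknown, so the caller supplies an extract function that pulls the review list out of it. Only the URL, the data dict, skip, top, and the step of 8 come from the thread. Copy the real payload from the Network tab before relying on this.

```python
API_URL = "https://api.traveloka.com/v1/hotel/hotelReviewAggregate"  # from the question
PAGE_SIZE = 8  # from the answer: the values change in steps of 8


def build_payload(hotel_id, skip, top=PAGE_SIZE):
    # HYPOTHETICAL shape: only "data", "skip" and "top" are mentioned in the
    # answer; "hotelId" is a guessed key. Replace with the payload from DevTools.
    return {"data": {"hotelId": hotel_id, "skip": skip, "top": top}}


def post_json(payload):
    """Send the payload as a JSON body and return the decoded response."""
    import requests  # imported here so the helpers above work without it

    response = requests.post(API_URL, json=payload, timeout=10)
    response.raise_for_status()
    return response.json()  # assumes the API answers with JSON


def fetch_reviews(hotel_id, extract, send=post_json, max_pages=100):
    """Collect reviews page by page until a page comes back empty."""
    reviews = []
    for page in range(max_pages):
        result = send(build_payload(hotel_id, skip=page * PAGE_SIZE))
        batch = extract(result)  # caller-supplied: returns the list of reviews
        if not batch:
            break
        reviews.extend(batch)
    return reviews
```

Passing send as a parameter keeps the paging logic separate from the network call, so the loop can be tried against a fake sender first. If the endpoint returns HTML instead of JSON, return response.text from post_json and parse it with BeautifulSoup inside extract.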

See the requests documentation.

A quick example from the tutorial:

payload = {'key1': 'value1', 'key2': 'value2'}
r = requests.get('http://httpbin.org/get', params=payload)
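Note that params puts the dictionary into the URL's query string, whereas a Request Payload in the browser is normally a request body, which requests would send with requests.post(url, json=payload). The sketch below uses only the standard library to show the two forms side by side. Whether the review endpoint expects a JSON body is an assumption to verify in the Headers tab.

```python
import json
from urllib.parse import urlencode

payload = {"key1": "value1", "key2": "value2"}

# What requests.get(url, params=payload) asks for: the dict becomes a query string
query_url = "http://httpbin.org/get?" + urlencode(payload)
print(query_url)

# Roughly what requests.post(url, json=payload) sends: the dict becomes the body
body = json.dumps(payload)
print(body)
```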

I don't know why people decided to downvote my answer without an explanation. But oh well, if you find this useful and it answers your question, please accept it.
