requests - fetch data from api-based website


Problem description

I want to get all the reviews from this website.

At first, I used this code:

import requests
from bs4 import BeautifulSoup

r = requests.get(
    "https://www.traveloka.com/hotel/singapore/mandarin-orchard-singapore-10602")

soup = BeautifulSoup(r.content, "html.parser")
reviews = soup.find_all("div", {"class": "reviewText"})

for review in reviews:
    print(review.get_text())

But this way, I can only get the reviews from the first page.

Some said I could use the API for this with the same requests module. I've found the API, https://api.traveloka.com/v1/hotel/hotelReviewAggregate, but I can't work out the parameters because I don't know how to call an API that takes a request payload.

So I'm hoping for code that gets all the reviews using Python, or for the API parameters needed to fetch the reviews of a specific hotel on all pages or a specific page.

Recommended answer

Look at the request payload in the Network tab. There is a part with skip: 8 and top: 8, and you will see those numbers increment by 8 when you click the right arrow to get the next page of reviews.

You can duplicate that request and scrape the results the same way.

Open your page with Chrome and hit F12. Go to the Network tab and scroll to the bottom of the page, where you can advance to the next batch of reviews. As soon as you hit the right arrow, the Network tab will be populated. Find the second hotelReviewAggregate entry and click on it. Under the Headers tab you will find the Request Payload. Open the data dict and find skip and top. Advance to the next batch of reviews and see how those numbers change. You can simulate this behavior to get to the other pages.

Then what you need to do is prepare your payload, increment those values for each page, make the GET requests, and use the response objects to scrape the data with BeautifulSoup.
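The paging logic above can be sketched as a small helper. The skip/top field names come from the payload observed in DevTools; the page size of 8 and the helper's name are assumptions for illustration only:

```python
# Sketch of the paging described above. skip/top were observed in the
# request payload; the page size of 8 is an assumption.

def build_payload(page, page_size=8):
    """Return the paging fragment of the payload for one batch of reviews."""
    return {"skip": page * page_size, "top": page_size}

# Payloads for the first three batches:
for page in range(3):
    print(build_payload(page))
# → {'skip': 0, 'top': 8}
#   {'skip': 8, 'top': 8}
#   {'skip': 16, 'top': 8}
```

In a real run you would merge this fragment into the full payload captured from DevTools before sending each request.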

A quick example from the requests documentation:

payload = {'key1': 'value1', 'key2': 'value2'}
r = requests.get('http://httpbin.org/get', params=payload)
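The example above sends the values as GET query parameters. If the endpoint actually expects a JSON body (which is what the "Request Payload" label in DevTools usually indicates), requests can send one with the json= argument. A hedged sketch — the payload fields here are illustrative, not the real hotelReviewAggregate schema:

```python
import json

import requests

# Illustrative payload only; the real hotelReviewAggregate payload has
# more fields, which you would copy from the DevTools capture.
payload = {"skip": 8, "top": 8}

# Build the request without sending it, to inspect what goes over the wire.
req = requests.Request(
    "POST",
    "https://api.traveloka.com/v1/hotel/hotelReviewAggregate",
    json=payload,
)
prepared = req.prepare()
print(prepared.headers["Content-Type"])  # application/json
print(json.loads(prepared.body))         # {'skip': 8, 'top': 8}
```

Actually sending it is then requests.Session().send(prepared), or simply requests.post(url, json=payload).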


