requests - fetch data from api-based website
Question
I want to get all the reviews from this site.
At first, I used this code:
import requests
from bs4 import BeautifulSoup

r = requests.get(
    "https://www.traveloka.com/hotel/singapore/mandarin-orchard-singapore-10602")
soup = BeautifulSoup(r.content, "html.parser")
reviews = soup.find_all("div", {"class": "reviewText"})
for review in reviews:
    print(review.get_text())
But this way, I can only get the reviews from the first page.
Some said I could use an API for this with the same requests module. I've found the API, which is https://api.traveloka.com/v1/hotel/hotelReviewAggregate, but I can't read the parameters because I don't know how to use an API that works through a request payload.
So I'm hoping for Python code that gets all the reviews, or the API parameters needed to fetch the reviews of a specific hotel, for all pages or a specific page.
Answer
Look at the request payload in the Network tab. There is a part with skip: 8 and top: 8, and you will see those numbers increment by 8 when you click the right arrow to get the next page of reviews.
You can duplicate that request and scrape the results the same way.
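The pagination scheme described above can be sketched as a small helper. This is only an illustration: the page size of 8 and the skip/top field names come from the DevTools observation, and the total review count would have to be taken from the site itself.

```python
def review_pages(total, page_size=8):
    """Yield the skip/top payload fields for each page of reviews.

    Mirrors the values seen in the DevTools request payload: skip
    advances by the page size (8) on every click of the right arrow.
    """
    for skip in range(0, total, page_size):
        yield {"skip": skip, "top": page_size}

pages = list(review_pages(24))
# pages[0] == {'skip': 0, 'top': 8}; pages[1] == {'skip': 8, 'top': 8}
```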
Open your page with Chrome and hit F12. Go to the Network tab and scroll to the bottom of the page, where you can advance to the next batch of reviews. As soon as you hit the right arrow, the Network tab will be populated. Find the second hotelReviewAggregate entry and click on it. Under the Headers tab you will find Request Payload. Open the data dict and find skip and top. Advance to the next batch of reviews and see how those numbers change. You can simulate this behavior to get to the other pages.
Then what you need to do is prepare your payload, increment the skip value, make the requests, and use the response objects to scrape the data with BeautifulSoup.
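A minimal sketch of preparing such a request with only the standard library: "Request Payload" in DevTools means the skip/top fields travel in a JSON body rather than in the query string, so a body is attached here. The outer "data" wrapper follows the DevTools description above; any other fields the API may require (a hotel id, auth headers) are not known from the question and would have to be copied from the browser's request.

```python
import json
from urllib import request

API_URL = "https://api.traveloka.com/v1/hotel/hotelReviewAggregate"

def build_review_request(skip, top=8):
    # The "data" wrapper and the skip/top fields match the payload shape
    # described above; any extra fields the API requires (hotel id, auth)
    # are assumptions and must be taken from the browser's own request.
    body = json.dumps({"data": {"skip": skip, "top": top}}).encode("utf-8")
    return request.Request(API_URL, data=body,
                           headers={"Content-Type": "application/json"})

req = build_review_request(skip=8)
# Attaching a body makes urllib issue a POST, matching a payload-style API.
```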
A quick example from the requests tutorial:

payload = {'key1': 'value1', 'key2': 'value2'}
r = requests.get('http://httpbin.org/get', params=payload)
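For comparison, params= in the tutorial snippet above puts the values into the URL's query string, whereas a request payload travels in the body. urlencode from the standard library shows what that query-string encoding looks like (httpbin.org is just the tutorial's demo host):

```python
from urllib.parse import urlencode

payload = {"key1": "value1", "key2": "value2"}
url = "http://httpbin.org/get?" + urlencode(payload)
# url == "http://httpbin.org/get?key1=value1&key2=value2"
```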
I don't know why people decided to downvote my answer without an explanation. But oh well, if you find this useful and it answers your question, please accept it.