How to perform web scraping to get all the reviews of an app in Google Play?

Problem description

I want to get all the reviews that users leave about apps on Google Play. I have this code, which was suggested in the question "Web scraping in R through Google playstore". The problem is that it only gets the first 40 reviews. Is there a way to get all of the app's reviews?

```

#Loading the rvest package
library(rvest)
library(magrittr) # for the '%>%' pipe symbols
library(RSelenium) # to get the loaded HTML of the page (after JavaScript has run)

#Specifying the url for desired website to be scraped
url <- 'https://play.google.com/store/apps/details?id=com.phonegap.rxpal&hl=en_IN&showAllReviews=true'

# starting local RSelenium (this is the only way to start RSelenium that
# is working for me atm)
selCommand <- wdman::selenium(jvmargs = c("-Dwebdriver.chrome.verboseLogging=true"), retcommand = TRUE)
shell(selCommand, wait = FALSE, minimized = TRUE) # shell() is Windows-only; use system(selCommand, wait = FALSE) elsewhere
remDr <- remoteDriver(port = 4567L, browserName = "firefox")
remDr$open()

# go to website
remDr$navigate(url)

# get page source and save it as an html object with rvest
html_obj <- remDr$getPageSource(header = TRUE)[[1]] %>% read_html()

# 1) name field (assuming that with 'name' you refer to the name of the
# reviewer)
names <- html_obj %>% html_nodes(".kx8XBd .X43Kjb") %>% html_text()

# 2) how many stars they gave
stars <- html_obj %>% html_nodes(".kx8XBd .nt2C1d [role='img']") %>% 
html_attr("aria-label")

# 3) review they wrote
reviews <- html_obj %>% html_nodes(".UD7Dzf") %>% html_text()

# create the df with all the info
review_data <- data.frame(names = names, stars = stars, reviews = reviews,
                          stringsAsFactors = FALSE)

```

Recommended answer

You can get all the reviews from the Google Play web store.

If you scroll through the reviews with the browser's network inspector open, you can see that an XHR request is sent to:

https://play.google.com/_/PlayStoreUi/data/batchexecute

with the form data:

```
f.req: [[["rYsCDe","[[\"com.playrix.homescapes\",7]]",null,"55"]]]
at: AK6RGVZ3iNlrXreguWd7VvQCzkyn:1572317616250
```
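Note that `f.req` is double-encoded: it is a JSON array whose second element is itself a JSON string holding the RPC's arguments. A small Python sketch (standard library only, using the captured value above verbatim) that decodes both layers:

```python
import json

# f.req value as captured from the browser's network tab (copied from above)
F_REQ = r'[[["rYsCDe","[[\"com.playrix.homescapes\",7]]",null,"55"]]]'

outer = json.loads(F_REQ)              # first decode: the RPC envelope
rpc_id, inner_json, _, tag = outer[0][0]
inner = json.loads(inner_json)         # second decode: the RPC's own arguments

print(rpc_id)  # rYsCDe
print(inner)   # [['com.playrix.homescapes', 7]]
```

This is why the simplified request further down has to JSON-encode its arguments before placing them inside the envelope.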

and the following query parameters:

```
rpcids=rYsCDe
f.sid=-3951426241423402754
bl=boq_playuiserver_20191023.08_p0
hl=en
authuser=0
soc-app=121
soc-platform=1
soc-device=1
_reqid=839222
rt=c
```

After experimenting with the parameters, I found that many are optional, and the request can be simplified to:

Form data:

```
f.req: [[["UsvDTd","[null,null,[2, $sort,[$review_size,null,$page_token]],[$package_name,7]]",null,"generic"]]]
```

Parameters:

```
hl=$review_language
```
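A minimal Python sketch of filling in that template and paging through results. The helper names `build_reviews_request` and `fetch_all` are hypothetical, the defaults for `sort` and `review_size` are placeholders (the answer does not say which values are valid), and passing `null` as the page token for the first page is an assumption. `fetch_page` is a hypothetical callable you would supply to do the actual HTTP call:

```python
import json
import urllib.parse

BATCHEXECUTE_URL = "https://play.google.com/_/PlayStoreUi/data/batchexecute"


def build_reviews_request(package_name, sort=2, review_size=100,
                          page_token=None, review_language="en"):
    """Return (url, form_body) for one page of reviews (hypothetical helper).

    The defaults for sort and review_size are placeholders, not documented values.
    """
    # Inner RPC arguments, following the simplified template:
    # [null, null, [2, $sort, [$review_size, null, $page_token]], [$package_name, 7]]
    inner = [None, None, [2, sort, [review_size, None, page_token]], [package_name, 7]]
    # The inner arguments are embedded as a JSON *string* inside the outer envelope.
    compact = {"separators": (",", ":")}
    f_req = [[["UsvDTd", json.dumps(inner, **compact), None, "generic"]]]
    body = urllib.parse.urlencode({"f.req": json.dumps(f_req, **compact)})
    url = BATCHEXECUTE_URL + "?" + urllib.parse.urlencode({"hl": review_language})
    return url, body


def fetch_all(fetch_page):
    """Collect every page by following page tokens until none is returned.

    fetch_page(token) is a hypothetical callable that sends one request and
    returns (reviews, next_token), with next_token set to None on the last page.
    """
    reviews, token = [], None
    while True:
        page, token = fetch_page(token)
        reviews.extend(page)
        if token is None:
            return reviews
```

In a real run, `fetch_page` would POST `body` to `url` with `Content-Type: application/x-www-form-urlencoded` (for example via `urllib.request`) and pull the next page token out of the decoded response. The answer does not say where the token sits in the response, so that step is left out here.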

The response looks cryptic, but it is essentially JSON data with the keys stripped, similar to protobuf. I wrote a parser that translates the response into a regular dict object:

https://gist.github.com/xlrtx/af655f05700eb76bb29aec876493ed90
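For reference, a minimal Python sketch of the first parsing step. This is not the parser from the gist. It assumes, based on observed responses rather than any documentation, that the body starts with the anti-XSSI prefix `)]}'`, may contain numeric chunk-length lines (when `rt=c` is sent), and carries each RPC result as a JSON string in the third slot of a `["wrb.fr", rpc_id, ...]` entry. The function name `parse_batchexecute` is hypothetical. Mapping the resulting positional arrays to named dict fields is left out, since the field positions are undocumented:

```python
import json

XSSI_PREFIX = ")]}'"


def parse_batchexecute(text, rpc_id="UsvDTd"):
    """Return the decoded payloads for rpc_id found in a batchexecute response."""
    if text.startswith(XSSI_PREFIX):
        text = text[len(XSSI_PREFIX):]
    decoder = json.JSONDecoder()
    payloads, idx = [], 0
    while idx < len(text):
        if text[idx].isspace():
            idx += 1
            continue
        # Decode the next JSON value; this handles both single-line and chunked bodies.
        value, idx = decoder.raw_decode(text, idx)
        if not isinstance(value, list):
            continue  # e.g. a chunk-length number when rt=c is used
        for entry in value:
            if (isinstance(entry, list) and len(entry) > 2
                    and entry[0] == "wrb.fr" and entry[1] == rpc_id
                    and isinstance(entry[2], str)):
                # The RPC result is itself a JSON string, so decode it again.
                payloads.append(json.loads(entry[2]))
    return payloads
```

Each returned payload is a nested list of positional values, which is what a key-mapping step like the gist's would then turn into a dict.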
