How to loop over JSONP / JSON data using R

Question

I thought I had parsed the data correctly using jsonlite & tidyjson. However, I am noticing that only the data from the first page is being parsed. Please advise how I can parse all the pages correctly. Judging by the JSON output, there are over 1,300 pages in total, so I think the data is available but is not being parsed correctly.

Note: I have used tidyjson, but am open to using jsonlite or any other library too.

library(dplyr)
library(tidyjson)
library(jsonlite)
library(httr)   # provides GET() and content()

# request one page of results; the API answers with JSONP wrapped in the callback
req <- httr::GET("http://svcs.ebay.com/services/search/FindingService/v1?OPERATION-NAME=findItemsByKeywords&SERVICE-VERSION=1.0.0&SECURITY-APPNAME=xxxxxx&GLOBAL-ID=EBAY-US&RESPONSE-DATA-FORMAT=JSON&callback=_cb_findItemsByKeywords&REST-PAYLOAD&keywords=harry%20potter&paginationInput.entriesPerPage=100")

txt <- content(req, "text")

# strip the JSONP wrapper: leading "/**/_cb_findItemsByKeywords(" and trailing ")"
json <- sub("/**/_cb_findItemsByKeywords(", "", txt, fixed = TRUE)
json <- sub("\\)[[:space:]]*$", "", json)

data1 <- json %>% as.tbl_json %>%
  enter_object("findItemsByKeywordsResponse") %>% gather_array %>%
  enter_object("searchResult") %>% gather_array %>%
  enter_object("item") %>% gather_array %>%
  spread_values(
    ITEMID = jstring("itemId"),
    TITLE = jstring("title")
  ) %>%
  select(ITEMID, TITLE) # select only what is needed
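The two `sub()` calls above only work for this exact callback name and a body that ends in `)`. Below is a slightly more general unwrapper: a minimal sketch assuming the response is a single `callback(...)` call, optionally followed by `;` and whitespace. `strip_jsonp` is a hypothetical helper, not part of jsonlite or httr. Leaving the `callback` parameter out of the request, as the answer below does, avoids the wrapper entirely.

```r
# Hypothetical helper: remove a JSONP wrapper such as
# "/**/_cb_findItemsByKeywords({...})" and return the bare JSON text.
strip_jsonp <- function(txt) {
  # (?s) lets "." match newlines; [^(]* skips everything before the first "(",
  # and the greedy (.*) captures up to the LAST ")" before an optional ";"/space.
  # Plain JSON objects/arrays (ending in "}" or "]") do not match and pass through.
  sub("(?s)^[^(]*\\((.*)\\)\\s*;?\\s*$", "\\1", txt, perl = TRUE)
}

wrapped <- '/**/_cb_findItemsByKeywords({"ack":["Success"],"n":[1]})'
cat(strip_jsonp(wrapped))
#> {"ack":["Success"],"n":[1]}
```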

############################################################ 

*Note: "paginationOutput":[{"pageNumber":["1"],"entriesPerPage":["100"],"totalPages":["1393"],"totalEntries":["139269"]}]

* &_ipg=100&_pgn=1"
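The `totalPages` value in `paginationOutput` from the note above is what a loop over pages needs as its upper bound. A minimal sketch of reading it with jsonlite, assuming `paginationOutput` sits directly under `findItemsByKeywordsResponse` (the nesting is an assumption based on the note) and that, like the other fields, each value arrives as a one-element string array; `total_pages` is a hypothetical helper.

```r
library(jsonlite)

# Hypothetical helper: read totalPages from one (already unwrapped) JSON response.
# Assumed layout: findItemsByKeywordsResponse[0].paginationOutput[0].totalPages[0]
total_pages <- function(json_txt) {
  # simplifyVector = FALSE keeps plain nested lists, so each [[1]] below
  # mirrors one JSON array level exactly
  res <- fromJSON(json_txt, simplifyVector = FALSE)
  pg  <- res$findItemsByKeywordsResponse[[1]]$paginationOutput[[1]]
  as.integer(pg$totalPages[[1]])
}

sample_txt <- '{"findItemsByKeywordsResponse":[{"paginationOutput":[{"pageNumber":["1"],"entriesPerPage":["100"],"totalPages":["1393"],"totalEntries":["139269"]}]}]}'
n_pages <- total_pages(sample_txt)
n_pages
#> [1] 1393
```

The pages to request are then `seq_len(n_pages)`. It is worth checking the Finding API reference for a maximum `paginationInput.pageNumber`, since a large `totalPages` does not by itself guarantee that every page can be requested.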


Answer

No need for tidyjson. You will need another function (or set of calls) to fetch the total number of pages (it's over 1,400) before using the following, but that should be fairly straightforward. Try to compartmentalize your operations a bit more, and use the full power of httr to parameterize the request wherever you can:

library(dplyr)
library(jsonlite)
library(httr)
library(purrr)

get_pg <- function(i) {

  cat(".") # shows progress

  req <- httr::GET("http://svcs.ebay.com/services/search/FindingService/v1",
                   query=list(`OPERATION-NAME`="findItemsByKeywords",
                              `SERVICE-VERSION`="1.0.0",
                              `SECURITY-APPNAME`="xxxxxxxxxxxxxxxxxxx",
                              `GLOBAL-ID`="EBAY-US",
                              `RESPONSE-DATA-FORMAT`="JSON",
                              `REST-PAYLOAD`="",
                              `keywords`="harry potter",
                              `paginationInput.pageNumber`=i,
                              `paginationInput.entriesPerPage`=100))

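  # stop on HTTP errors instead of trying to parse an error page
  httr::stop_for_status(req)

  dat <- fromJSON(content(req, as="text", encoding="UTF-8"))

  # flatten itemId and title into character columns, one row per listing
  map_df(dat$findItemsByKeywordsResponse$searchResult[[1]]$item, function(x) {

    tibble(ITEMID=flatten_chr(x$itemId),
           TITLE=flatten_chr(x$title))

  })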

}
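The `query=` list in `get_pg()` is what lets httr build and percent-encode the URL for you. `httr::modify_url()` does the same construction offline, which is a cheap way to inspect what a given page request will look like. A small sketch; the parameter subset is illustrative and `page_url` is a hypothetical helper.

```r
library(httr)

# Hypothetical helper: build the request URL for one page without sending it.
page_url <- function(page, keywords = "harry potter", per_page = 100) {
  modify_url("http://svcs.ebay.com/services/search/FindingService/v1",
             query = list(`OPERATION-NAME` = "findItemsByKeywords",
                          `RESPONSE-DATA-FORMAT` = "JSON",
                          keywords = keywords,               # space gets percent-encoded
                          `paginationInput.pageNumber` = page,
                          `paginationInput.entriesPerPage` = per_page))
}

u <- page_url(3)
u
```

Changing only `page` between calls is all the pagination loop needs; everything else stays fixed in one place.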

# "10" will need to be the max page number. I wasn't about to 
# make 1,400 requests to eBay. I'd probably break them up into 
# sets of 30 or 50 and save off temporary data frames as rdata files
# just so you don't get stuck in a situation where R crashes and you
# have to get all the data again.

srch_dat <- map_df(1:10, get_pg)

srch_dat

## Source: local data frame [1,000 x 2]
## 
##          ITEMID                                                                            TITLE
##           (chr)                                                                            (chr)
## 1  371533364795                 Harry Potter: Complete 8-Film Collection (DVD, 2011, 8-Disc Set)
## 2  331128976689                   HOT New Harry Potter 14.5" Magical Wand Replica Cosplay In Box
## 3  131721213216                 Harry Potter: Complete 8-Film Collection (DVD, 2011, 8-Disc Set)
## 4  171430021529   New Harry Potter Hermione Granger Rotating Time Turner Necklace Gold Hourglass
## 5  261597812013            Harry Potter Time Turner+GOLD Deathly Hallows Charm Pendant necklace 
## 6  111883750466                 Harry Potter: Complete 8-Film Collection (DVD, 2011, 8-Disc Set)
## 7  251947403227                   HOT New Harry Potter 14.5" Magical Wand Replica Cosplay In Box
## 8  351113839731 Marauder's Map Hogwarts Wizarding World Harry Potter Warner Bros LIMITED **NEW**
## 9  171912724869 Harry Potter Time Turner Necklace Hermione Granger Rotating Spins Gold Hourglass
## 10 182024752232  Harry Potter : Complete 8-Film Collection (DVD, 2011, 8-Disc Set) Free Shipping
## ..          ...                                                                              ...
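The comment in the answer's code suggests fetching in sets of 30 or 50 pages and saving each set so an R crash does not force a full restart. A minimal sketch of that idea, assuming `fetch_page` is any function that takes a page number and returns a data frame (for example `get_pg`). It swaps `.rdata` files for one `saveRDS()` file per batch; `fetch_in_chunks`, the `chunk_%04d.rds` file names and `fake_page` are all hypothetical.

```r
library(dplyr)
library(purrr)

# Hypothetical helper: fetch pages in batches and cache each batch on disk,
# so a rerun after a crash reloads finished batches instead of refetching them.
fetch_in_chunks <- function(pages, fetch_page, chunk_size = 50,
                            cache_dir = tempdir()) {
  # group page numbers into consecutive batches of at most chunk_size
  chunks <- split(pages, ceiling(seq_along(pages) / chunk_size))
  map_df(seq_along(chunks), function(k) {
    # files are keyed by batch index, so reuse a cache_dir only with the
    # same pages and chunk_size
    path <- file.path(cache_dir, sprintf("chunk_%04d.rds", k))
    if (file.exists(path)) return(readRDS(path))  # batch already fetched
    batch <- map_df(chunks[[k]], fetch_page)
    saveRDS(batch, path)
    batch
  })
}

# offline stand-in for get_pg(): one row per page
fake_page <- function(i) tibble(page = i, n_items = 100L)

cache <- file.path(tempdir(), "ebay_pages")
dir.create(cache, showWarnings = FALSE)
res <- fetch_in_chunks(1:7, fake_page, chunk_size = 3, cache_dir = cache)
nrow(res)
#> [1] 7
```

With the answer's function this would look like `fetch_in_chunks(1:10, get_pg, chunk_size = 5, cache_dir = "ebay_cache")`, where the `ebay_cache` directory must already exist.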
