将嵌套的JSON对象转换为R中的数据帧 [英] Convert Nested JSON Object into data Frame in R

查看:724
本文介绍了将嵌套的JSON对象转换为R中的数据帧的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在从Twitter API获取数据.将数据从JSON对象转换为数据框架并加载到数据仓库中.在下面找到输入和代码段.

I am fetching the data from Twitter API. Converting a Data from JSON object to Data Frame and load into Data Warehouse. Find below input and code snippet.

我对R编程非常陌生.

I am very new to R Programming.

 stats_campaign.data <- content(stats_campaign.request)
 print(stats_campaign.data)

O/P:

`{
 "data_type": [ "stats" ],
 "time_series_length": [ 1 ],
 "data": [
 {
  "id": [ "XXXXX" ],
  "id_data": [
    {
      "segment": {},
      "metrics": {
        "impressions": {},
        "tweets_send": {},
        "qualified_impressions": {},
        "follows": {},
        "app_clicks": {},
        "retweets": {},
        "likes": {},
        "engagements": {},
        "clicks": {},
        "card_engagements": {},
        "replies": {},
        "url_clicks": {},
        "carousel_swipes": {}
      }
    }
   ]
   },

   {      
   "id": [ "XXXX1" ],
   "id_data": [
    {
      "segment": {},
      "metrics": {
        "impressions": {},
        "tweets_send": {},
        "qualified_impressions": {},
        "follows": {},
        "app_clicks": {},
        "retweets": {},
        "likes": {},
        "engagements": {},
        "clicks": {},
        "card_engagements": {},
        "replies": {},
        "url_clicks": {},
        "carousel_swipes": {}
      }
    }
    ]
    },`

当我读取此JSON值时,

When I am reading this JSON value ,

    stats_json_file <- sprintf("P:/R Repos/R    
               Applications/TwitterAPIData/stats_test_data-%s.json", TODAY)
    jsonlite::fromJSON(stats_json_file)

   **Result :**
       id                                      id_data
    1  5wcaz                                         NULL
    2  5ub2u                                         NULL
    3  5wb8x                                         NULL
    4  5wb1j                                         NULL
    5  5yqwj                                         NULL
    6  5pq5i                                         NULL
    7  5u197                                         NULL
    8  5z2js                                         NULL
    9  6fqh0   333250, 4, 9, 19, 111, 3189, 3156, 5, 1091
    10 5tvr1                                         NULL
    11 5yqw4                                         NULL
    12 5qqps                                         NULL
    13 5yqvw                                         NULL
    14 5ygom                                         NULL
    15 5nc88                                         NULL
    16 5yg94                                         NULL
    17 65t9e                                         NULL
    18 5peck                                         NULL
    19 63pg1 247283, 17, 22, 35, 297, 5514, 5450, 6, 2971
    20 6cdvy        156705, 1, 2, 6, 112, 10933, 605, 170

   From my JSON file I want Id and whole "metrics": {
        "impressions": {},
        "tweets_send": {},
        "qualified_impressions": {},
        "follows": {},
        "app_clicks": {},
        "retweets": {},
        "likes": {},
        "engagements": {},
        "clicks": {},
        "card_engagements": {},
        "replies": {},
        "url_clicks": {},
        "carousel_swipes": {}
      }
      and convert to Data Frame to load into Data Base. Plzz Help..!

如何解析此JSON对象.我想检索ID&整个Metrics对象.然后要转换为数据框以加载到SQL表中.

How can I parsed this JSON Object. I want to retrieve Id & whole Metrics object. Then want to convert into Data Frame to load into SQL Table.

读取多个ID的&我在下面的代码中使用的指标值,

To read the multiple Id's & Metrics value I used below code,

`test <- list()
 for(i in 1:len)
 { test <- unlist(stats_campaign.data$data[[i]])
 print(test)}`

 **Output:**
      id 
  "5wcaz" 
      id 
   "5ub2u" 
      id 
  "5wb8x" 
      id 
 "5wb1j" 
      id 
 "5yqwj" 
      id 
  "5pq5i" 
      id 
  "5u197" 
      id 
  "5z2js" 
      id 
  "5tvr1" 
      id 
  "5yqw4" 
      id 
  "5qqps" 
      id 
  "5yqvw" 
      id 
  "5ygom" 
      id 
  "5nc88" 
      id 
  "5yg94" 
      id 
  "65t9e" 
      id 
  "5peck" 
                     id id_data.metrics.impressions 
                   "63pg1"                    "133227" 
                      id_data.metrics.tweets_send     id_data.metrics.follows 
                   "10"                         "9" 
                      id_data.metrics.retweets       id_data.metrics.likes 
                   "17"                        "96" 
                    id_data.metrics.engagements      id_data.metrics.clicks 
                 "2165"                      "2134" 
                    id_data.metrics.replies  id_data.metrics.url_clicks 
                    "5"                      "1204" 
                     id id_data.metrics.impressions 
                "6cdvy"                    "176164" 
     id_data.metrics.tweets_send    id_data.metrics.retweets 
                    "2"                        "10" 
    id_data.metrics.likes id_data.metrics.engagements 
                  "121"                      "9708" 
    id_data.metrics.clicks  id_data.metrics.url_clicks 
                  "620"                       "160"

在a内,我每次都必须使用列表或其他内容来附加值,我该怎么做..?我在使用正确的方法吗?有没有其他方法可以解析嵌套的JSON对象并将其直接放入数据帧中?.

Within a for I have to used list or something else to append the value each time, how can I do that ..?? Am I using a right Approach.?? Is there any alternative way I can parsed nested JSON object and directly put into Data Frame..?

请帮助..!提前谢谢..!

Please Help..! Thanks In Advance..!

推荐答案

如评论中所述,有关您要查找的输出的更多信息会有所帮助.无论如何,我希望以下内容可以提供有益的指导. tidyjson自述文件提供了一些有用的概述.

As mentioned in the comments, a bit more information about what output you are looking for would be helpful. In any case, I am hopeful that the following will provide a helpful direction. The tidyjson README provides a bit of helpful overview.

不幸的是,由于JSON对象中缺少数据,因此很难说明数据中可能存在什么(空对象中会有什么期望),而且我很难确定要查找的Twitter API的哪一部分在. tidyjson使您能够生成一致的data.frame输出,即使没有数据也可以!关键动词是gatherspread,非常类似于tidyr,但具有JSON风格.

Unfortunately, the lack of data in your JSON object makes it difficult to illustrate what might be present in your data (what to expect in the null objects), and I am having difficulty determining what part of the Twitter API you are looking at. tidyjson gives you the ability to produce a consistent data.frame output, even when you have no data, though! The key verbs are gather and spread, much like tidyr, but with JSON flavor.

str <- "{\"data_type\":[\"stats\"],\"time_series_length\":[1],\"data\":[{\"id\":[\"XXXXX\"],\"id_data\":[{\"segment\":{},\"metrics\":{\"impressions\":{},\"tweets_send\":{},\"qualified_impressions\":{},\"follows\":{},\"app_clicks\":{},\"retweets\":{},\"likes\":{},\"engagements\":{},\"clicks\":{},\"card_engagements\":{},\"replies\":{},\"url_clicks\":{},\"carousel_swipes\":{}}}]},{\"id\":[\"XXXX1\"],\"id_data\":[{\"segment\":{},\"metrics\":{\"impressions\":{},\"tweets_send\":{},\"qualified_impressions\":{},\"follows\":{},\"app_clicks\":{},\"retweets\":{},\"likes\":{},\"engagements\":{},\"clicks\":{},\"card_engagements\":{},\"replies\":{},\"url_clicks\":{},\"carousel_swipes\":{}}}]}]} "

library(dplyr)
library(tidyjson)

prep <- as.tbl_json(str) %>% enter_object("data") %>% gather_array("objid")

p1 <- prep %>% enter_object("id") %>% 
  gather_array("idnum") %>% append_values_string("id")

p2 <- prep %>% enter_object("id_data") %>% gather_array("datanum") %>%
enter_object("metrics") %>% 
spread_values(
 impressions = jstring("impressions", "value")
 , tweets_send = jnumber("tweets_send", "somekey")
)

p1 %>% tbl_df() %>% left_join(p2 %>% tbl_df(), by = c("document.id", "objid"))
#> # A tibble: 2 x 7
#>   document.id objid idnum    id datanum impressions tweets_send
#>         <int> <int> <int> <chr>   <int>       <chr>       <dbl>
#> 1           1     1     1 XXXXX       1        <NA>          NA
#> 2           1     2     1 XXXX1       1        <NA>          NA

这篇关于将嵌套的JSON对象转换为R中的数据帧的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆