将缺少值的嵌套列表转换为R中的数据框 [英] Converting nested list with missing values to data frame in R

查看:65
本文介绍了将缺少值的嵌套列表转换为R中的数据框的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个存储在列表中的googleway程序包输出的地理编码列表(ggmap地理编码无法使用API​​密钥),该列表的每个元素都包含两个列表.但是,对于找不到结果的地址,列表的结构是不同的,这使我将列表转换为数据框的尝试感到沮丧.

I have a list of geocode output from the googleway package (ggmap geocode wouldn't work with an API key) stored in a list, each element of which contains two lists. However, for addresses in which no result was found, the structure of the list is different, frustrating my attempts to convert the list to a dataframe.

(通过dput()创建)的不丢失"结果的结构如下(忽略乱码,RStudio在控制台中无法正确显示西里尔字母):

The structure of a "non-missing" result (created with dput()) is as follows (ignore the gibberish, RStudio doesn't display Cyrillic correctly in the console):

structure(list(results = structure(list(address_components = list(
    structure(list(long_name = c("11À", "óëèöà Ãîãîëÿ", "Çåëåíîãðàäñêèé àäìèíèñòðàòèâíûé îêðóã", 
    "Çåëåíîãðàä", "Ìîñêâà", "Ìîñêâà", "Ðîññèÿ", "124575"), short_name = c("11À", 
    "óë. Ãîãîëÿ", "Çåëåíîãðàäñêèé àäìèíèñòðàòèâíûé îêðóã", "Çåëåíîãðàä", 
    "Ìîñêâà", "Ìîñêâà", "RU", "124575"), types = list("street_number", 
        "route", c("political", "sublocality", "sublocality_level_1"
        ), c("locality", "political"), c("administrative_area_level_2", 
        "political"), c("administrative_area_level_1", "political"
        ), c("country", "political"), "postal_code")), .Names = c("long_name", 
    "short_name", "types"), class = "data.frame", row.names = c(NA, 
    8L))), formatted_address = "óë. Ãîãîëÿ, 11À, Çåëåíîãðàä, Ìîñêâà, Ðîññèÿ, 124575", 
    geometry = structure(list(location = structure(list(lat = 55.987567, 
        lng = 37.17152), .Names = c("lat", "lng"), class = "data.frame", row.names = 1L), 
        location_type = "ROOFTOP", viewport = structure(list(
            northeast = structure(list(lat = 55.9889159802915, 
                lng = 37.1728689802915), .Names = c("lat", "lng"
            ), class = "data.frame", row.names = 1L), southwest = structure(list(
                lat = 55.9862180197085, lng = 37.1701710197085), .Names = c("lat", 
            "lng"), class = "data.frame", row.names = 1L)), .Names = c("northeast", 
        "southwest"), class = "data.frame", row.names = 1L)), .Names = c("location", 
    "location_type", "viewport"), class = "data.frame", row.names = 1L), 
    place_id = "ChIJzXSgUeQUtUYREIohzQOG--A", types = list("street_address")), .Names = c("address_components", 
"formatted_address", "geometry", "place_id", "types"), class = "data.frame", row.names = 1L), 
    status = "OK"), .Names = c("results", "status"))

缺失"结果的结构如下:

The structure of a "missing" result is as follows:

structure(list(results = list(), status = "ZERO_RESULTS"), .Names = c("results", 
"status"))

基本上,问题似乎在于,当该函数未从Google API获得结果时,它会创建一个空列表,而不是与NA为"non-missing"的列表具有相同元素的列表价值观.当您将这些列表传递给data.frame()时,这会产生错误,因为它无法从零开始创建数据帧.

Basically, the issue appears to be that when the function doesn't get a result from the Google API, it creates an empty list, rather than a list with the same elements as the "non-missing" list with NA as values. This creates an error when you pass it these lists to data.frame(), because it cannot create a data frame from nothing.

在将结果子列表提取到它们自己的列表之后,我在这里尝试了该解决方案:

I have tried the solution here after extracting the results sublists into a list of their own: Converting nested list (unequal length) to data frame. It is supposed to fill in NAs and create equal length lists, enabling a conversion to a data frame:

first100geocode.results.l <- vector("list", 100)
for(i in 1:length(first100geocode.results.l)){
     first100geocode.results.l[[i]] <- first100geocode[[i]]$results
}

indx <- sapply(first100geocode.results.l, length)
res <- as.data.frame(do.call(rbind,lapply(first100geocode.results.l, 
`length<-`, max(indx))))
colnames(res) <- names(first100geocode.results.l[[which.max(indx)]])

但是,在其中创建"res"对象的行会引发错误:Error in rbind(deparse.level, ...) : invalid list argument: all variables should have the same length'.

However, the line in which the "res" object is created throws an error: Error in rbind(deparse.level, ...) : invalid list argument: all variables should have the same length'.

是否还有其他方法可以为缺失的结果填写NA,以便将其转换为数据框?

Is there some other way to fill in NAs for the missing results, so that I can convert this to a data frame?

(注意:我不能只是删除缺失的结果,我需要将此结果绑定回原始地址列表).

(Note: I can't just simply remove the missing results, I need to bind this back to the original list of addresses).

推荐答案

我们将让jsonlite::flatten完成大部分工作:

We'll let jsonlite::flatten do most of the work:

将您的两个示例结果放在一个列表中(希望这对您的实际数据结构是忠实的):

Put your two example results in one list (hopefully this is faithful to your actual data structure):

first100geocode <- list(
  structure(list(results = structure(list(address_components = list(
    structure(list(long_name = c(
      "11À", "óëèöà Ãîãîëÿ", "Çåëåíîãðàäñêèé àäìèíèñòðàòèâíûé îêðóã", 
      "Çåëåíîãðàä", "Ìîñêâà", "Ìîñêâà", "Ðîññèÿ", "124575"), short_name = c(
        "11À", "óë. Ãîãîëÿ", "Çåëåíîãðàäñêèé àäìèíèñòðàòèâíûé îêðóã", "Çåëåíîãðàä", 
        "Ìîñêâà", "Ìîñêâà", "RU", "124575"), types = list(
          "street_number", "route", c("political", "sublocality", "sublocality_level_1"
          ), c("locality", "political"), c(
            "administrative_area_level_2", 
            "political"), c("administrative_area_level_1", "political"
            ), c("country", "political"), "postal_code")), .Names = c(
              "long_name", 
              "short_name", "types"), class = "data.frame", row.names = c(NA, 8L))), 
    formatted_address = "óë. Ãîãîëÿ, 11À, Çåëåíîãðàä, Ìîñêâà, Ðîññèÿ, 124575", 
    geometry = structure(list(location = structure(
      list(lat = 55.987567, lng = 37.17152), 
      .Names = c("lat", "lng"), class = "data.frame", row.names = 1L), 
      location_type = "ROOFTOP", viewport = structure(list(
        northeast = structure(list(
          lat = 55.9889159802915, lng = 37.1728689802915), .Names = c("lat", "lng"
          ), class = "data.frame", row.names = 1L), southwest = structure(list(
            lat = 55.9862180197085, lng = 37.1701710197085), .Names = c("lat", "lng"), 
            class = "data.frame", row.names = 1L)), .Names = c("northeast", "southwest"), 
        class = "data.frame", row.names = 1L)), 
      .Names = c("location", "location_type", "viewport"), 
      class = "data.frame", row.names = 1L), 
    place_id = "ChIJzXSgUeQUtUYREIohzQOG--A", types = list("street_address")), 
    .Names = c("address_components", 
               "formatted_address", "geometry", "place_id", "types"), 
    class = "data.frame", row.names = 1L), 
    status = "OK"), .Names = c("results", "status")),
  structure(list(results = list(), status = "ZERO_RESULTS"), 
            .Names = c("results", "status"))
)

进行实际的展平(并滤除有些难度并且您不感兴趣的address_componentstypes):

Do the actual flattening (and filter out address_components and types that are a bit trickier and of no interest to you):

flatten_googleway <- function(df) {
  res <- jsonlite::flatten(df)
  res[, !names(res) %in% c("address_components", "types")]
}

准备用于缺失"结果的模板数据框.并将其应用于那些:

Prepare the template data frame we'll use for "missing" results. And apply it to those:

template_res <- flatten_googleway(first100geocode[[1]]$results)[FALSE, ]
do.call(rbind, lapply(first100geocode, function(x) {
  if (length(x$results) == 0) template_res[1, ] else flatten_googleway(x$results)
}))
#                                      formatted_address                    place_id
# 1  óë. Ãîãîëÿ, 11À, Çåëåíîãðàä, Ìîñêâà, Ðîññèÿ, 124575 ChIJzXSgUeQUtUYREIohzQOG--A
# NA                                                <NA>                        <NA>
#    geometry.location_type geometry.location.lat geometry.location.lng
# 1                 ROOFTOP              55.98757              37.17152
# NA                   <NA>                    NA                    NA
#    geometry.viewport.northeast.lat geometry.viewport.northeast.lng
# 1                         55.98892                        37.17287
# NA                              NA                              NA
#    geometry.viewport.southwest.lat geometry.viewport.southwest.lng
# 1                         55.98622                        37.17017
# NA                              NA                              NA

这篇关于将缺少值的嵌套列表转换为R中的数据框的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆