Parse Error: "Trailing Garbage" while trying to parse JSON column in data frame


Problem description

I have a log file that looks like this.

I'm trying to parse the JSON in the Message column with:

library(readr)
library(jsonlite)

df <- read_csv("log_file_from_above.csv")
fromJSON(as.character(df$Message))

However, I'm hitting the following error:

Error: parse error: trailing garbage
          "isEmailConfirmed": false  } {    "id": -1,    "firstName": 
                     (right here) ------^

How can I get rid of the "trailing garbage"?

Recommended answer

fromJSON() is not applied element-wise to the character vector. It treats the whole vector as a single JSON text and tries to convert it all to one data frame, which is why the second object is reported as trailing garbage. You can try

purrr::map(df$Message, jsonlite::fromJSON)

what @Abdou provided (shown further down), or

jsonlite::stream_in(textConnection(gsub("\\n", "", df$Message)))

The latter two create data frames. The first creates a list that you can add as a column.
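To make the difference concrete, here is a minimal sketch on made-up data. The vector `msgs` is a hypothetical stand-in for `df$Message` (it is not the original log), and base `lapply()` is used in place of `purrr::map()` only to keep the example dependent on jsonlite alone.

```r
library(jsonlite)

# Hypothetical stand-in for df$Message: one pretty-printed JSON object per element.
msgs <- c('{\n  "id": -2,\n  "isEmailConfirmed": false\n}',
          '{\n  "id": -1,\n  "firstName": "John"\n}')

# Passing the whole vector makes the parser see "{...} {...}" as one document,
# so the error is captured here instead of stopping the script.
res <- tryCatch(fromJSON(msgs), error = function(e) conditionMessage(e))

# Parsing element by element returns a list with one parsed object per row.
parsed <- lapply(msgs, fromJSON)

# stream_in() reads one JSON document per line, so the newlines inside
# each message are removed before it is handed to the connection.
one_line <- gsub("\n", "", msgs, fixed = TRUE)
con <- textConnection(one_line)
records <- stream_in(con, verbose = FALSE)
close(con)
```

Here `res` holds the captured error message, `parsed` is a plain list, and `records` is a data frame in which fields missing from a message are filled with NA.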

You can use the last method with dplyr::bind_cols to make a new data frame with all the data:

dplyr::bind_cols(df[,1:3],
                 jsonlite::stream_in(textConnection(gsub("\\n", "", df$Message))))

Also suggested by @Abdou is an almost pure base R solution:

cbind(df, do.call(plyr::rbind.fill, lapply(paste0("[",df$Message,"]"), function(x) jsonlite::fromJSON(x))))
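The bracket trick in that one-liner can be seen in isolation in the sketch below. It runs on made-up messages (`msgs` is hypothetical, not the original log) and swaps in `jsonlite::rbind_pages()` for `plyr::rbind.fill()` purely to avoid an extra dependency; this is a substitution for illustration, not what @Abdou wrote.

```r
library(jsonlite)

# Hypothetical messages with partly different fields.
msgs <- c('{"id": -2, "isEmailConfirmed": false}',
          '{"id": -1, "firstName": "John"}')

# "[{...}]" is a JSON array holding a single object, which fromJSON()
# simplifies to a one-row data frame instead of a named list.
rows <- lapply(paste0("[", msgs, "]"), fromJSON)

# Stack the one-row data frames; columns absent from a row become NA.
combined <- rbind_pages(rows)
```

Because every element becomes a one-row data frame, the combined result keeps the original row order and can be bound column-wise to the remaining columns of the log.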

Full working workflow:

library(dplyr)
library(jsonlite)

df <- read.table("http://pastebin.com/raw/MMPMwNZv",
                 quote='"', sep=",", stringsAsFactors=FALSE, header=TRUE)

bind_cols(df[,1:3], stream_in(textConnection(gsub("\\n", "", df$Message)))) %>%
  glimpse()
## Found 3 records...
## Imported 3 records. Simplifying into dataframe...
## Observations: 3
## Variables: 19
## $ Id                  <int> 35054, 35055, 35059
## $ Date                <chr> "2016-06-17 19:29:43 +0000", "2016-06-17 1...
## $ Level               <chr> "INFO", "INFO", "INFO"
## $ id                  <int> -2, -1, -3
## $ ipAddress           <chr> "100.100.100.100", NA, "100.200.300.400"
## $ howYouHearAboutUs   <chr> NA, "Radio", NA
## $ isInterestedInOffer <lgl> TRUE, FALSE, TRUE
## $ incomeRange         <int> 60000, 1, 100000
## $ isEmailConfirmed    <lgl> FALSE, NA, TRUE
## $ firstName           <chr> NA, "John", NA
## $ lastName            <chr> NA, "Smith", NA
## $ email               <chr> NA, "john.smith@gmail.com", NA
## $ city                <chr> NA, "Smalltown", NA
## $ birthDate           <chr> NA, "1999-12-10T05:00:00Z", NA
## $ password            <chr> NA, "*********", NA
## $ agreeToTermsOfUse   <lgl> NA, TRUE, TRUE
## $ visitUrl            <chr> NA, NA, "https://www.website.com/?purpose=X"
## $ isIdentityConfirmed <lgl> NA, NA, FALSE
## $ validationResults   <lgl> NA, NA, NA
