Force date as new line on reading non-delimited text file

Problem description

I am trying to read in and work with a horribly formatted debug log. There are no consistent delimiters, and line breaks do not appear to be encoded either.

What I'd like to do is read in and parse the data so that each date (YYYY-MM-DD format) starts a new line.

I am trying to work within the tidyverse but cannot seem to get something that will parse the file correctly.

Is there a way to force lines to be delimited by a date pattern?

None of these work:

library(tidyverse)
Log_File <- read.table("Example.txt", header = F, fill = T, skip = 1, allowEscapes = TRUE)
Log_File <- read_delim("Example.txt", col_names = F, delim = " ", n_max = 2)
Log_File <- read_lines("Example.txt", skip = 1, n_max = -1L, na = character(),
                       locale = default_locale(), progress = interactive())




> Log_File
                                           V1                                    V2                               V3       V4       V5                                                                         V6            V7
1                                  2019-09-20                          14:06:18.952                          [Error]   [main]        > CloudStorageExtension.swift[line:38]-downloadData(node:storageObj:value:):         Error
2                                  2019-09-20                          14:06:18.953                          [Error]   [main]        >                        AlertService.swift[line:310]-retrieveProfileName():        Unable
3                                       error                                     :                                {                                                                                                           
4                                        code                                     :                             404,                                                                                                           
5                                     message                                     : Not Found.  Could not get object        ,                                                                                                  
6                                      status                                     :                       GET_OBJECT                                                                                                           
7                                           }                                                                                                                                                                                  
8                                          }, bucket=integration-c5068.appspot.com,                   data=<7b0a2020 22657272 6f72223a                                                                   207b0a20      20202022
9                                    74206765                              74206f62                         6a656374 222c0a20 20202022                                                                   73746174      7573223a
10 ResponseErrorDomain=com.google.HTTPStatus,                ResponseErrorCode=404}                                                                                                                                            
11                                 2019-09-20                          14:06:18.953                          [Error]   [main]        >                        AlertService.swift[line:314]-retrieveProfileName(): AlertSettings
12                                      error                                     :                                {                                                                                                           
13                                       code                                     :                             404,                                                                                                           
14                                    message                                     : Not Found.  Could not get object        ,                                                                                                  
15                                     status                                     :                       GET_OBJECT                                                                                                           
16                                          }                                                                                                                                                                                  
17                                         }, bucket=integration-c5068.appspot.com,                   data=<7b0a2020 22657272 6f72223a                                                                   207b0a20      20202022
18                                   74206765                              74206f62                         6a656374 222c0a20 20202022                                                                   73746174      7573223a
19 ResponseErrorDomain=com.google.HTTPStatus,                ResponseErrorCode=404}                                                                                                                                            
20                                 2019-09-20                          14:06:18.957                          [Error]   [main]        > CloudStorageExtension.swift[line:38]-downloadData(node:storageObj:value:):         Error

I know linking to a text file is frowned upon, so here is some raw text, hopefully this works:

2019-09-20 14:06:18.952 [Error] [main] > CloudStorageExtension.swift[line:38]-downloadData(node:storageObj:value:): Error occurs when download filestorage data with description: Object App/Data/Users/U0bGtkevMkc8Z94KFIoYSKy87sS2/Modes/RealMode/Alert/Data does not exist.
2019-09-20 14:06:18.953 [Error] [main] > AlertService.swift[line:310]-retrieveProfileName(): Unable to get AlertSettings Name: Error Domain=FIRStorageErrorDomain Code=-13010 "Object App/Data/Users/U0bGtkevMkc8Z94KFIoYSKy87sS2/Modes/RealMode/Alert/Data does not exist." UserInfo={object=App/Data/Users/U0bGtkevMkc8Z94KFIoYSKy87sS2/Modes/RealMode/Alert/Data, ResponseBody={
  "error": {
    "code": 404,
    "message": "Not Found.  Could not get object",
    "status": "GET_OBJECT"
  }
}, bucket=integration-c5068.appspot.com, data=<7b0a2020 22657272 6f72223a 207b0a20 20202022 636f6465 223a2034 30342c0a 20202020 226d6573 73616765 223a2022 4e6f7420 466f756e 642e2020 436f756c 64206e6f 74206765 74206f62 6a656374 222c0a20 20202022 73746174 7573223a 20224745 545f4f42 4a454354 220a2020 7d0a7d>, data_content_type=application/json; charset=UTF-8, NSLocalizedDescription=Object App/Data/Users/U0bGtkevMkc8Z94KFIoYSKy87sS2/Modes/RealMode/Alert/Data does not exist., ResponseErrorDomain=com.google.HTTPStatus, ResponseErrorCode=404}
2019-09-20 14:06:18.953 [Error] [main] > AlertService.swift[line:314]-retrieveProfileName(): AlertSettings Name object missing: Error Domain=FIRStorageErrorDomain Code=-13010 "Object App/Data/Users/U0bGtkevMkc8Z94KFIoYSKy87sS2/Modes/RealMode/Alert/Data does not exist." UserInfo={object=App/Data/Users/U0bGtkevMkc8Z94KFIoYSKy87sS2/Modes/RealMode/Alert/Data, ResponseBody={
  "error": {
    "code": 404,
    "message": "Not Found.  Could not get object",
    "status": "GET_OBJECT"
  }
}, bucket=integration-c5068.appspot.com, data=<7b0a2020 22657272 6f72223a 207b0a20 20202022 636f6465 223a2034 30342c0a 20202020 226d6573 73616765 223a2022 4e6f7420 466f756e 642e2020 436f756c 64206e6f 74206765 74206f62 6a656374 222c0a20 20202022 73746174 7573223a 20224745 545f4f42 4a454354 220a2020 7d0a7d>, data_content_type=application/json; charset=UTF-8, NSLocalizedDescription=Object App/Data/Users/U0bGtkevMkc8Z94KFIoYSKy87sS2/Modes/RealMode/Alert/Data does not exist., ResponseErrorDomain=com.google.HTTPStatus, ResponseErrorCode=404}
2019-09-20 14:06:18.957 [Error] [main] > CloudStorageExtension.swift[line:38]-downloadData(node:storageObj:value:): Error occurs when download filestorage data with description: Object App/Data/Users/U0bGtkevMkc8Z94KFIoYSKy87sS2/Modes/RealMode/Alert/Data does not exist.

Here is a dput as read in with:

Log_File <- read_delim("Example.txt", col_names = F, delim = " ")


Data <- structure(list(X1 = c("2019-09-20", "2019-09-20", "error\": {\n    \"code\": 404,\n    \"message\": \"Not Found.  Could not get object\",\n    \"status\": \"GET_OBJECT", 
"  }", "},", "2019-09-20", "error\": {\n    \"code\": 404,\n    \"message\": \"Not Found.  Could not get object\",\n    \"status\": \"GET_OBJECT", 
"  }", "},", "2019-09-20"), X2 = c("14:06:18.952", "14:06:18.953", 
NA, NA, "bucket=integration-c5068.appspot.com,", "14:06:18.953", 
NA, NA, "bucket=integration-c5068.appspot.com,", "14:06:18.957"
), X3 = c("[Error]", "[Error]", NA, NA, "data=<7b0a2020", "[Error]", 
NA, NA, "data=<7b0a2020", "[Error]"), X4 = c("[main]", "[main]", 
NA, NA, "22657272", "[main]", NA, NA, "22657272", "[main]"), 
    X5 = c(">", ">", NA, NA, "6f72223a", ">", NA, NA, "6f72223a", 
    ">"), X6 = c("CloudStorageExtension.swift[line:38]-downloadData(node:storageObj:value:):", 
    "AlertService.swift[line:310]-retrieveProfileName():", NA, 
    NA, "207b0a20", "AlertService.swift[line:314]-retrieveProfileName():", 
    NA, NA, "207b0a20", "CloudStorageExtension.swift[line:38]-downloadData(node:storageObj:value:):"
    ), X7 = c("Error", "Unable", NA, NA, "20202022", "AlertSettings", 
    NA, NA, "20202022", "Error"), X8 = c("occurs", "to", NA, 
    NA, "636f6465", "Name", NA, NA, "636f6465", "occurs"), X9 = c("when", 
    "get", NA, NA, "223a2034", "object", NA, NA, "223a2034", 
    "when"), X10 = c("download", "AlertSettings", NA, NA, "30342c0a", 
    "missing:", NA, NA, "30342c0a", "download"), X11 = c("filestorage", 
    "Name:", NA, NA, "20202020", "Error", NA, NA, "20202020", 
    "filestorage"), X12 = c("data", "Error", NA, NA, "226d6573", 
    "Domain=FIRStorageErrorDomain", NA, NA, "226d6573", "data"
    ), X13 = c("with", "Domain=FIRStorageErrorDomain", NA, NA, 
    "73616765", "Code=-13010", NA, NA, "73616765", "with"), X14 = c("description:", 
    "Code=-13010", NA, NA, "223a2022", "Object App/Data/Users/U0bGtkevMkc8Z94KFIoYSKy87sS2/Modes/RealMode/Alert/Data does not exist.", 
    NA, NA, "223a2022", "description:"), X15 = c("Object", "Object App/Data/Users/U0bGtkevMkc8Z94KFIoYSKy87sS2/Modes/RealMode/Alert/Data does not exist.", 
    NA, NA, "4e6f7420", "UserInfo={object=App/Data/Users/U0bGtkevMkc8Z94KFIoYSKy87sS2/Modes/RealMode/Alert/Data,", 
    NA, NA, "4e6f7420", "Object"), X16 = c("App/Data/Users/U0bGtkevMkc8Z94KFIoYSKy87sS2/Modes/RealMode/Alert/Data", 
    "UserInfo={object=App/Data/Users/U0bGtkevMkc8Z94KFIoYSKy87sS2/Modes/RealMode/Alert/Data,", 
    NA, NA, "466f756e", "ResponseBody={", NA, NA, "466f756e", 
    "App/Data/Users/U0bGtkevMkc8Z94KFIoYSKy87sS2/Modes/RealMode/Alert/Data"
    ), X17 = c("does", "ResponseBody={", NA, NA, "642e2020", 
    NA, NA, NA, "642e2020", "does"), X18 = c("not", NA, NA, NA, 
    "436f756c", NA, NA, NA, "436f756c", "not"), X19 = c("exist.", 
    NA, NA, NA, "64206e6f", NA, NA, NA, "64206e6f", "exist.")), class = c("spec_tbl_df", 
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -10L), problems = structure(list(
    row = c(3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 3L, 4L, 
    5L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 6L, 7L, 8L, 9L
    ), col = c("X1", "X1", "X1", "X1", "X1", "X1", "X1", "X1", 
    "X1", "X1", NA, NA, NA, NA, "X1", "X1", "X1", "X1", "X1", 
    "X1", "X1", "X1", "X1", "X1", NA, NA, NA, NA), expected = c("delimiter or quote", 
    "delimiter or quote", "delimiter or quote", "delimiter or quote", 
    "delimiter or quote", "delimiter or quote", "delimiter or quote", 
    "delimiter or quote", "delimiter or quote", "delimiter or quote", 
    "19 columns", "19 columns", "19 columns", "19 columns", "delimiter or quote", 
    "delimiter or quote", "delimiter or quote", "delimiter or quote", 
    "delimiter or quote", "delimiter or quote", "delimiter or quote", 
    "delimiter or quote", "delimiter or quote", "delimiter or quote", 
    "19 columns", "19 columns", "19 columns", "19 columns"), 
    actual = c(":", "c", ":", "m", ":", "N", ",", "s", ":", "G", 
    "17 columns", "1 columns", "1 columns", "40 columns", ":", 
    "c", ":", "m", ":", "N", ",", "s", ":", "G", "16 columns", 
    "1 columns", "1 columns", "40 columns"), file = c("'Example.txt'", 
    "'Example.txt'", "'Example.txt'", "'Example.txt'", "'Example.txt'", 
    "'Example.txt'", "'Example.txt'", "'Example.txt'", "'Example.txt'", 
    "'Example.txt'", "'Example.txt'", "'Example.txt'", "'Example.txt'", 
    "'Example.txt'", "'Example.txt'", "'Example.txt'", "'Example.txt'", 
    "'Example.txt'", "'Example.txt'", "'Example.txt'", "'Example.txt'", 
    "'Example.txt'", "'Example.txt'", "'Example.txt'", "'Example.txt'", 
    "'Example.txt'", "'Example.txt'", "'Example.txt'")), row.names = c(NA, 
-28L), class = c("tbl_df", "tbl", "data.frame")), spec = structure(list(
    cols = list(X1 = structure(list(), class = c("collector_character", 
    "collector")), X2 = structure(list(), class = c("collector_character", 
    "collector")), X3 = structure(list(), class = c("collector_character", 
    "collector")), X4 = structure(list(), class = c("collector_character", 
    "collector")), X5 = structure(list(), class = c("collector_character", 
    "collector")), X6 = structure(list(), class = c("collector_character", 
    "collector")), X7 = structure(list(), class = c("collector_character", 
    "collector")), X8 = structure(list(), class = c("collector_character", 
    "collector")), X9 = structure(list(), class = c("collector_character", 
    "collector")), X10 = structure(list(), class = c("collector_character", 
    "collector")), X11 = structure(list(), class = c("collector_character", 
    "collector")), X12 = structure(list(), class = c("collector_character", 
    "collector")), X13 = structure(list(), class = c("collector_character", 
    "collector")), X14 = structure(list(), class = c("collector_character", 
    "collector")), X15 = structure(list(), class = c("collector_character", 
    "collector")), X16 = structure(list(), class = c("collector_character", 
    "collector")), X17 = structure(list(), class = c("collector_character", 
    "collector")), X18 = structure(list(), class = c("collector_character", 
    "collector")), X19 = structure(list(), class = c("collector_character", 
    "collector"))), default = structure(list(), class = c("collector_guess", 
    "collector")), skip = 0), class = "col_spec"))

Any suggestions for appending rows without dates to the previous row/line?

Recommended answer

I don't think you can use a delimiter to do that, but a simple pattern match (lines that start with 20, the first two digits of the year) should suffice:

Sample data: in practice you would read all of the text in with readLines; here I fake it with strsplit:

# loglines <- readLines(filename)
loglines <- strsplit('2019-09-20 14:06:18.952 [Error] [main] > CloudStorageExtension.swift[line:38]-downloadData(node:storageObj:value:): Error occurs when download filestorage data with description: Object App/Data/Users/U0bGtkevMkc8Z94KFIoYSKy87sS2/Modes/RealMode/Alert/Data does not exist.
2019-09-20 14:06:18.953 [Error] [main] > AlertService.swift[line:310]-retrieveProfileName(): Unable to get AlertSettings Name: Error Domain=FIRStorageErrorDomain Code=-13010 "Object App/Data/Users/U0bGtkevMkc8Z94KFIoYSKy87sS2/Modes/RealMode/Alert/Data does not exist." UserInfo={object=App/Data/Users/U0bGtkevMkc8Z94KFIoYSKy87sS2/Modes/RealMode/Alert/Data, ResponseBody={
  "error": {
    "code": 404,
    "message": "Not Found.  Could not get object",
    "status": "GET_OBJECT"
  }
}, bucket=integration-c5068.appspot.com, data=<7b0a2020 22657272 6f72223a 207b0a20 20202022 636f6465 223a2034 30342c0a 20202020 226d6573 73616765 223a2022 4e6f7420 466f756e 642e2020 436f756c 64206e6f 74206765 74206f62 6a656374 222c0a20 20202022 73746174 7573223a 20224745 545f4f42 4a454354 220a2020 7d0a7d>, data_content_type=application/json; charset=UTF-8, NSLocalizedDescription=Object App/Data/Users/U0bGtkevMkc8Z94KFIoYSKy87sS2/Modes/RealMode/Alert/Data does not exist., ResponseErrorDomain=com.google.HTTPStatus, ResponseErrorCode=404}
2019-09-20 14:06:18.953 [Error] [main] > AlertService.swift[line:314]-retrieveProfileName(): AlertSettings Name object missing: Error Domain=FIRStorageErrorDomain Code=-13010 "Object App/Data/Users/U0bGtkevMkc8Z94KFIoYSKy87sS2/Modes/RealMode/Alert/Data does not exist." UserInfo={object=App/Data/Users/U0bGtkevMkc8Z94KFIoYSKy87sS2/Modes/RealMode/Alert/Data, ResponseBody={
  "error": {
    "code": 404,
    "message": "Not Found.  Could not get object",
    "status": "GET_OBJECT"
  }
}, bucket=integration-c5068.appspot.com, data=<7b0a2020 22657272 6f72223a 207b0a20 20202022 636f6465 223a2034 30342c0a 20202020 226d6573 73616765 223a2022 4e6f7420 466f756e 642e2020 436f756c 64206e6f 74206765 74206f62 6a656374 222c0a20 20202022 73746174 7573223a 20224745 545f4f42 4a454354 220a2020 7d0a7d>, data_content_type=application/json; charset=UTF-8, NSLocalizedDescription=Object App/Data/Users/U0bGtkevMkc8Z94KFIoYSKy87sS2/Modes/RealMode/Alert/Data does not exist., ResponseErrorDomain=com.google.HTTPStatus, ResponseErrorCode=404}
2019-09-20 14:06:18.957 [Error] [main] > CloudStorageExtension.swift[line:38]-downloadData(node:storageObj:value:): Error occurs when download filestorage data with description: Object App/Data/Users/U0bGtkevMkc8Z94KFIoYSKy87sS2/Modes/RealMode/Alert/Data does not exist.', "\n")[[1]]
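For a real file, the commented-out readLines(filename) line does the reading. A minimal sketch of that step, assuming a small stand-in file written to tempfile() in place of the question's Example.txt (its three lines are made up to mimic the log layout). readLines() returns one element per physical line without trying to split fields, which is exactly what the grouping step needs; readr's read_lines(), already tried in the question, returns the same kind of character vector.

```r
# Stand-in for Example.txt; the three lines mimic the log's layout.
log_path <- tempfile(fileext = ".txt")
writeLines(c(
  "2019-09-20 14:06:18.952 [Error] [main] > first entry",
  "  \"code\": 404,",
  "2019-09-20 14:06:18.957 [Error] [main] > second entry"
), log_path)

# One element per physical line; warn = FALSE silences the
# "incomplete final line" warning that files without a trailing newline trigger.
raw_lines <- readLines(log_path, warn = FALSE)
length(raw_lines)
# [1] 3
```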

Using this example, we flag every line that starts a new entry with grepl (which returns a logical vector) and number the entries by taking the cumsum of that:

grepl("^20", loglines)
#  [1]  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE
# [13] FALSE FALSE FALSE  TRUE
cumsum(grepl("^20", loglines))
#  [1] 1 2 2 2 2 2 2 2 3 3 3 3 3 3 3 4

So the first line is by itself, the next 7 are together, etc.
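One caution about "^20": it also matches any continuation line that happens to begin with the digits 20, for instance a hex chunk such as 20202022 if a line break ever landed right before it (in the sample those chunks sit mid-line, so the simple pattern works there). A stricter test anchors on the whole timestamp. The helper name is_entry_start and the exact "YYYY-MM-DD HH:MM:SS.mmm" format (read off the sample log) are assumptions, and the demo lines are made up.

```r
# Hypothetical helper: TRUE where a line begins a new log entry.
# Assumes every entry starts with "YYYY-MM-DD HH:MM:SS.mmm", as in the sample.
is_entry_start <- function(lines) {
  grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}\\.[0-9]{3}", lines)
}

demo_lines <- c(
  "2019-09-20 14:06:18.952 [Error] [main] > first entry",
  "  \"code\": 404,",
  "20202022 starts with 20 but is a continuation line",
  "2019-09-20 14:06:18.957 [Error] [main] > second entry"
)

grepl("^20", demo_lines)       # the loose pattern
# [1]  TRUE FALSE  TRUE  TRUE
is_entry_start(demo_lines)     # the stricter pattern
# [1]  TRUE FALSE FALSE  TRUE
```

Swapping is_entry_start(loglines) in for grepl("^20", loglines) inside the cumsum() call leaves the rest of the approach unchanged.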

combined <- as.list(by(loglines, cumsum(grepl("^20", loglines)), paste, collapse = "\n"))
str(combined)
# List of 4
#  $ 1: chr "2019-09-20 14:06:18.952 [Error] [main] > CloudStorageExtension.swift[line:38]-downloadData(node:storageObj:valu"| __truncated__
#  $ 2: chr "2019-09-20 14:06:18.953 [Error] [main] > AlertService.swift[line:310]-retrieveProfileName(): Unable to get Aler"| __truncated__
#  $ 3: chr "2019-09-20 14:06:18.953 [Error] [main] > AlertService.swift[line:314]-retrieveProfileName(): AlertSettings Name"| __truncated__
#  $ 4: chr "2019-09-20 14:06:18.957 [Error] [main] > CloudStorageExtension.swift[line:38]-downloadData(node:storageObj:valu"| __truncated__

# perhaps for convenience:
combined <- unlist(as.list(combined), use.names = FALSE)

# one element:
combined[[2]]
# [1] "2019-09-20 14:06:18.953 [Error] [main] > AlertService.swift[line:310]-retrieveProfileName(): Unable to get AlertSettings Name: Error Domain=FIRStorageErrorDomain Code=-13010 \"Object App/Data/Users/U0bGtkevMkc8Z94KFIoYSKy87sS2/Modes/RealMode/Alert/Data does not exist.\" UserInfo={object=App/Data/Users/U0bGtkevMkc8Z94KFIoYSKy87sS2/Modes/RealMode/Alert/Data, ResponseBody={\n  \"error\": {\n    \"code\": 404,\n    \"message\": \"Not Found.  Could not get object\",\n    \"status\": \"GET_OBJECT\"\n  }\n}, bucket=integration-c5068.appspot.com, data=<7b0a2020 22657272 6f72223a 207b0a20 20202022 636f6465 223a2034 30342c0a 20202020 226d6573 73616765 223a2022 4e6f7420 466f756e 642e2020 436f756c 64206e6f 74206765 74206f62 6a656374 222c0a20 20202022 73746174 7573223a 20224745 545f4f42 4a454354 220a2020 7d0a7d>, data_content_type=application/json; charset=UTF-8, NSLocalizedDescription=Object App/Data/Users/U0bGtkevMkc8Z94KFIoYSKy87sS2/Modes/RealMode/Alert/Data does not exist., ResponseErrorDomain=com.google.HTTPStatus, ResponseErrorCode=404}"
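If by() feels indirect, the same grouping can be written with split() plus vapply(). This is a swapped-in base R alternative rather than the answer's code, shown on a made-up four-line input; the demo_ variable names are chosen so they don't overwrite loglines or combined.

```r
# Toy log: two entries, the second spanning three physical lines.
demo_lines <- c(
  "2019-09-20 14:06:18.952 [Error] [main] > one-line entry",
  "2019-09-20 14:06:18.953 [Error] [main] > multi-line entry {",
  "  \"code\": 404",
  "}"
)

# Entry id increments at every line that starts with "20".
# (Any lines before the first timestamp would get id 0.)
demo_id <- cumsum(grepl("^20", demo_lines))
demo_id
# [1] 1 2 2 2

# split() keeps the original line order within each group;
# paste(collapse = "\n") glues each group back into one string.
demo_entries <- unname(vapply(split(demo_lines, demo_id), paste, character(1),
                              collapse = "\n"))
length(demo_entries)
# [1] 2
```

tapply(demo_lines, demo_id, paste, collapse = "\n") is an equivalent one-liner if a named array is acceptable.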

(Note the embedded newlines within each string. To join the lines differently, change the collapse= argument.)
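Since the question asks for literally one line per date, one option is to trim the JSON indentation, collapse with a space instead of "\n", and write the result back out. A sketch only: the input is a shortened, made-up excerpt of the log, and tempfile() stands in for whatever output path you would actually use.

```r
demo_lines <- c(
  "2019-09-20 14:06:18.953 [Error] [main] > ResponseBody={",
  "  \"error\": {",
  "    \"code\": 404",
  "  }",
  "}, ResponseErrorCode=404}",
  "2019-09-20 14:06:18.957 [Error] [main] > next entry"
)
demo_id <- cumsum(grepl("^20", demo_lines))

# trimws() drops the indentation; collapse = " " puts each entry on one line.
flat <- unname(vapply(split(trimws(demo_lines), demo_id), paste, character(1),
                      collapse = " "))

out_file <- tempfile(fileext = ".txt")   # stand-in for a real output path
writeLines(flat, out_file)
one_per_line <- readLines(out_file)
length(one_per_line)
# [1] 2
```

The resulting file has exactly one physical line per timestamp, so line-oriented readers such as readLines or readr's read_lines see one record per entry.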

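This example might be parsed with read.fwf as below. Because the combined strings still contain embedded newlines, those lines end up as separate rows again (16 observations rather than 4):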

out <- read.fwf(textConnection(combined), widths=c(24, 8, 7, 999), stringsAsFactors=FALSE)
str(out)
# 'data.frame': 16 obs. of  4 variables:
#  $ V1: chr  "2019-09-20 14:06:18.952 " "2019-09-20 14:06:18.953 " "  \"error\": {" "    \"code\": 404," ...
#  $ V2: chr  "[Error] " "[Error] " NA NA ...
#  $ V3: chr  "[main] " "[main] " NA NA ...
#  $ V4: chr  "> CloudStorageExtension.swift[line:38]-downloadData(node:storageObj:value:): Error occurs when download filesto"| __truncated__ "> AlertService.swift[line:310]-retrieveProfileName(): Unable to get AlertSettings Name: Error Domain=FIRStorage"| __truncated__ NA NA ...

This might benefit from removing surrounding whitespace, such as with

# out[] <- keeps out a data.frame; plain out <- lapply(...) would return a list
out[] <- lapply(out, trimws)
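The read.fwf widths (24, 8, 7) only line up while every entry carries exactly "[Error] " and "[main] "; a level tag of a different length would shift the columns. As an alternative (a swapped-in technique, not the answer's), a regular expression can pull the header fields out of each combined entry. The "date time [level] [thread] > message" layout is inferred from the sample, and the two entries below are shortened copies of it.

```r
# Two already-combined entries; the second keeps an embedded newline,
# as produced by paste(..., collapse = "\n").
entries <- c(
  "2019-09-20 14:06:18.952 [Error] [main] > CloudStorageExtension.swift[line:38]-downloadData(node:storageObj:value:): Error occurs",
  "2019-09-20 14:06:18.953 [Error] [main] > AlertService.swift[line:310]-retrieveProfileName(): Unable to get\n  \"code\": 404,"
)

# Assumed layout: "date time [level] [thread] > message".
# (?s) lets the final (.*) run across embedded newlines.
pattern <- "(?s)^(\\d{4}-\\d{2}-\\d{2}) (\\d{2}:\\d{2}:\\d{2}\\.\\d{3}) \\[([^\\]]+)\\] \\[([^\\]]+)\\] > (.*)$"
m <- regmatches(entries, regexec(pattern, entries, perl = TRUE))

# Non-matching entries come back as character(0); keep only full matches
# (whole match + 5 groups = 6 strings), then drop the whole-match column.
ok <- lengths(m) == 6
fields <- do.call(rbind, lapply(m[ok], `[`, -1))

log_df <- data.frame(
  date    = as.Date(fields[, 1]),
  time    = fields[, 2],
  level   = fields[, 3],
  thread  = fields[, 4],
  message = fields[, 5],
  stringsAsFactors = FALSE
)
str(log_df)
```

Entries whose header does not fit the assumed layout are silently dropped by the ok filter, so comparing sum(ok) with length(entries) is a cheap sanity check.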
