Indented unordered list to nested list()

Question

I've got a log file that looks as follows:

Data:
 +datadir=/data/2017-11-22
 +Nusers=5292
Parameters:
 +outdir=/data/2017-11-22/out
 +K=20
 +IC=179
 +ICgroups=3
   -group 1: 1-1
    ICeffects: 1-5
   -group 2: 2-173
    ICeffects: 6-10
   -group 3: 175-179
    ICeffects: 11-15

I would like to parse this logfile into a nested list using R so that the result will look like this:

result <- list(Data = list(datadir = '/data/2017-11-22',
                           Nusers = 5292),
               Parameters = list(outdir = '/data/2017-11-22/out',
                                 K = 20,
                                 IC = 179,
                                 ICgroups = list(list('group 1' = '1-1',
                                                      ICeffects = '1-5'),
                                                 list('group 2' = '2-173',
                                                      ICeffects = '6-10'),
                                                 list('group 3' = '175-179',
                                                      ICeffects = '11-15'))))

Is there a not-extremely-painful way of doing this?

Recommended answer

Disclaimer: This is messy. There is no guarantee that this will work for larger/different files without some tweaking. You will need to do some careful checking.

The key idea here is to reformat the raw data, to make it consistent with the YAML format, and then use yaml::yaml.load to parse the data to produce a nested list.

By the way, this is an excellent example of why one really should use a common markup language for log output and config files (like JSON, YAML, etc.)...

I assume you read in the log file using readLines to produce the vector of strings ss.

# Sample data
ss <- c(
    "Data:",
    " +datadir=/data/2017-11-22",
    " +Nusers=5292",
    "Parameters:",
    " +outdir=/data/2017-11-22/out",
    " +K=20",
    " +IC=179",
    " +ICgroups=3",
    "   -group 1: 1-1",
    "    ICeffects: 1-5",
    "   -group 2: 2-173",
    "    ICeffects: 6-10",
    "   -group 3: 175-179",
    "    ICeffects: 11-15")
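
For a real log file, the same kind of vector comes from readLines. A minimal sketch, assuming the log sits at a path of your choosing; here a temporary file stands in for it, and the names log_path and ss_from_file are made up for illustration:

```r
# Stand-in for the real log file: write a few sample lines to a temporary file.
# `log_path` is a hypothetical name; point it at your own log instead.
log_path <- tempfile(fileext = ".log")
writeLines(c(
    "Data:",
    " +datadir=/data/2017-11-22",
    " +Nusers=5292"), log_path)

# Read the file line by line into a character vector (one element per line).
ss_from_file <- readLines(log_path)
length(ss_from_file)
```

Leading whitespace is preserved by readLines, which matters here because the nesting is encoded in the indentation.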

We then reformat the data to adhere to the YAML format.

# Reformat to adhere to YAML formatting
ss <- gsub("\\+", "- ", ss)                    # Replace "+" with "- "
ss <- gsub("ICgroups=\\d+", "ICgroups:", ss)   # Replace "ICgroups=3" with "ICgroups:" (must run before the "=" step)
ss <- gsub("=", " : ", ss)                     # Replace each remaining "=" with " : "
ss <- gsub("-group", "- group", ss)            # Replace "-group" with "- group"
ss <- gsub("ICeffects", " ICeffects", ss)      # Indent "ICeffects" by one more space so it lines up under "group"

Note that – consistent with your expected output – the value 3 from ICgroups doesn't get used, and we need to replace ICgroups=3 with ICgroups: to initiate a nested sub-list. This was the part that threw me off first...
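
To see what the parser will actually receive, it can help to print the reformatted lines before loading them. The sketch below wraps the same substitutions in a function and runs it on a shortened sample; to_yaml_lines and raw are hypothetical names used only for this illustration:

```r
# Hypothetical wrapper around the substitutions above; not part of any package.
to_yaml_lines <- function(x) {
    x <- gsub("\\+", "- ", x)                    # "+key" becomes "- key"
    x <- gsub("ICgroups=\\d+", "ICgroups:", x)   # drop the count, open a sub-list
    x <- gsub("=", " : ", x)                     # remaining "=" become " : "
    x <- gsub("-group", "- group", x)            # "-group" becomes "- group"
    gsub("ICeffects", " ICeffects", x)           # shift "ICeffects" right by one space
}

# Shortened sample in the same shape as the log above.
raw <- c(
    "Parameters:",
    " +K=20",
    " +ICgroups=3",
    "   -group 1: 1-1",
    "    ICeffects: 1-5")

yaml_lines <- to_yaml_lines(raw)
cat(yaml_lines, sep = "\n")
# Parameters:
#  - K : 20
#  - ICgroups:
#    - group 1: 1-1
#      ICeffects: 1-5
```

After the last substitution the ICeffects key sits in the same column as the group key above it, which is what makes YAML treat the two as entries of one mapping.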

Loading & parsing the YAML string then produces a nested list.

library(yaml)
lst <- yaml.load(paste(ss, collapse = "\n"))
lst

#$Data
#$Data[[1]]
#$Data[[1]]$datadir
#[1] "/data/2017-11-22"
#
#
#$Data[[2]]
#$Data[[2]]$Nusers
#[1] 5292
#
#
#
#$Parameters
#$Parameters[[1]]
#$Parameters[[1]]$outdir
#[1] "/data/2017-11-22/out"
#
#
#$Parameters[[2]]
#$Parameters[[2]]$K
#[1] 20
#
#
#$Parameters[[3]]
#$Parameters[[3]]$IC
#[1] 179
#
#
#$Parameters[[4]]
#$Parameters[[4]]$ICgroups
#$Parameters[[4]]$ICgroups[[1]]
#$Parameters[[4]]$ICgroups[[1]]$`group 1`
#[1] "1-1"
#
#$Parameters[[4]]$ICgroups[[1]]$ICeffects
#[1] "1-5"
#
#
#$Parameters[[4]]$ICgroups[[2]]
#$Parameters[[4]]$ICgroups[[2]]$`group 2`
#[1] "2-173"
#
#$Parameters[[4]]$ICgroups[[2]]$ICeffects
#[1] "6-10"
#
#
#$Parameters[[4]]$ICgroups[[3]]
#$Parameters[[4]]$ICgroups[[3]]$`group 3`
#[1] "175-179"
#
#$Parameters[[4]]$ICgroups[[3]]$ICeffects
#[1] "11-15"
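
Note that the parsed list wraps every "- key : value" item in its own one-element list, so a value is reached as lst$Data[[1]]$datadir rather than lst$Data$datadir as in the question's target. One possible post-processing step is sketched below. collapse_items is a hypothetical helper, and the input list is typed by hand to mimic the printed structure above rather than produced by yaml.load:

```r
# Hypothetical helper: wherever an unnamed list consists only of single-key
# named lists, merge those into one named list; then recurse into the result.
collapse_items <- function(x) {
    if (!is.list(x)) return(x)
    is_single_key <- function(el) {
        is.list(el) && length(el) == 1 && !is.null(names(el))
    }
    if (is.null(names(x)) && length(x) > 0 &&
        all(vapply(x, is_single_key, logical(1)))) {
        x <- do.call(c, x)   # list(list(a = 1), list(b = 2)) -> list(a = 1, b = 2)
    }
    lapply(x, collapse_items)
}

# Hand-written stand-in with the same shape as the parsed output (shortened).
parsed <- list(
    Data = list(list(datadir = "/data/2017-11-22"), list(Nusers = 5292L)),
    Parameters = list(
        list(K = 20L),
        list(ICgroups = list(
            list(`group 1` = "1-1", ICeffects = "1-5"),
            list(`group 2` = "2-173", ICeffects = "6-10")))))

flat <- collapse_items(parsed)
flat$Data$datadir
flat$Parameters$ICgroups[[2]]$`group 2`
```

The ICgroups entries each hold two keys, so they fail the single-key test and stay as an unnamed list of groups, which matches the layout requested in the question. As with the substitutions, treat this as a starting point and check it against your own files.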

PS. You will need to test this on larger files, and make changes to the substitution if necessary.
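
One way to start that testing is to flag raw log lines that do not look like any of the shapes the substitutions were written for. This is only a sketch: unexpected_lines is a hypothetical helper, and the four patterns are assumptions derived from the sample log above, not a description of the full log format:

```r
# Hypothetical helper: return the raw lines that match none of the known shapes.
unexpected_lines <- function(x) {
    known <- c(
        "^\\S.*:$",                # section header, e.g. "Data:"
        "^\\s+\\+[^=]+=.*$",       # "+key=value"
        "^\\s+-group \\d+: .+$",   # "-group N: range"
        "^\\s+ICeffects: .+$")     # "ICeffects: range"
    # A line is fine if at least one pattern matches it.
    matched <- Reduce(`|`, lapply(known, grepl, x = x))
    x[!matched]
}

sample_lines <- c(
    "Data:",
    " +Nusers=5292",
    "   -group 1: 1-1",
    "    ICeffects: 1-5",
    "  *something new")

unexpected_lines(sample_lines)
```

Any line returned here is a candidate for an extra substitution rule before the text is handed to yaml.load.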
