jq streaming - filter nested list and retain global structure

Question

In a large json file, I want to remove some elements from a nested list, but keep the overall structure of the document.

My example input is this (but the real one is large enough to demand streaming).

{
  "keep_untouched": {
    "keep_this": [
      "this",
      "list"
    ]
  },
  "filter_this":
  [
    {"keep" : "true"},
    {
      "keep": "true",
      "extra": "keeper"
    } ,
    {
      "keep": "false",
      "extra": "non-keeper"
    }
  ]
}

The required output just has one element of the 'filter_this' block removed:

{
  "keep_untouched": {
    "keep_this": [
      "this",
      "list"
    ]
  },
  "filter_this":
  [
    {"keep" : "true"},
    {
      "keep": "true",
      "extra": "keeper"
    }
  ]
}

The standard way to handle such cases appears to be using 'truncate_stream' to reconstitute streamed objects, before filtering those in the usual jq way. Specifically, the command:

jq -nc --stream 'fromstream(1|truncate_stream(inputs))' 

gives access to the stream of objects:

{"keep_this":["this","list"]}
[{"keep":"true"},{"keep":"true","extra":"keeper"},{"keep":"false","extra":"non-keeper"}]

at which point it is easy to filter for the required objects. However, this strips the results from the context of their parent object, which is not what I want.
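Why the context is lost: `1|truncate_stream` strips the first path element from every event and discards events whose path is no longer than that, so the top-level keys 'keep_untouched' and 'filter_this' never reach `fromstream`. Below is a minimal Python model of that step, written against the documented semantics of jq's builtin (it is not jq itself). `truncate_stream` is a hypothetical Python function, and `EVENTS` are a few rows copied from the stream listing of this input.

```python
# A few of the events printed by `jq -c --stream input.json` for the example input.
EVENTS = [
    [["keep_untouched", "keep_this", 0], "this"],
    [["keep_untouched", "keep_this"]],
    [["filter_this", 2, "extra"], "non-keeper"],
    [["filter_this", 2]],
    [["filter_this"]],
]


def truncate_stream(depth, events):
    """Model of jq's `depth|truncate_stream(events)` (hypothetical Python helper).

    Events whose path has `depth` or fewer elements are dropped; the rest
    lose their first `depth` path elements.
    """
    for event in events:
        path = event[0]
        if len(path) > depth:
            yield [path[depth:], *event[1:]]


for event in truncate_stream(1, EVENTS):
    print(event)
```

None of the printed paths mention 'keep_untouched' or 'filter_this' any more, which is why the objects rebuilt from them cannot be put back in place.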

Looking at the stream structure:

[["keep_untouched","keep_this",0],"this"]
[["keep_untouched","keep_this",1],"list"]
[["keep_untouched","keep_this",1]]
[["keep_untouched","keep_this"]]
[["filter_this",0,"keep"],"true"]
[["filter_this",0,"keep"]]
[["filter_this",1,"keep"],"true"]
[["filter_this",1,"extra"],"keeper"]
[["filter_this",1,"extra"]]
[["filter_this",2,"keep"],"false"]
[["filter_this",2,"extra"],"non-keeper"]
[["filter_this",2,"extra"]]
[["filter_this",2]]
[["filter_this"]]
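The rows follow a simple rule: each scalar (or empty array/object) yields a `[path, value]` event, and each non-empty array or object is closed by a one-element event holding the path of its last child. Here is a minimal Python sketch of that rule, assuming it captures what `jq --stream` emits for this input. `stream_events` is a hypothetical helper, not part of jq or any library.

```python
import json


def stream_events(value, path=()):
    """Yield jq-style streaming events for `value` (hypothetical helper)."""
    if isinstance(value, dict) and value:
        children = list(value.items())
    elif isinstance(value, list) and value:
        children = list(enumerate(value))
    else:
        # Scalars and empty containers are leaves: [path, value].
        yield [list(path), value]
        return
    for key, child in children:
        yield from stream_events(child, path + (key,))
    # Closing event: the path of the last child, with no value.
    yield [list(path + (children[-1][0],))]


doc = json.loads("""
{
  "keep_untouched": {"keep_this": ["this", "list"]},
  "filter_this": [
    {"keep": "true"},
    {"keep": "true", "extra": "keeper"},
    {"keep": "false", "extra": "non-keeper"}
  ]
}
""")

for event in stream_events(doc):
    print(json.dumps(event, separators=(",", ":")))
```

Printed compactly, the fourteen events match the listing above line for line, including the closing rows such as `[["filter_this",2]]`.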

It seems I need to select all the 'filter_this' rows, truncate those rows only (using 'truncate_stream'), rebuild these rows as objects (using 'fromstream'), filter them, and turn the objects back into the stream data format (using 'tostream') to join the stream of 'keep_untouched' rows, which are still in the streaming format. At that point it would be possible to rebuild the whole JSON. If that is the right approach (which seems overly convoluted to me), how do I do that? Or is there a better way?
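The pipeline described in the question can be modelled in Python to check that it holds together. This is a sketch under stated assumptions, not jq: `setpath`, `fromstream` and `leaf_events` are hypothetical stand-ins for jq's `setpath`, `fromstream` and `tostream | select(length==2)`, `filter_nested` is an illustrative name, and `EVENTS` are the rows from the stream listing. The truncation depth here is 2, so `fromstream` yields one array element at a time. One detail the description leaves implicit: survivors must be renumbered before they are turned back into events, otherwise `setpath` leaves a null wherever an earlier element was dropped.

```python
import itertools

# All events from `jq -c --stream input.json` for the example input.
EVENTS = [
    [["keep_untouched", "keep_this", 0], "this"],
    [["keep_untouched", "keep_this", 1], "list"],
    [["keep_untouched", "keep_this", 1]],
    [["keep_untouched", "keep_this"]],
    [["filter_this", 0, "keep"], "true"],
    [["filter_this", 0, "keep"]],
    [["filter_this", 1, "keep"], "true"],
    [["filter_this", 1, "extra"], "keeper"],
    [["filter_this", 1, "extra"]],
    [["filter_this", 2, "keep"], "false"],
    [["filter_this", 2, "extra"], "non-keeper"],
    [["filter_this", 2, "extra"]],
    [["filter_this", 2]],
    [["filter_this"]],
]


def setpath(root, path, value):
    """Stand-in for jq's setpath/2 (null parents become {} or a null-padded [])."""
    if not path:
        return value
    key, rest = path[0], path[1:]
    if root is None:
        root = [] if isinstance(key, int) else {}
    if isinstance(key, int):
        root.extend([None] * (key + 1 - len(root)))
        root[key] = setpath(root[key], rest, value)
    else:
        root[key] = setpath(root.get(key), rest, value)
    return root


def fromstream(events):
    """Stand-in for jq's fromstream/1: yield each completed top-level value."""
    value, done = None, False
    for event in events:
        if done:
            value, done = None, False
        if len(event) == 2:
            done = len(event[0]) == 0  # a bare scalar is complete at once
            value = setpath(value, event[0], event[1])
        else:
            done = len(event[0]) == 1  # closing event of a top-level value
        if done:
            yield value


def leaf_events(value, path):
    """Stand-in for `tostream | select(length==2)`, with a path prefix."""
    if isinstance(value, dict) and value:
        for k, v in value.items():
            yield from leaf_events(v, path + [k])
    elif isinstance(value, list) and value:
        for i, v in enumerate(value):
            yield from leaf_events(v, path + [i])
    else:
        yield [path, value]


def filter_nested(events, key, keep):
    """Drop elements of the top-level array `key` for which keep(element) is false."""
    # Rows outside `key` stay in stream form (leaf events only).
    untouched = [e for e in events if len(e) == 2 and e[0][:1] != [key]]
    # 2|truncate_stream on the `key` rows: drop the [key, index] prefix.
    truncated = ([e[0][2:], *e[1:]] for e in events
                 if e[0][:1] == [key] and len(e[0]) > 2)
    kept = (el for el in fromstream(truncated) if keep(el))
    # Back to stream form; enumerate renumbers the survivors 0, 1, 2, ...
    restreamed = (ev for i, el in enumerate(kept) for ev in leaf_events(el, [key, i]))
    doc = None
    for path, value in itertools.chain(untouched, restreamed):
        doc = setpath(doc, path, value)
    return doc


result = filter_nested(EVENTS, "filter_this",
                       lambda el: isinstance(el, dict) and el.get("keep") == "true")
print(result)
```

As written, the filtered key is rebuilt after all untouched keys (so key order changes if it was not the last key), and if no element survives, the key disappears instead of becoming `[]`.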

Answer

If your input file consists of a single very large JSON entity that is too big for the regular jq parser to handle in your environment, then there is the distinct possibility that you won't have enough memory to reconstitute the JSON document.

With that caveat, the following may be worth a try. The key insight is that reconstruction can be accomplished using reduce.
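Below is a minimal Python illustration of that reduction. It assumes jq's `setpath` behaviour of creating an object for a string key and a null-padded array for an integer index. `setpath` and `rebuild` are hypothetical Python stand-ins, and `LEAF_EVENTS` are the two-element rows from the stream listing in the question. Only the `[path, value]` events are folded in: a one-element closing event destructured as `[$p,$x]` would give `$x = null` and overwrite the last leaf.

```python
import json

# Leaf events ([path, value]) from `jq -c --stream 'select(length==2)' input.json`.
LEAF_EVENTS = [
    [["keep_untouched", "keep_this", 0], "this"],
    [["keep_untouched", "keep_this", 1], "list"],
    [["filter_this", 0, "keep"], "true"],
    [["filter_this", 1, "keep"], "true"],
    [["filter_this", 1, "extra"], "keeper"],
    [["filter_this", 2, "keep"], "false"],
    [["filter_this", 2, "extra"], "non-keeper"],
]


def setpath(root, path, value):
    """Python stand-in for jq's setpath/2: a null parent becomes an object for a
    string key, or an array padded with nulls for an integer index."""
    if not path:
        return value
    key, rest = path[0], path[1:]
    if root is None:
        root = [] if isinstance(key, int) else {}
    if isinstance(key, int):
        root.extend([None] * (key + 1 - len(root)))
        root[key] = setpath(root[key], rest, value)
    else:
        root[key] = setpath(root.get(key), rest, value)
    return root


def rebuild(leaf_events):
    """Equivalent of `reduce inputs as [$p,$x] (null; setpath($p;$x))`."""
    doc = None
    for path, value in leaf_events:
        doc = setpath(doc, path, value)
    return doc


print(json.dumps(rebuild(LEAF_EVENTS)))
```

Folding the events in order restores the original input. Note that the whole document ends up in memory in `doc`, which is exactly the caveat raised above.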

The following uses a bunch of temporary files for the sake of clarity:

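TMP=/tmp/$$

# Keep only the [path, value] events: a one-element closing event, destructured
# as [$p,$x] below, would make setpath overwrite the last leaf with null.
jq -c --stream 'select(length==2)' input.json > $TMP.streamed

# Events outside "filter_this" are passed through untouched.
jq -c 'select(.[0][0] != "filter_this")' $TMP.streamed > $TMP.1

# Rebuild only "filter_this", filter its elements, and turn the result back
# into [path, value] events (the filtered array is re-indexed from 0).
jq -c 'select(.[0][0] == "filter_this")' $TMP.streamed |
  jq -nc 'reduce inputs as [$p,$x] (null; setpath($p;$x))
          | .filter_this |= map(select(.keep=="true"))
          | tostream
          | select(length==2)' > $TMP.2

# Reconstruction
jq -n 'reduce inputs as [$p,$x] (null; setpath($p;$x))' $TMP.1 $TMP.2

# Clean up the temporary files
rm -f $TMP.streamed $TMP.1 $TMP.2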

Output

{
  "keep_untouched": {
    "keep_this": [
      "this",
      "list"
    ]
  },
  "filter_this": [
    {
      "keep": "true"
    },
    {
      "keep": "true",
      "extra": "keeper"
    }
  ]
}
