Using jq, how can I split a very large JSON file into multiple files, each with a specific number of objects?

Question

I have a large JSON file containing, I'm guessing, 4 million objects. Each top-level object has a few levels nested inside. I want to split it into multiple files of 10,000 top-level objects each, retaining the structure inside each object. jq should be able to do that, right? I'm not sure how.

The data looks like this:

[{
  "id": 1,
  "user": {
    "name": "Nichols Cockle",
    "email": "ncockle0@tmall.com",
    "address": {
      "city": "Turt",
      "state": "Thị Trấn Yên Phú"
    }
  },
  "product": {
    "name": "Lychee - Canned",
    "code": "36987-1526"
  }
}, {
  "id": 2,
  "user": {
    "name": "Isacco Scrancher",
    "email": "iscrancher1@aol.com",
    "address": {
      "city": "Likwatang Timur",
      "state": "Biharamulo"
    }
  },
  "product": {
    "name": "Beer - Original Organic Lager",
    "code": "47993-200"
  }
}, {
  "id": 3,
  "user": {
    "name": "Elga Sikora",
    "email": "esikora2@statcounter.com",
    "address": {
      "city": "Wenheng",
      "state": "Piedra del Águila"
    }
  },
  "product": {
    "name": "Parsley - Dried",
    "code": "36987-1632"
  }
}, {
  "id": 4,
  "user": {
    "name": "Andria Keatch",
    "email": "akeatch3@salon.com",
    "address": {
      "city": "Arras",
      "state": "Iracemápolis"
    }
  },
  "product": {
    "name": "Wine - Segura Viudas Aria Brut",
    "code": "51079-385"
  }
}, {
  "id": 5,
  "user": {
    "name": "Dara Sprowle",
    "email": "dsprowle4@slate.com",
    "address": {
      "city": "Huatai",
      "state": "Kaduna"
    }
  },
  "product": {
    "name": "Pork - Hock And Feet Attached",
    "code": "0054-8648"
  }
}]

This is one complete object:

{
  "id": 1,
  "user": {
    "name": "Nichols Cockle",
    "email": "ncockle0@tmall.com",
    "address": {
      "city": "Turt",
      "state": "Thị Trấn Yên Phú"
    }
  },
  "product": {
    "name": "Lychee - Canned",
    "code": "36987-1526"
  }
}

Each output file would contain a specified number of objects like that.

Recommended answer

The key to solving this with jq is the -c command-line option, which produces compact output with one JSON value per line (the JSON Lines format), so in the present case one object per line. You can then use a tool such as awk or split to distribute those lines among several files.

If the file is not too big, the simplest approach is to start the pipeline with:

jq -c '.[]' INPUTFILE
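
As a rough sketch of the second half of that pipeline, the JSON Lines output can be fed to split. The function name split_jsonl, the chunk_ prefix, and the three-letter suffix length are illustrative assumptions, not part of the original answer.

```shell
#!/bin/sh
# split_jsonl SIZE PREFIX  (hypothetical helper name)
# Reads JSON Lines on stdin and writes files of at most SIZE lines each,
# named PREFIXaaa, PREFIXaab, and so on.
split_jsonl() {
  # -l sets the number of lines per file; -a 3 asks for three-letter
  # suffixes; "-" tells split to read from standard input.
  split -l "$1" -a 3 - "$2"
}

# Usage (assumes jq is installed and INPUTFILE holds one top-level array):
#   jq -c '.[]' INPUTFILE | split_jsonl 10000 chunk_
```

Each resulting file holds one object per line rather than a JSON array, which is usually fine if the files will be read back with jq.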

If the file is too big to fit comfortably in memory, you could use jq's streaming parser, like so:

jq -cn --stream 'fromstream(1|truncate_stream(inputs))'
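
The streaming invocation can be combined with split in the same way. This is only a sketch: the function name stream_split, its argument order, and the suffix length are assumptions made for illustration, and it expects the input file to contain a single top-level array.

```shell
#!/bin/sh
# stream_split INPUTFILE SIZE PREFIX  (hypothetical helper name)
# Streams the top-level array in INPUTFILE without loading it whole,
# emits one compact object per line, and writes SIZE lines per output file.
stream_split() {
  # 1|truncate_stream(...) drops the leading array index from each streamed
  # path, so fromstream rebuilds each top-level element on its own.
  jq -cn --stream 'fromstream(1|truncate_stream(inputs))' "$1" |
    split -l "$2" -a 3 - "$3"
}

# Usage (assumes jq is installed):
#   stream_split INPUTFILE 10000 chunk_
```

The streaming parser is typically much slower than the plain '.[]' filter, so it is worth using only when memory is the limiting factor.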

For further discussion of the streaming parser, see e.g. the relevant section in the jq FAQ: https://github.com/stedolan/jq/wiki/FAQ#streaming-json-parser

For different approaches to partitioning the output produced in the first step, see for example "How to split a large text file into smaller files with equal number of lines?"

If each output file is required to be an array of objects, I'd probably use awk to perform both the partitioning and the reconstitution in one step, but there are many other reasonable approaches.
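
One way the awk step might look is sketched below. The function name jsonl_to_arrays and the PREFIX0001.json naming scheme are assumptions for illustration; the sketch relies on every input line being one complete JSON object, as produced by jq -c.

```shell
#!/bin/sh
# jsonl_to_arrays SIZE PREFIX  (hypothetical helper name)
# Reads JSON Lines on stdin and writes PREFIX0001.json, PREFIX0002.json, ...
# Each output file is a JSON array of at most SIZE objects.
jsonl_to_arrays() {
  awk -v size="$1" -v prefix="$2" '
    # Close the array in the current output file, if one is open.
    function finish() { if (out != "") { printf "\n]\n" > out; close(out) } }
    # Every SIZE lines, finish the previous file and open the next one.
    (NR - 1) % size == 0 {
      finish()
      out = sprintf("%s%04d.json", prefix, ++n)
      printf "[\n" > out
      sep = ""
    }
    # Write the object, preceded by a comma for all but the first in a file.
    { printf "%s%s", sep, $0 > out; sep = ",\n" }
    END { finish() }
  '
}

# Usage (assumes jq is installed and INPUTFILE holds one top-level array):
#   jq -c '.[]' INPUTFILE | jsonl_to_arrays 10000 part_
```

Closing each file as soon as it is complete keeps only one output file open at a time, which matters when the input produces hundreds of chunks.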

For reference, if the original file consists of a stream or sequence of JSON objects rather than a single array, the appropriate invocation would be:

jq -n -c inputs INPUTFILE

Using inputs in this manner allows arbitrarily many objects to be processed efficiently.
