Splitting large JSON data using Unix command Split
Question
I have an issue with the Unix split command when splitting large data:

split -l 1000 file.json myfile

I want to split this file into multiple files of 1000 records each, but I'm getting the output as a single file, unchanged.
P.S. The file was created by converting a Pandas DataFrame to JSON.
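Since the file came from Pandas, one possible fix (a sketch, assuming the export step can be changed) is to write newline-delimited JSON so that each record lands on its own line, which is exactly what `split -l` counts:

```python
import pandas as pd

# Illustrative DataFrame; the real data are the hotel reviews shown below.
df = pd.DataFrame([
    {"id": 683156, "overall_rating": 5.0, "hotel_id": 220216},
    {"id": 692745, "overall_rating": 5.0, "hotel_id": 113317},
])

# lines=True writes one JSON object per line (newline-delimited JSON),
# so `split -l 1000 file.json myfile` then yields files of 1000 records each.
df.to_json("file.json", orient="records", lines=True)
```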
Edit: It turns out that my JSON is formatted in a way that it contains only one row; wc -l file.json returns 0.
Here is a sample file.json:
[
{"id":683156,"overall_rating":5.0,"hotel_id":220216,"hotel_name":"Beacon Hill Hotel","title":"\u201cgreat hotel, great location\u201d","text":"The rooms here are not palatial","author_id":"C0F"},
{"id":692745,"overall_rating":5.0,"hotel_id":113317,"hotel_name":"Casablanca Hotel Times Square","title":"\u201cabsolutely delightful\u201d","text":"I travelled from Spain...","author_id":"8C1"}
]
Answer
I'd recommend splitting the JSON array with jq (see the manual).
cat file.json | jq length             # get the length of the array
cat file.json | jq -c '.[0:1000]'     # first 1000 items (end index is exclusive)
cat file.json | jq -c '.[1000:2000]'  # second 1000 items
...
Note the -c flag, which produces compact (not pretty-printed) output.
For automation, you can write a simple bash script that splits your file into chunks given the array length (jq length).