(Closed) JSON to CSV conversion for LARGE DATASET
Problem description
I have a .txt file containing over a million JSON entities with varying keys, generated by a Python program. Here is an example:
{
"category": "Athlete",
"website": "example.com",
"talking_about_count": 560,
"description": "xxx",
"id": "123"
}
{
"category": "Community",
"talking_about_count": 0,
"name": "The Second Civil War",
"likes": 26,
"id": "234",
"is_published": true
}
Although each JSON entity has a different set of attributes, some attributes are shared between them. The resulting .csv file should have the columns category, website, talking_about_count, description, id, name, likes, is_published, like this:
"category","website","talking_about_count","name","likes","description","id","is_published"
"Athlete","example.com","560","","","xxx","123",""
"Community","","0","The Second Civil War","26","","234","True"
https://json-csv.com/ does this beautifully, but it cannot handle datasets with more than 1000 entities.
I would like to create a CSV from this .txt file containing a million JSON entities, and I was wondering whether there is a better way to go about this.
Recommended answer
Here is a solution using jq.
If the file filter.jq contains
(reduce (.[]|keys_unsorted[]) as $k ({};.[$k]="")) as $o # object with all keys
| ($o | keys_unsorted), (.[] | $o * . | [.[]]) # generate header and data
| @csv # convert to csv
and data.json contains the sample data, then the command
jq -M -s -r -f filter.jq data.json
produces the output
"category","website","talking_about_count","description","id","name","likes","is_published"
"Athlete","example.com",560,"xxx","123","","",""
"Community","",0,"","234","The Second Civil War",26,true
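Because the -s (slurp) option makes jq read the whole input into a single array, memory use grows with the size of the file. As a possible alternative for very large inputs, the same two-step idea (first collect every key, then emit a header and one row per entity) can be sketched in Python with only the standard library. This is a minimal sketch, not part of the original answer: the function names iter_json_objects and json_stream_to_csv and the chunk_size parameter are hypothetical, and it assumes the file holds concatenated top-level JSON objects whose values are flat (nested lists or objects would be written using their Python string form).

```python
import csv
import json


def iter_json_objects(stream, chunk_size=1 << 16):
    """Yield each top-level JSON object from a text stream of concatenated objects.

    Assumption: every top-level value is an object ({...}), as in the sample data.
    """
    decoder = json.JSONDecoder()
    buffer = ""
    pos = 0
    while True:
        chunk = stream.read(chunk_size)
        # Keep only the unparsed tail of the old buffer, then append new data.
        buffer = buffer[pos:] + chunk
        pos = 0
        while True:
            # raw_decode does not accept leading whitespace, so skip it manually.
            while pos < len(buffer) and buffer[pos].isspace():
                pos += 1
            if pos == len(buffer):
                break
            try:
                obj, pos = decoder.raw_decode(buffer, pos)
            except json.JSONDecodeError:
                if not chunk:
                    raise  # end of input reached: the data really is malformed
                break  # the object is incomplete; read another chunk
            yield obj
        if not chunk:
            break


def json_stream_to_csv(src_path, dst_path):
    """Convert a file of concatenated JSON objects to CSV in two passes."""
    # Pass 1: collect all keys in first-seen order (dicts keep insertion order).
    columns = {}
    with open(src_path, encoding="utf-8") as src:
        for obj in iter_json_objects(src):
            for key in obj:
                columns.setdefault(key, None)

    # Pass 2: write the header, then one row per object; missing keys become "".
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.DictWriter(dst, fieldnames=list(columns),
                                restval="", quoting=csv.QUOTE_ALL)
        writer.writeheader()
        for obj in iter_json_objects(src):
            writer.writerow(obj)
    return list(columns)
```

Calling json_stream_to_csv("data.txt", "data.csv") (both file names are placeholders) reads the input twice but keeps only one chunk and the set of column names in memory. With csv.QUOTE_ALL every field is quoted, so numbers and booleans appear as "560" and "True", as in the output requested in the question, rather than unquoted as in the jq output.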