Converting a very large JSON file to CSV


Problem description

I have a JSON file that is about 8GB in size. When I try to convert the file using this script:

import csv
import json

infile = open("filename.json", "r")
outfile = open("data.csv", "w")

writer = csv.writer(outfile)

for row in json.loads(infile.read()):  # .read() loads the whole ~8GB file into memory at once
    writer.writerow(row)

I get this error:

Traceback (most recent call last):
  File "E:/Thesis/DataDownload/PTDataDownload/demo.py", line 9, in <module>
    for row in json.loads(infile.read()):
MemoryError

I'm sure this has to do with the size of the file. Is there a way to ensure the file will convert to a CSV without the error?

This is a sample of my JSON code:

     {"id":"tag:search.twitter.com,2005:905943958144118786","objectType":"activity","actor":{"objectType":"person","id":"id:twitter.com:899030045234167808","link":"http://www.twitter.com/NAJajsjs3","displayName":"NAJajsjs","postedTime":"2017-08-19T22:07:20.000Z","image":"https://pbs.twimg.com/profile_images/905943685493391360/2ZavxLrD_normal.jpg","summary":null,"links":[{"href":null,"rel":"me"}],"friendsCount":23,"followersCount":1,"listedCount":0,"statusesCount":283,"twitterTimeZone":null,"verified":false,"utcOffset":null,"preferredUsername":"NAJajsjs3","languages":["tr"],"favoritesCount":106},"verb":"post","postedTime":"2017-09-08T00:00:45.000Z","generator":{"displayName":"Twitter for iPhone","link":"http://twitter.com/download/iphone"},"provider":{"objectType":"service","displayName":"Twitter","link":"http://www.twitter.com"},"link":"http://twitter.com/NAJajsjs3/statuses/905943958144118786","body":"@thugIyfe Beyonce do better","object":{"objectType":"note","id":"object:search.twitter.com,2005:905943958144118786","summary":"@thugIyfe Beyonce do better","link":"http://twitter.com/NAJajsjs3/statuses/905943958144118786","postedTime":"2017-09-08T00:00:45.000Z"},"inReplyTo":{"link":"http://twitter.com/thugIyfe/statuses/905942854710775808"},"favoritesCount":0,"twitter_entities":{"hashtags":[],"user_mentions":[{"screen_name":"thugIyfe","name":"dari.","id":40542633,"id_str":"40542633","indices":[0,9]}],"symbols":[],"urls":[]},"twitter_filter_level":"low","twitter_lang":"en","display_text_range":[10,27],"retweetCount":0,"gnip":{"matching_rules":[{"tag":null,"id":6134817834619900217,"id_str":"6134817834619900217"}]}}

(sorry for the ugly formatting)

An alternative may be that I have about 8000 smaller json files that I combined to make this file. They are each within their own folder with just the single json in the folder. Would it be easier to convert each of these individually and then combine them into one csv?

The reason I am asking this is because I have very basic python knowledge and all the answers to similar questions that I have found are way more complicated than I can understand. Please help this new python user to read this json as a csv!

Recommended answer

Would it be easier to convert each of these individually and then combine them into one csv?

Yes, it definitely would.

For example, this will put each JSON object/array (whatever is loaded from the file) onto its own line of a single CSV.

import json, csv
from glob import glob

with open('out.csv', 'w') as f:
    for fname in glob("*.json"):  # reads every .json file in the current directory
        with open(fname) as j:
            f.write(str(json.load(j)))  # write each parsed document as one line
            f.write('\n')
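Note that str(json.load(j)) writes the Python repr of each parsed document rather than properly quoted CSV fields; that is fine if you want one opaque column per line, but commas inside the text are not escaped.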

Use glob pattern **/*.json to find all json files in nested folders
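
A minimal sketch of that variant, assuming the 8000 files sit in nested folders beneath the current directory (recursive=True is needed for ** to descend into subdirectories):

import json
from glob import glob

with open('out.csv', 'w') as f:
    for fname in glob("**/*.json", recursive=True):  # walks every subfolder
        with open(fname) as j:
            f.write(str(json.load(j)))
            f.write('\n')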

Not really clear what for row in ... was doing for your data since you don't have an array. Unless you wanted each JSON key to be a CSV column?
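
If one column per key is what you want, here is a hedged sketch using csv.DictWriter; the field list is an assumption picked from the sample record above, and nested objects such as actor would need flattening first:

import json, csv
from glob import glob

fields = ["id", "verb", "postedTime", "body"]  # assumed top-level keys from the sample

with open('out.csv', 'w', newline='') as f:
    writer = csv.DictWriter(f, fieldnames=fields, extrasaction='ignore')
    writer.writeheader()
    for fname in glob("**/*.json", recursive=True):
        with open(fname) as j:
            writer.writerow(json.load(j))  # listed keys become columns, extra keys are dropped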
