Python conversion from JSON to JSONL

Question

I want to convert a standard JSON document into a file in which each line contains a separate, self-contained, valid JSON object. See JSON Lines.

JSON_file =

[{u'index': 1,
  u'no': 'A',
  u'met': u'1043205'},
 {u'index': 2,
  u'no': 'B',
  u'met': u'000031043206'},
 {u'index': 3,
  u'no': 'C',
  u'met': u'0031043207'}]

To JSONL:

{u'index': 1, u'no': 'A', u'met': u'1043205'}
{u'index': 2, u'no': 'B', u'met': u'000031043206'}
{u'index': 3, u'no': 'C', u'met': u'0031043207'}

My current solution is to read the JSON file as a text file and remove the [ from the beginning and the ] from the end. This leaves a valid JSON object on each line, rather than a nested object containing lines.

Is there a more elegant solution? I suspect that string manipulation on the file could easily go wrong.

The motivation is to read JSON files into an RDD on Spark. See the related question: Reading JSON with Apache Spark - `corrupt_record`

Recommended answer

Your input appears to be a sequence of Python objects; it is certainly not a valid JSON document (JSON requires double-quoted strings and has no u'' prefix).

If you have a list of Python dictionaries, then all you have to do is dump each entry into a file separately, followed by a newline:

import json

with open('output.jsonl', 'w') as outfile:
    for entry in JSON_file:
        json.dump(entry, outfile)
        outfile.write('\n')

By default, the json module produces JSON with no embedded newlines, so each entry occupies exactly one line.
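As a quick check of that default, the sketch below serializes a made-up entry (the `note` field and its value are assumptions for illustration, not part of the question's data) whose string value contains a newline character. The newline inside the string is escaped rather than emitted literally, while passing `indent` switches to pretty-printing and does introduce line breaks:

```python
import json

# Hypothetical entry; the "note" field is invented for this illustration.
entry = {"index": 1, "note": "first line\nsecond line"}

# Default settings: the output is a single line. The newline inside the
# string value is escaped as the two characters backslash and n.
line = json.dumps(entry)
assert "\n" not in line

# With indent set, the encoder pretty-prints across several lines,
# which would break the one-object-per-line rule of JSON Lines.
pretty = json.dumps(entry, indent=2)
assert "\n" in pretty
```

So for JSONL output it is safest to leave `indent` at its default.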

Assuming your A, B and C names are really strings, that would produce output like the following (key order can vary with the Python version):

{"index": 1, "met": "1043205", "no": "A"}
{"index": 2, "met": "000031043206", "no": "B"}
{"index": 3, "met": "0031043207", "no": "C"}

If you started with a JSON document containing a list of entries, just parse that document first with json.load() (for a file object) or json.loads() (for a string).
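Putting the two steps together might look like the sketch below. The function name `json_to_jsonl` and the file paths are hypothetical, and it assumes the top-level value of the input document is a JSON array:

```python
import json

def json_to_jsonl(src_path, dst_path):
    """Convert a JSON file holding a top-level array into a JSON Lines file.

    Hypothetical helper for illustration; returns the number of entries written.
    """
    # Parse the whole document first, so no string manipulation is needed.
    with open(src_path) as infile:
        entries = json.load(infile)

    # Write each entry as compact JSON on its own line.
    with open(dst_path, "w") as outfile:
        for entry in entries:
            json.dump(entry, outfile)
            outfile.write("\n")
    return len(entries)
```

A call such as `json_to_jsonl('input.json', 'output.jsonl')` would then produce a file with one object per line. Note that this loads the entire input into memory, which is fine for moderately sized files but may not suit very large ones.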
