Split a large JSON file into smaller JSON files using Java

Question

I have a large dataset in JSON format. For ease of use, I want to split it into multiple JSON files while still maintaining the structure. For example:

{"users": [
  {"userId": 1, "firstName": "Krish", "lastName": "Lee", "phoneNumber": "123456", "emailAddress": "krish.lee@learningcontainer.com"},
  {"userId": 2, "firstName": "racks", "lastName": "jacson", "phoneNumber": "123456", "emailAddress": "racks.jacson@learningcontainer.com"},
  {"userId": 3, "firstName": "denial", "lastName": "roast", "phoneNumber": "33333333", "emailAddress": "denial.roast@learningcontainer.com"},
  {"userId": 4, "firstName": "devid", "lastName": "neo", "phoneNumber": "222222222", "emailAddress": "devid.neo@learningcontainer.com"},
  {"userId": 5, "firstName": "jone", "lastName": "mac", "phoneNumber": "111111111", "emailAddress": "jone.mac@learningcontainer.com"}
]}

I should be able to split it in such a way that each userId goes to a different file. So far, I have tried putting the users into a map and splitting the map, and converting them into an array and splitting the array, without much luck: the resulting files contain the userIds, but they are no longer in JSON format.

Any suggestions on how this can be achieved in Java?

Expected result (one file per user):

{"users": [{"userId": 1, "firstName": "Krish", "lastName": "Lee", "phoneNumber": "123456", "emailAddress": "krish.lee@learningcontainer.com"}]}

Answer

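To process large files, prefer stream/event-oriented parsing; both Gson and Jackson support it. Here is just an illustration using a tiny JSON parser, green-jelly (https://github.com/anatolygudkov/green-jelly). Each object directly inside the "users" array is written to its own file (user-0.json, user-1.json, ...) in the output folder: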

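import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;

import org.green.jelly.AppendableWriter;
import org.green.jelly.JsonGenerator;
import org.green.jelly.JsonNumber;
import org.green.jelly.JsonParser;
import org.green.jelly.JsonParserListenerAdaptor;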

public class SplitMyJson {
    private static final String jsonToSplit = "{\"users\": [\n" +
            "    {\n" +
            "      \"userId\": 1,\n" +
            "      \"firstName\": \"Krish\",\n" +
            "      \"lastName\": \"Lee\",\n" +
            "      \"phoneNumber\": \"123456\",\n" +
            "      \"emailAddress\": \"krish.lee@learningcontainer.com\"\n" +
            "    },\n" +
            "    {\n" +
            "      \"userId\": 2,\n" +
            "      \"firstName\": \"racks\",\n" +
            "      \"lastName\": \"jacson\",\n" +
            "      \"phoneNumber\": \"123456\",\n" +
            "      \"emailAddress\": \"racks.jacson@learningcontainer.com\"\n" +
            "    },\n" +
            "    {\n" +
            "      \"userId\": 3,\n" +
            "      \"firstName\": \"denial\",\n" +
            "      \"lastName\": \"roast\",\n" +
            "      \"phoneNumber\": \"33333333\",\n" +
            "      \"emailAddress\": \"denial.roast@learningcontainer.com\"\n" +
            "    },\n" +
            "    {\n" +
            "      \"userId\": 4,\n" +
            "      \"firstName\": \"devid\",\n" +
            "      \"lastName\": \"neo\",\n" +
            "      \"phoneNumber\": \"222222222\",\n" +
            "      \"emailAddress\": \"devid.neo@learningcontainer.com\"\n" +
            "    },\n" +
            "    {\n" +
            "      \"userId\": 5,\n" +
            "      \"firstName\": \"jone\",\n" +
            "      \"lastName\": \"mac\",\n" +
            "      \"phoneNumber\": \"111111111\",\n" +
            "      \"emailAddress\": \"jone.mac@learningcontainer.com\"\n" +
            "    }\n" +
            "  ]\n" +
            "}";

    public static void main(String[] args) {
        final JsonParser parser = new JsonParser();
        parser.setListener(new Splitter(new File("/home/gudkov/mytest"))); // example output folder; change it for your machine
        parser.parse(jsonToSplit); // if you read a file, call parse() several times part by part in a loop until EOF
        parser.eoj(); // and then call .eoj()
    }

    static class Splitter extends JsonParserListenerAdaptor {
        private final JsonGenerator jsonGenerator = new JsonGenerator();
        private final AppendableWriter<Writer> appendableWriter = new AppendableWriter<>();

        private final File outputFolder;
        private int objectDepth;
        private int userIndex;

        Splitter(final File outputFolder) {
            this.outputFolder = outputFolder;
            if (!outputFolder.exists()) {
                outputFolder.mkdirs();
            }

            jsonGenerator.setOutput(appendableWriter);
        }

        private boolean userJustStarted() {
            return objectDepth == 2;
        }

        private boolean userJustEnded() {
            return objectDepth == 1;
        }

        private boolean notInUser() {
            return objectDepth < 2;
        }

        @Override
        public boolean onObjectStarted() {
            objectDepth++;

            if (notInUser()) return true;

            if (userJustStarted()) {
                try {
                    appendableWriter.set(new FileWriter(new File(outputFolder, "user-" + userIndex + ".json")));
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
                userIndex++;
            }
            jsonGenerator.startObject();
            return true;
        }

        @Override
        public boolean onObjectEnded() {
            if (notInUser()) {
                objectDepth--;
                return true;
            }

            objectDepth--;

            jsonGenerator.endObject();

            if (userJustEnded()) { // user object ended
                try {
                    jsonGenerator.eoj();
                    appendableWriter.output().close();
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            }
            return true;
        }

        @Override
        public boolean onArrayStarted() {
            if (notInUser()) return true;
            jsonGenerator.startArray();
            return true;
        }

        @Override
        public boolean onArrayEnded() {
            if (notInUser()) return true;
            jsonGenerator.endArray();
            return true;
        }

        @Override
        public boolean onObjectMember(final CharSequence name) {
            if (notInUser()) return true;
            jsonGenerator.objectMember(name);
            return true;
        }

        @Override
        public boolean onStringValue(final CharSequence data) {
            if (notInUser()) return true;
            jsonGenerator.stringValue(data, true);
            return true;
        }

        @Override
        public boolean onNumberValue(final JsonNumber number) {
            if (notInUser()) return true;
            jsonGenerator.numberValue(number);
            return true;
        }

        @Override
        public boolean onTrueValue() {
            if (notInUser()) return true;
            jsonGenerator.trueValue();
            return true;
        }

        @Override
        public boolean onFalseValue() {
            if (notInUser()) return true;
            jsonGenerator.falseValue();
            return true;
        }

        @Override
        public boolean onNullValue() {
            if (notInUser()) return true;
            jsonGenerator.nullValue();
            return true;
        }
    }
}
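If adding a dependency is not an option, the same streaming idea can be sketched with the JDK alone. The class below (hypothetical name `SplitUsersJdkOnly`) is neither green-jelly nor a full JSON parser: it only tracks `{`/`[` nesting and string literals, assumes (as in the question) that the root object holds nothing but the "users" array, and names each file after the first `"userId"` found in the element. Unlike the answer's code, it wraps every user as `{"users": [ ... ]}`, which is the expected result shown in the question.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical JDK-only sketch; not green-jelly and not a full JSON parser.
final class SplitUsersJdkOnly {
    // Simplification: the file name comes from the first "userId": <digits> in each element.
    private static final Pattern USER_ID = Pattern.compile("\"userId\"\\s*:\\s*(\\d+)");

    // Reads one char at a time and writes each object/array element found at
    // nesting depth 2 (assumed: the "users" array under the root object) to its own file.
    static List<Path> split(Reader in, Path outDir) throws IOException {
        Files.createDirectories(outDir);
        List<Path> written = new ArrayList<>();
        StringBuilder element = new StringBuilder(); // holds only the current user
        int depth = 0;                               // current nesting of { and [
        boolean inString = false, escaped = false;   // braces inside strings are ignored
        int c;
        while ((c = in.read()) != -1) {
            char ch = (char) c;
            boolean capturing = depth >= 3;          // already inside a user element
            if (inString) {
                if (escaped) escaped = false;
                else if (ch == '\\') escaped = true;
                else if (ch == '"') inString = false;
            } else if (ch == '"') {
                inString = true;
            } else if (ch == '{' || ch == '[') {
                depth++;
            } else if (ch == '}' || ch == ']') {
                depth--;
            }
            if (capturing || depth >= 3) element.append(ch);
            if (capturing && depth == 2) {           // the user element just closed
                written.add(writeUser(element.toString(), outDir, written.size()));
                element.setLength(0);
            }
        }
        return written;
    }

    // Wraps one user as {"users": [ ... ]}, the shape asked for in the question.
    private static Path writeUser(String userJson, Path outDir, int index) throws IOException {
        Matcher m = USER_ID.matcher(userJson);
        String name = m.find() ? "user-" + m.group(1) + ".json" : "user-at-" + index + ".json";
        Path file = outDir.resolve(name);
        try (Writer w = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            w.write("{\"users\": [");
            w.write(userJson);
            w.write("]}");
        }
        return file;
    }

    public static void main(String[] args) throws IOException {
        String json = "{\"users\": [{\"userId\": 1, \"firstName\": \"Krish\"},"
                + " {\"userId\": 2, \"firstName\": \"racks\"}]}";
        Path outDir = Files.createTempDirectory("users");
        for (Path p : split(new StringReader(json), outDir)) {
            System.out.println(p.getFileName() + ": " + Files.readString(p));
        }
    }
}
```

For a real file, pass `Files.newBufferedReader(path, StandardCharsets.UTF_8)` instead of the `StringReader`; memory use is then bounded by the size of one user object rather than the whole file.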

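In the same way you can implement filtering, aggregation and similar processing for really large files: the parser reports events as it reads, so the whole document never has to be loaded into memory, which keeps memory use low and performance high in regular Java.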
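As a JDK-only sketch of aggregation in this streaming style (not green-jelly's API; `UserCounter` is a hypothetical name), the class below counts the user objects while the input is fed in fixed-size chunks, mirroring the "call parse() several times part by part in a loop until EOF" comment in the answer's `main()`. It assumes the same shape as the question: a root object holding only the "users" array.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical JDK-only sketch of streaming aggregation; not green-jelly's API.
final class UserCounter {
    private int depth;          // current nesting of { and [
    private boolean inString;   // inside a JSON string literal
    private boolean escaped;    // previous char was a backslash inside a string
    private long users;         // objects closed directly inside the users array

    // Consumes the next chunk; all state survives chunk boundaries, so a string
    // or escape sequence split across two reads is still handled correctly.
    void feed(char[] buf, int len) {
        for (int i = 0; i < len; i++) {
            char ch = buf[i];
            if (inString) {
                if (escaped) escaped = false;
                else if (ch == '\\') escaped = true;
                else if (ch == '"') inString = false;
            } else if (ch == '"') {
                inString = true;
            } else if (ch == '{' || ch == '[') {
                depth++;
            } else if (ch == '}' || ch == ']') {
                depth--;
                if (ch == '}' && depth == 2) users++; // a user object just closed
            }
        }
    }

    long users() {
        return users;
    }

    // Reads the input in chunks of chunkSize (>= 1) chars until EOF.
    static long count(Reader in, int chunkSize) throws IOException {
        UserCounter counter = new UserCounter();
        char[] buf = new char[chunkSize];
        int n;
        while ((n = in.read(buf)) != -1) {
            counter.feed(buf, n);
        }
        return counter.users();
    }

    public static void main(String[] args) throws IOException {
        String json = "{\"users\": [{\"userId\": 1, \"bio\": \"likes {braces}\"},"
                + " {\"userId\": 2}, {\"userId\": 3}]}";
        for (int size : new int[] {1, 7, 4096}) {
            System.out.println("chunk " + size + " -> " + count(new StringReader(json), size) + " users");
        }
    }
}
```

The result does not depend on the chunk size, which is the property that lets a large file be processed piece by piece with a small, fixed buffer.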
