How to parse a huge JSON file without loading it in memory

Question

I have a large JSON file (2.5MB) containing about 80000 lines.

It looks like this:

{
  "a": 123,
  "b": 0.26,
  "c": [HUGE irrelevant object],
  "d": 32
}

I only want the numeric values stored under the keys a, b and d, and I want to ignore the rest of the JSON (i.e. whatever is in the c value).

I cannot modify the original JSON because it is created by a third-party service; I download it from that service's server.

How do I do this without loading the entire file in memory?

I tried using the Gson library and created a bean like this:

public class MyJsonBean {
  @SerializedName("a")
  @Expose
  public Integer a;

  @SerializedName("b")
  @Expose
  public Double b;

  @SerializedName("d")
  @Expose
  public Integer d;
}

But even then, in order to deserialize it using Gson, do I need to download the whole file, read it into memory first, and then pass it as a string to Gson?

File myFile = new File(<FILENAME>);

URL url = new URL(<URL>);
URLConnection conn = url.openConnection();

// Download the response to disk; try-with-resources closes (and flushes) both streams.
try (InputStream in = conn.getInputStream();
     OutputStream out = new BufferedOutputStream(new FileOutputStream(myFile))) {
  byte[] buffer = new byte[1024];
  int numRead;
  while ((numRead = in.read(buffer)) != -1) {
    out.write(buffer, 0, numRead);
  }
}

// Read the whole file back into memory as a single String (the step I want to avoid).
byte[] data = Files.readAllBytes(myFile.toPath());
String str = new String(data, "UTF-8");

Gson gson = new Gson();
MyJsonBean response = gson.fromJson(str, MyJsonBean.class);

System.out.println("a: " + response.a + ", b: " + response.b + ", d: " + response.d);

Is there any way to avoid loading the whole file and just get the relevant values that I need?

Solution

You should definitely check different approaches and libraries. If you really care about performance, try the Gson, Jackson and JsonPath libraries, measure each of them, and choose the fastest one. You will have to download the whole JSON file to local disk first, probably into a TMP folder, and parse it after that.
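To compare the libraries, a rough timing helper along these lines may be enough as a first pass. This is only a sketch: the class name `ParseTimer` and the method `averageMillis` are hypothetical names introduced here, the workload in `main` is a placeholder, and a dedicated benchmark harness such as JMH would give more reliable numbers because it accounts for JVM warm-up.

```java
import java.util.concurrent.Callable;

public class ParseTimer {
    // Runs the task `runs` times and returns the average wall-clock time per run in milliseconds.
    static double averageMillis(Callable<?> task, int runs) throws Exception {
        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) {
            task.call();
        }
        return (System.nanoTime() - start) / 1_000_000.0 / runs;
    }

    public static void main(String[] args) throws Exception {
        // Placeholder workload; swap in a call to whichever parser is being measured.
        double ms = averageMillis(() -> "{\"a\": 123}".length(), 1_000);
        System.out.println("average: " + ms + " ms");
    }
}
```

Each candidate (for example a lambda that calls one of the parsing approaches below on the same file) would be passed as the task, using the same number of runs for every library.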

A simple JsonPath solution could look like this:

import com.jayway.jsonpath.DocumentContext;
import com.jayway.jsonpath.JsonPath;

import java.io.File;

public class JsonPathApp {
    public static void main(String[] args) throws Exception {
        File jsonFile = new File("./resource/test.json").getAbsoluteFile();

        // Parses the file into an in-memory document that the JSONPath queries below run against.
        DocumentContext documentContext = JsonPath.parse(jsonFile);
        System.out.println("" + documentContext.read("$.a"));
        System.out.println("" + documentContext.read("$.b"));
        System.out.println("" + documentContext.read("$.d"));
    }
}

Note that I do not create any POJO; I just read the given values using JSONPath, which works much like XPath. You can do the same with Jackson:

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

import java.io.File;

public class JacksonTreeApp {
    public static void main(String[] args) throws Exception {
        File jsonFile = new File("./resource/test.json").getAbsoluteFile();

        ObjectMapper mapper = new ObjectMapper();
        // readTree builds the complete node tree, including "c", before any value is read.
        JsonNode root = mapper.readTree(jsonFile);
        System.out.println(root.get("a"));
        System.out.println(root.get("b"));
        System.out.println(root.get("d"));
    }
}

We do not need JSONPath here because the values we need are directly in the root node. As you can see, the API looks almost the same. We can also create a POJO structure:

import com.fasterxml.jackson.annotation.JsonIgnoreProperties;
import com.fasterxml.jackson.databind.ObjectMapper;

import java.io.File;
import java.math.BigDecimal;

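public class JacksonPojoApp {
    public static void main(String[] args) throws Exception {
        File jsonFile = new File("./resource/test.json").getAbsoluteFile();

        ObjectMapper mapper = new ObjectMapper();
        // Binds only a, b and d; "c" has no matching property and is ignored.
        Pojo pojo = mapper.readValue(jsonFile, Pojo.class);
        System.out.println(pojo);
    }
}

// ignoreUnknown = true tells Jackson to skip JSON properties that Pojo does not declare.
@JsonIgnoreProperties(ignoreUnknown = true)
class Pojo {
    private Integer a;
    private BigDecimal b;
    private Integer d;

    public Integer getA() { return a; }
    public void setA(Integer a) { this.a = a; }
    public BigDecimal getB() { return b; }
    public void setB(BigDecimal b) { this.b = b; }
    public Integer getD() { return d; }
    public void setD(Integer d) { this.d = d; }

    @Override
    public String toString() {
        return "Pojo{a=" + a + ", b=" + b + ", d=" + d + "}";
    }
}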

Even though both libraries allow reading the JSON payload directly from a URL, I suggest downloading it in a separate step using the best approach you can find. For more info, read this article: Download a File From an URL in Java.
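As an illustration of such a separate download step, here is a minimal sketch that uses only the JDK. The class name `DownloadToTemp`, the method `download` and the temp-file prefix are hypothetical choices made for this example, and the sketch leaves out timeouts, retries and HTTP status checks, which a real download would probably need.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class DownloadToTemp {
    // Streams the content behind the URI into a new temp file and returns its path.
    // Files.copy moves the bytes in chunks, so the payload is never held in memory as a whole.
    static Path download(URI source) throws IOException {
        Path target = Files.createTempFile("payload-", ".json");
        try (InputStream in = source.toURL().openStream()) {
            // createTempFile already created the file, so it has to be replaced.
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
        return target;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java DownloadToTemp <uri>");
            return;
        }
        Path file = download(URI.create(args[0]));
        System.out.println("Saved " + Files.size(file) + " bytes to " + file);
    }
}
```

The returned path can then be turned into a `File` with `toFile()` and passed to any of the parsing examples above.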
